Skip to content

feat: expand gh aw audit with engine config, prompt, session, safe-output, and MCP server health sections#22932

Merged
Mossaka merged 8 commits intomainfrom
copilot/feat-expand-gh-aw-audit-metrics
Mar 25, 2026
Merged

feat: expand gh aw audit with engine config, prompt, session, safe-output, and MCP server health sections#22932
Mossaka merged 8 commits intomainfrom
copilot/feat-expand-gh-aw-audit-metrics

Conversation

Copy link
Contributor

Copilot AI commented Mar 25, 2026

gh aw audit currently leaves most downloaded artifact data unanalyzed. This adds five new audit sections to provide a single-pane-of-glass view across the key observability domains (Phase 1 of the audit expansion).

New audit sections

  • Engine Configuration — Full aw_info.json parse: engine ID/name, model, version, firewall version, trigger event, configured MCP servers
  • Prompt Analysis — Prompt size from prompt.txt (checks both root and aw-prompts/ paths)
  • Session & Agent Performance — Wall time, turn count, avg turn duration, tokens/min, noop count, timeout detection (from run + job conclusions)
  • Safe Output Summary — Classifies created items by type with human-readable summaries (e.g. 2 PR(s), 1 comment(s))
  • MCP Server Health — Per-server error rates, status indicators (✅/⚠️/❌), cross-references MCPFailures, surfaces top-5 slowest tool calls

Artifact download resilience

  • Critical artifact retry — When gh run download fails partway through due to non-zip artifacts (e.g. .dockerbuild files), the download now retries missing critical artifacts (activation, agent, firewall-audit-logs) individually using --name. This ensures flattening and audit extraction have the data they need.

Structure

  • pkg/cli/audit_expanded.go — New types (EngineConfig, PromptAnalysis, SessionAnalysis, SafeOutputSummary, MCPServerHealth) and extraction functions
  • pkg/cli/audit_report.go — Five new fields on AuditData, wired into buildAuditData
  • pkg/cli/audit_report_render.go — Console rendering for all new sections
  • pkg/cli/audit_expanded_test.go — 20+ test cases covering extraction, analysis, and integration
  • pkg/cli/logs_download.goretryCriticalArtifacts() function for individual artifact retry on partial download failure

JSON output example (new fields)

{
  "engine_config": {
    "engine_id": "copilot",
    "model": "gpt-4",
    "mcp_servers": ["github", "filesystem"]
  },
  "prompt_analysis": { "prompt_size": 2450 },
  "session_analysis": {
    "wall_time": "4m32s",
    "turn_count": 8,
    "avg_turn_duration": "34s",
    "tokens_per_minute": 2450.0,
    "timeout_detected": false
  },
  "safe_output_summary": {
    "total_items": 3,
    "summary": "2 PR(s), 1 comment(s)",
    "items_by_type": { "create_pull_request": 2, "add_comment": 1 }
  },
  "mcp_server_health": {
    "summary": "2 server(s), 2 healthy, 0 degraded, 0 failed",
    "total_requests": 57,
    "error_rate": 3.5
  }
}

⌨️ Start Copilot coding agent tasks without leaving your editor — available in VS Code, Visual Studio, JetBrains IDEs and Eclipse.


Changeset

  • Type: patch
  • Description: Expanded gh aw audit with engine configuration, prompt analysis, session performance metrics, safe-output summaries, and MCP server health reporting.

Generated by Changeset Generator for issue #22932 ·



✨ PR Review Safe Output Test - Run 23557793263

Note

🔒 Integrity filter blocked 2 items

The following items were blocked because they don't meet the GitHub integrity level.

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

💥 [THE END] — Illustrated by Smoke Claude ·

Copilot AI and others added 2 commits March 25, 2026 18:14
…tput, and MCP server health sections

Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/5d0a4d9a-ec43-4c04-b048-af2b228727b4
…endering consistency

Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/5d0a4d9a-ec43-4c04-b048-af2b228727b4
Copilot AI changed the title [WIP] Expand gh aw audit with MCP, session, token, safe-output, and agent performance metrics feat: expand gh aw audit with engine config, prompt, session, safe-output, and MCP server health sections Mar 25, 2026
Copilot AI requested a review from Mossaka March 25, 2026 18:16
@Mossaka Mossaka marked this pull request as ready for review March 25, 2026 18:25
Copilot AI review requested due to automatic review settings March 25, 2026 18:25
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Expands gh aw audit to analyze more of the downloaded artifact/log data and present it as additional audit sections in both JSON output and console rendering.

Changes:

  • Added expanded audit data models + extraction/analysis helpers (engine config, prompt, session, safe output summary, MCP health).
  • Wired new sections into AuditData generation and console rendering.
  • Added a dedicated test suite covering the new extraction/analysis behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
pkg/cli/audit_expanded.go Introduces new expanded audit types and builders/extractors for engine config, prompt analysis, session analysis, safe output summary, and MCP server health.
pkg/cli/audit_report.go Extends AuditData with new fields and populates them in buildAuditData.
pkg/cli/audit_report_render.go Renders the five new sections in the console report.
pkg/cli/audit_expanded_test.go Adds unit/integration-style tests for the new expanded audit extraction and aggregation logic.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

summary := &SafeOutputSummary{
TotalItems: len(items),
ItemsByType: make(map[string]int),
Items: items,
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildSafeOutputSummary always sets Items: items, so safe_output_summary.items will always be emitted in JSON (since the slice is non-empty) and duplicates the existing created_items field in AuditData. This increases output size and may be redundant; consider omitting the per-item list from SafeOutputSummary (or only populating it behind an explicit flag) and keeping this section focused on the aggregated counts/summary.

Suggested change
Items: items,

Copilot uses AI. Check for mistakes.
Count: count,
})
}
sort.Slice(summary.TypeDetails, func(i, j int) bool {
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

summary.TypeDetails is built by ranging over a map and then sorted only by Count. When multiple types have the same count, the order becomes nondeterministic (map iteration order), which can make the rendered summary/table ordering unstable across runs. Add a deterministic tie-breaker in the sort (e.g., Count desc then Type asc) or build TypeDetails from sorted keys.

Suggested change
sort.Slice(summary.TypeDetails, func(i, j int) bool {
sort.Slice(summary.TypeDetails, func(i, j int) bool {
if summary.TypeDetails[i].Count == summary.TypeDetails[j].Count {
return summary.TypeDetails[i].Type < summary.TypeDetails[j].Type
}

Copilot uses AI. Check for mistakes.
Comment on lines +348 to +363
health.TotalServers = len(health.Servers)
health.HealthySvrs = health.TotalServers - health.FailedSvrs

// Calculate overall error rate
if health.TotalRequests > 0 {
health.ErrorRate = float64(health.TotalErrors) / float64(health.TotalRequests) * 100
}

// Sort servers by request count (highest first)
sort.Slice(health.Servers, func(i, j int) bool {
return health.Servers[i].RequestCount > health.Servers[j].RequestCount
})

// Build summary string
health.Summary = fmt.Sprintf("%d server(s), %d healthy, %d failed",
health.TotalServers, health.HealthySvrs, health.FailedSvrs)
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HealthySvrs is computed as TotalServers - FailedSvrs, but servers marked as "⚠️ degraded" (error rate > 10%) are still counted as healthy. This makes health.Summary inconsistent with the per-server status. Consider tracking a degraded count (or computing healthy as only the servers with "healthy" status) and reflecting that in the summary fields.

Copilot uses AI. Check for mistakes.
Comment on lines +147 to +152
analysis := &PromptAnalysis{
PromptSize: len(data),
PromptFile: promptPath,
}

auditExpandedLog.Printf("Extracted prompt analysis: size=%d chars from %s", analysis.PromptSize, promptPath)
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extractPromptAnalysis stores PromptFile as the full path under logsPath. This makes JSON/console output machine-specific and can leak local filesystem paths. Consider storing a stable relative path (e.g., prompt.txt vs aw-prompts/prompt.txt) or converting to a relative path (there is a helper console.ToRelativePath in pkg/console/render.go).

Suggested change
analysis := &PromptAnalysis{
PromptSize: len(data),
PromptFile: promptPath,
}
auditExpandedLog.Printf("Extracted prompt analysis: size=%d chars from %s", analysis.PromptSize, promptPath)
relPromptPath, err := filepath.Rel(logsPath, promptPath)
if err != nil {
relPromptPath = filepath.Base(promptPath)
}
analysis := &PromptAnalysis{
PromptSize: len(data),
PromptFile: relPromptPath,
}
auditExpandedLog.Printf("Extracted prompt analysis: size=%d chars from %s", analysis.PromptSize, relPromptPath)

Copilot uses AI. Check for mistakes.
@Mossaka Mossaka added the smoke label Mar 25, 2026
@github-actions
Copy link
Contributor

📰 BREAKING: Smoke Copilot is now investigating this pull request. Sources say the story is developing...

@github-actions github-actions bot removed the smoke label Mar 25, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 25, 2026

🌑 The shadows whisper... Smoke Codex was cancelled. The oracle requires further meditation...

@github-actions
Copy link
Contributor

github-actions bot commented Mar 25, 2026

❌ Tool validation failed! Agent Container Smoke Test detected missing tools: was cancelled

@github-actions
Copy link
Contributor

github-actions bot commented Mar 25, 2026

💫 TO BE CONTINUED... Smoke Claude was cancelled! Our hero faces unexpected challenges...

@Mossaka Mossaka added the smoke label Mar 25, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 25, 2026

🎬 THE ENDSmoke Claude MISSION: ACCOMPLISHED! The hero saves the day! ✨

@github-actions
Copy link
Contributor

github-actions bot commented Mar 25, 2026

✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟

@github-actions
Copy link
Contributor

github-actions bot commented Mar 25, 2026

✅ All tools validated successfully! Agent Container Smoke Test confirms agent container is ready.

@github-actions
Copy link
Contributor

Smoke Test: Copilot - 23557713322 ✅ PASS

Test Status
GitHub MCP
GH CLI (mcpscripts)
Playwright
Web Fetch
File Writing + Bash
Discussion Interaction
Build gh-aw
Discussion Creation
Workflow Dispatch
PR Review

PR Author: @Mossaka | Assignees: @pelikhan, @Copilot

Note

🔒 Integrity filter blocked 2 items

The following items were blocked because they don't meet the GitHub integrity level.

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

📰 BREAKING: Report filed by Smoke Copilot ·

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smoke test review of PR #22932. Two minor suggestions on audit_expanded.go: map ordering for ItemsByType and range validation for AgentActiveRatio. No blockers.

Note

🔒 Integrity filter blocked 2 items

The following items were blocked because they don't meet the GitHub integrity level.

  • #22924 search_pull_requests: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #22932 pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

📰 BREAKING: Report filed by Smoke Copilot

TypeDetails []SafeOutputTypeDetail `json:"type_details,omitempty"`
}

// SafeOutputTypeDetail contains counts for a specific safe output type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using map[string]int here — but since this is used for JSON serialization and console rendering, it might be worth using a typed slice of SafeOutputTypeDetail directly to maintain deterministic ordering in output.

// SafeOutputSummary provides a summary of safe output items by type
type SafeOutputSummary struct {
TotalItems int `json:"total_items" console:"header:Total Items"`
ItemsByType map[string]int `json:"items_by_type"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AgentActiveRatio field (line 51) has a comment // 0.0 - 1.0 inline. This range constraint would be better documented as a struct tag or validated at parse time to prevent silent out-of-range values from being stored.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💥 Automated smoke test review - all systems nominal!

Note

🔒 Integrity filter blocked 2 items

The following items were blocked because they don't meet the GitHub integrity level.

  • #22915 search_pull_requests: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #22932 pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

💥 [THE END] — Illustrated by Smoke Claude

var auditExpandedLog = logger.New("cli:audit_expanded")

// EngineConfig represents the engine configuration extracted from aw_info.json
type EngineConfig struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EngineConfig struct is well-structured. Consider adding a CreatedAt timestamp field to track when the config was captured for better audit trail chronology.

}

// SessionAnalysis represents session and agent performance metrics
type SessionAnalysis struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SessionAnalysis looks comprehensive. The AgentActiveRatio field (0.0–1.0) could benefit from a doc comment explaining the calculation method (e.g., active tool time / total wall time).

@github-actions
Copy link
Contributor

🤖 Smoke test §23557793385 results for @Mossaka and @Copilot:\n\n| Test | Status |\n|------|--------|\n| GitHub MCP | ✅ |\n| MCP Scripts GH CLI | ✅ |\n| Serena MCP | ❌ |\n| Playwright | ✅ |\n| Web Fetch | ✅ |\n| File Writing | ✅ |\n| Bash Tool | ✅ |\n| Discussion Interaction | ✅ |\n| Build gh-aw | ✅ |\n| Discussion Creation | ✅ |\n| Workflow Dispatch | ✅ |\n| PR Review | ✅ |\n\nOverall: ⚠️ PARTIAL PASS (Serena MCP tool not available)

Note

🔒 Integrity filter blocked 1 item

The following item were blocked because they don't meet the GitHub integrity level.

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

📰 BREAKING: Report filed by Smoke Copilot ·

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smoke test review pass 🤖 — Two minor observations on audit_expanded.go: (1) AgentActiveRatio could use a fuller doc comment explaining what it measures, and (2) the prompt path search might benefit from including prompt.md. Overall the new expanded audit sections (engine config, prompt analysis, session metrics, safe-output summary, MCP health) look well-structured and follow existing codebase patterns.

Note

🔒 Integrity filter blocked 1 item

The following item were blocked because they don't meet the GitHub integrity level.

  • #22932 pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

📰 BREAKING: Report filed by Smoke Copilot

TokensPerMinute float64 `json:"tokens_per_minute,omitempty"`
TimeoutDetected bool `json:"timeout_detected"`
NoopCount int `json:"noop_count,omitempty" console:"header:Noop Count,omitempty"`
AgentActiveRatio float64 `json:"agent_active_ratio,omitempty"` // 0.0 - 1.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AgentActiveRatio comment // 0.0 - 1.0 is helpful but could be more explicit. Consider adding a doc comment explaining what this ratio measures (e.g., time agent was actively processing vs. total wall time), since it's not immediately obvious from the field name alone.

promptPaths := []string{
filepath.Join(logsPath, "prompt.txt"),
filepath.Join(logsPath, "aw-prompts", "prompt.txt"),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice use of filepath.Join and trying multiple paths for prompt.txt. One minor suggestion: consider also checking for prompt.md since some workflows may use markdown prompts rather than plain text files.

@github-actions
Copy link
Contributor

📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤

@github-actions
Copy link
Contributor

Smoke test summary (run 23557793348):

Note

🔒 Integrity filter blocked 1 item

The following item were blocked because they don't meet the GitHub integrity level.

  • #22915 search_pull_requests: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

🔮 The oracle has spoken through Smoke Codex ·

Copy link
Collaborator

@Mossaka Mossaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the changes and verified smoke tests.

Code review:

  • Clean, well-structured addition of 5 new audit sections: Engine Config, Prompt Analysis, Session & Agent Performance, Safe Output Summary, MCP Server Health
  • New pkg/cli/audit_expanded.go (463 lines) with extraction functions and types
  • 625 lines of tests in audit_expanded_test.go
  • Minimal changes to existing files: 17 lines in audit_report.go (wiring) and 184 lines in audit_report_render.go (rendering)
  • No compiler changes → no lock files or golden files affected
  • All imports already present (strconv, etc.)

CI: All 50 checks pass (5 expected skips).

Smoke tests verified (not just green pipeline):

  • Smoke Claude: MCP servers connected, extensive tools/call usage (github, mcpscripts, playwright, safeoutputs), safe outputs uploaded, issue #22952 created
  • Smoke Copilot: MCP tools used (PR review comments, discussions, issues), safe outputs uploaded, issue #22955 created
  • Smoke Codex: passed
  • Firewall block warnings in PR body are from unit tests running inside the container, not actual agent blocks

@Mossaka
Copy link
Collaborator

Mossaka commented Mar 25, 2026

@copilot There is a bug: the Engine Config, Prompt Analysis, and MCP Server Health sections always return null because the extraction functions look for files in the wrong paths.

I ran gh aw audit 23557793385 --repo github/gh-aw and all three sections are missing despite the data being available in the downloaded artifacts.

Root cause

gh run download extracts artifacts into subdirectories named after each artifact:

run-23557793385/
  activation/aw_info.json           ← aw_info.json is HERE
  activation/aw-prompts/prompt.txt  ← prompt.txt is HERE
  agent/aw-prompts/prompt.txt       ← also HERE
  agent/mcp-logs/tools.json         ← MCP data is HERE
  agent/mcp-logs/rpc-messages.jsonl
  firewall-audit-logs/audit/policy-manifest.json
  detection/detection.log           ← flattened to detection.log
  safe-output-items/...             ← flattened to safe-output-items.jsonl

But the extraction functions look at logsPath (the run output dir root):

  • extractEngineConfig() looks for filepath.Join(logsPath, "aw_info.json") — should also check filepath.Join(logsPath, "activation", "aw_info.json")
  • extractPromptAnalysis() looks for filepath.Join(logsPath, "prompt.txt") and filepath.Join(logsPath, "aw-prompts", "prompt.txt") — should also check under activation/ and agent/ subdirs
  • buildMCPServerHealth() needs MCP gateway data from agent/mcp-logs/ but the existing mcpToolUsage pipeline may not parse those files

Fix

Update the file search paths in each extraction function to also look inside the artifact subdirectories:

// extractEngineConfig - add fallback paths
awInfoPaths := []string{
    filepath.Join(logsPath, "aw_info.json"),
    filepath.Join(logsPath, "activation", "aw_info.json"),
}

// extractPromptAnalysis - add fallback paths  
promptPaths := []string{
    filepath.Join(logsPath, "prompt.txt"),
    filepath.Join(logsPath, "aw-prompts", "prompt.txt"),
    filepath.Join(logsPath, "activation", "aw-prompts", "prompt.txt"),
    filepath.Join(logsPath, "agent", "aw-prompts", "prompt.txt"),
}

Please fix this and push.

…t.txt; address review feedback

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/e7fde9e9-a102-4f1e-aaf8-003562fdae38

Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 25, 2026

@copilot There is a bug: the Engine Config, Prompt Analysis, and MCP Server Health sections always return null because the extraction functions look for files in the wrong paths.

I ran `gh aw audit...

Fixed in 71d01f3. The three extraction functions now search artifact subdirectories:

  • extractEngineConfig() — new findAwInfoPath() helper checks both aw_info.json and activation/aw_info.json
  • extractPromptAnalysis() — now checks prompt.txt, aw-prompts/prompt.txt, activation/aw-prompts/prompt.txt, and agent/aw-prompts/prompt.txt
  • awInfoHasMCPServers() — reuses findAwInfoPath() for the same fallback logic

Also addressed the PR reviewer comments in the same commit:

  • Removed duplicate Items field from SafeOutputSummary (was duplicating created_items)
  • Added deterministic tie-breaker in TypeDetails sort (count desc, then type asc)
  • Added DegradedSvrs field to MCPServerHealth so degraded servers aren't counted as healthy
  • PromptFile now stores a relative path instead of absolute

Copilot AI requested a review from Mossaka March 25, 2026 19:22
@Mossaka
Copy link
Collaborator

Mossaka commented Mar 25, 2026

@copilot The path fix for extractEngineConfig() and extractPromptAnalysis() looks correct, but there is a deeper pre-existing bug that prevents the data from being available at all.

Problem

gh run download (the GitHub CLI) stops downloading after encountering the first invalid artifact (a .dockerbuild file that is not a valid zip). This means only artifacts downloaded before the error (detection, safe-output-items) are available. The activation, agent, and firewall-audit-logs artifacts are never downloaded.

Evidence:

$ gh run download 23557793385 --repo github/gh-aw --dir /tmp/raw-download
error downloading github~gh-aw~NZ7ZX2.dockerbuild: error extracting zip archive: zip: not a valid zip file

$ ls /tmp/raw-download/
detection  safe-output-items    # Only 2 of 6 artifacts!

But downloading individually works fine:

$ gh run download 23557793385 --repo github/gh-aw --name activation --dir /tmp/raw-activation
$ ls /tmp/raw-activation/
aw-prompts  aw_info.json    # ✅ Data is there

Fix needed in downloadRunArtifacts (logs_download.go)

The existing error handling (line 576) catches the non-zip error but gh run download has already aborted. The fix should retry with --name for each missing critical artifact when the initial bulk download fails partially. After the isNonZipArtifactError branch, add individual downloads for critical artifacts that are missing:

// After the isNonZipArtifactError branch, try downloading missing critical artifacts individually
criticalArtifacts := []string{"activation", "agent", "firewall-audit-logs"}
for _, name := range criticalArtifacts {
    dir := filepath.Join(outputDir, name)
    if fileutil.DirExists(dir) {
        continue // Already downloaded
    }
    retryArgs := []string{"run", "download", strconv.FormatInt(runID, 10), "--name", name, "--dir", outputDir}
    // ... add repo args ...
    retryCmd := workflow.ExecGH(retryArgs...)
    if retryOutput, retryErr := retryCmd.CombinedOutput(); retryErr != nil {
        logsDownloadLog.Printf("Failed to download %s artifact individually: %v (%s)", name, retryErr, string(retryOutput))
    }
}

This way the flattening functions (flattenActivationArtifact, flattenUnifiedArtifact) will actually have data to work with, and the new expanded audit sections will populate correctly.

Please implement this fix in logs_download.go.

…fails due to non-zip files

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/f7c8a249-1077-4125-93a6-5aa5f4b84065

Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 25, 2026

@copilot The path fix for extractEngineConfig() and extractPromptAnalysis() looks correct, but there is a deeper pre-existing bug that prevents the data from being available at all.

Problem...

Fixed in 5d16a60. Added retryCriticalArtifacts() in logs_download.go that runs after the isNonZipArtifactError branch. When the bulk gh run download fails due to non-zip artifacts, it retries each missing critical artifact (activation, agent, firewall-audit-logs) individually using --name. This ensures the flattening functions have data to work with and the expanded audit sections populate correctly.

The retry:

  • Skips artifacts already present in the output directory
  • Builds the same -R repo flag as the initial download
  • Logs failures per-artifact without failing the overall download
  • Runs before the empty-dir check so partial downloads still proceed

Copy link
Collaborator

@Mossaka Mossaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified locally — all new audit sections now populate correctly after the download retry fix.

Local test result (gh aw audit 23557793385 --repo github/gh-aw):

  • Engine Configuration: Engine ID: copilot, Model: auto, Firewall Version: v0.25.0, Repository: github/gh-aw
  • Prompt Analysis: Prompt Size: 14.4k chars
  • Session & Agent Performance: Wall Time: 11.3m, Timeout Detected: No
  • Safe Output Summary: 12 items (2 PR review comments, 2 comments, etc.)
  • MCP Server Health: Not shown (no gateway metrics in this run's agent artifact — expected)

Fixes across 3 commits:

  1. New audit sections with types and extraction functions
  2. Path fix: search activation/ and agent/ subdirs for aw_info.json and prompt.txt
  3. Download retry: when gh run download aborts on non-zip artifacts, individually retry critical artifacts (activation, agent, firewall-audit-logs)

CI: All 50 checks pass. Smoke tests (Claude, Copilot, Codex) all genuinely passed earlier.

@Mossaka Mossaka merged commit a7078dd into main Mar 25, 2026
49 checks passed
@Mossaka Mossaka deleted the copilot/feat-expand-gh-aw-audit-metrics branch March 25, 2026 19:58
github-actions bot added a commit that referenced this pull request Mar 25, 2026
…idation

- cli.md: expand gh aw audit description with the new report sections
  added in #22932 (engine config, prompt analysis, session & agent
  performance, MCP server health, safe output summary)
- github-tools.md: document toolsets shorthand values (default, all) and
  add a note that toolsets: [all] excludes dependabot (#22851)
- triggers.md: document compile-time glob pattern validation for push/
  pull_request/pull_request_target/workflow_run filter fields (#22873)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pelikhan pushed a commit that referenced this pull request Mar 25, 2026
…idation (#22972)

- cli.md: expand gh aw audit description with the new report sections
  added in #22932 (engine config, prompt analysis, session & agent
  performance, MCP server health, safe output summary)
- github-tools.md: document toolsets shorthand values (default, all) and
  add a note that toolsets: [all] excludes dependabot (#22851)
- triggers.md: document compile-time glob pattern validation for push/
  pull_request/pull_request_target/workflow_run filter fields (#22873)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: expand gh aw audit with MCP, session, token, safe-output, and agent performance metrics

3 participants