Executive Summary
Analysis of the most recent 30 gh-aw fleet runs (24 completed + 6 in-progress, 3.1h wall-time, 33.4M tokens) surfaced one critical safe-output contract violation. It took down the daily Contribution Check workflow even though the agent itself ran cleanly. Yesterday the same workflow finished with conclusion: success after 31 turns. Today none of its output survived safe-output validation, and the run ended with conclusion: failure. The blast radius is small (one workflow), but the workflow runs daily and the fix is a one-paragraph prompt clarification, which makes it the highest-impact change in the window.
AgentRx Evidence
- Critical step: agent.emit(add_comment) in run §26494267634 (Contribution Check, item 1 of the safe-output stream)
- Failure category: tool-invocation contract violation (missing required parameter)
- Frequency / impact: 1 of 1 Contribution Check runs in the analyzed 30-run window (100%). The agent run spent 840,632 tokens, but the downstream safe_outputs job rejected the first item and short-circuited the rest, so the planned create_issue for the daily summary and its add_labels were never emitted to GitHub.
- Representative run IDs: 26494267634 (today, failed), 26451955941 (yesterday's baseline, success)
The validator error itself is unambiguous (extracted from workflow-logs/safe_outputs/8_Process Safe Outputs.txt):
Message 1 (add_comment) failed: Target is "*" but no item_number/issue_number/pull_request_number/pr_number/pr/pull_number specified in add_comment item
The agent emitted {"body": "...", "type": "add_comment", "temporary_id": "aw_nCeJ1YQO"} — no issue_number, even though the comment-dispatcher subagent at .github/workflows/contribution-check.md:265 is documented to return {"issue_number": <number>, "body": "<comment>"}. The issue_number was produced upstream but dropped when the parent agent translated dispatcher output into the safe-output emission.
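To make the failure mode concrete, here is a minimal Python sketch of the check implied by the error message. It assumes the add_comment target is configured as "*" and that the safe-output processor stops at the first rejected item, as the short-circuit described in the evidence suggests. The names check_add_comment_target and process_safe_outputs are hypothetical. This is not the gh-aw validator's code, and it models only the target check.

```python
from typing import Optional

# Target fields, copied from the validator error message.
TARGET_FIELDS = ("item_number", "issue_number", "pull_request_number",
                 "pr_number", "pr", "pull_number")


def check_add_comment_target(item: dict, target: str = "*") -> Optional[str]:
    """Return an error string if a wildcard-target add_comment names no target."""
    if target == "*" and not any(field in item for field in TARGET_FIELDS):
        return ('Target is "*" but no ' + "/".join(TARGET_FIELDS)
                + " specified in add_comment item")
    return None


def process_safe_outputs(items: list) -> tuple:
    """Handle items in order and stop at the first rejection (assumed behavior)."""
    handled = []
    for position, item in enumerate(items, start=1):
        if item["type"] == "add_comment":
            error = check_add_comment_target(item)
            if error is not None:
                return handled, f"Message {position} (add_comment) failed: {error}"
        handled.append(item["type"])
    return handled, None


# Today's stream: the rejected comment first, then the planned summary
# issue and its labels (their payloads are omitted here).
stream = [
    {"body": "...", "type": "add_comment", "temporary_id": "aw_nCeJ1YQO"},
    {"type": "create_issue"},
    {"type": "add_labels"},
]
handled, error = process_safe_outputs(stream)
print(handled)  # []
print(error)
```

Because the comment comes first in the stream, nothing after it is handled. This matches the dropped create_issue and add_labels items.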
Labeled violations (from failure-pattern-classifier)
| ID | Severity | Workflow | Step | Pattern |
|---|---|---|---|---|
| v1 | critical | Contribution Check | agent.emit(add_comment) | tool-invocation contract violation: missing required target field |
| v2 | high | PR Code Quality Reviewer | agent.behavior.turn_count | execution drift: 3→39 turn spread across 2 runs (13×, avg 21) |
| v3 | medium | GitHub Remote MCP Authentication Test | container.network.egress | firewall friction: 33% block rate (4/12) |
| v4 | medium | Step Name Alignment | session.cache | cache_memory_miss reported |
| v5 | info | (fleet-wide) | fleet.event_templates | 7 high-anomaly events (score > 0.6) across 30 runs |
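The ratios in the v2 and v3 rows follow directly from the raw counts. Here is a short Python check, assuming the v2 spread is computed over exactly the two runs named in that row:

```python
# Recompute the v2 and v3 figures from the raw counts quoted in the
# observability insights. Assumes the v2 drift covers exactly two runs.
turns = [3, 39]                      # PR Code Quality Reviewer turn counts
spread = max(turns) / min(turns)     # 39 / 3
average = sum(turns) / len(turns)    # (3 + 39) / 2

blocked, total = 4, 12               # remote-MCP auth test firewall requests
block_rate = blocked / total

print(f"v2: {spread:.0f}x spread, avg {average:.1f} turns")  # v2: 13x spread, avg 21.0 turns
print(f"v3: {block_rate:.0%} block rate")                     # v3: 33% block rate
```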
AgentRx Artifacts
Note: AgentRx's LLM-driven stages (ir / static / dynamic / check / judge / report) require the copilot / azure / trapi endpoint, none of which are reachable inside the sandboxed agent firewall (only api.anthropic.com, api.githubcopilot.com, o205451.ingest.us.sentry.io, etc. are on the allow-list — see firewall_log.allowed_domains). The IR stage failed at agentrx/llm_clients/copilot_cli.py:171 with 'copilot' CLI not found on PATH. Per the workflow's documented guardrail ("If a later stage fails due to endpoint/auth constraints, continue with completed artifacts and still produce a grounded recommendation"), the findings below are sourced from the MCP audit data instead — specifically the per-workflow observability_insights and the audit output for run 26494267634.
| Artifact | Path | Source | Notes |
|---|---|---|---|
| trajectory.json | /tmp/gh-aw/agent/agentrx/trajectory.json | trajectory-builder | 67 turns derived from 30 runs + 7 insights |
| state.json | runs/gh-aw-daily/state.json | AgentRx | completed_stages: [], domain auto-detected as flash |
| check.json | runs/gh-aw-daily/check.json | MCP-derived fallback | 5 violations (1 critical, 1 high, 2 medium, 1 info) |
| judge.json | runs/gh-aw-daily/judge.json | MCP-derived fallback | critical step + root-cause category + candidate fix |
| IR / static / dynamic / report | n/a | LLM-blocked | 'copilot' CLI not found on PATH (endpoint unreachable in firewall) |
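The guardrail quoted in the note can be sketched as a stage runner that stops at the first endpoint failure and then fills the missing artifacts from MCP-derived data. Everything below is a hypothetical illustration, not AgentRx's implementation. That includes the STAGES ordering, EndpointUnavailable, run_stage, run_pipeline, and the shape of the fallback dict. The sketch simply reproduces why state.json ends with completed_stages: [].

```python
# Hypothetical stage runner illustrating the documented guardrail; stage
# names follow the report, everything else is illustrative.
STAGES = ["ir", "static", "dynamic", "check", "judge", "report"]


class EndpointUnavailable(RuntimeError):
    """Raised when a stage cannot reach its LLM endpoint."""


def run_stage(name: str) -> dict:
    # Stand-in for a real stage: inside the firewall every LLM call fails.
    raise EndpointUnavailable("'copilot' CLI not found on PATH")


def run_pipeline(mcp_fallback: dict) -> dict:
    state = {"completed_stages": [], "failed_stage": None, "artifacts": {}}
    for stage in STAGES:
        try:
            state["artifacts"][stage] = run_stage(stage)
            state["completed_stages"].append(stage)
        except EndpointUnavailable as exc:
            # Record the first failure and stop running LLM-driven stages.
            state["failed_stage"] = (stage, str(exc))
            break
    # Keep whatever completed, then fill the gaps from MCP-derived data.
    for name, artifact in mcp_fallback.items():
        state["artifacts"].setdefault(name, artifact)
    return state


state = run_pipeline({"check": {"violations": 5},
                      "judge": {"critical_step": "agent.emit(add_comment)"}})
print(state["completed_stages"], state["failed_stage"][0])  # [] ir
```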
Top fleet-level signals from the MCP observability_insights list:
- reliability (high): "Workflow contribution-check accounted for 1 failure(s) across 1 run(s), a 100% failure rate."
- drift (medium): "Workflow pr-code-quality-reviewer varied from 3 to 39 turns across runs ... avg 21.0 turns."
- network (medium): "github-remote-mcp-authentication-test had the highest firewall block pressure with 4 blocked request(s) out of 12 total (33%)."
- tooling (medium): "step-name-alignment ... 1 missing data signal(s)" (cache_memory_miss).
- reliability (high): "7 high-anomaly events across 30 runs ... score > 0.6, indicating unusual patterns relative to the learned templates."
Recommended Optimization
Tighten the contribution-check.md prompt so the parent agent passes issue_number through into the add_comment emit.
Concretely, edit .github/workflows/contribution-check.md around line 181 (Step 2, "Post per-PR comments") and recompile the lock file. Suggested wording:
Use the comment-dispatcher agent on the verdict array (the JSON objects returned by the contribution-checker subagent in Step 1) to get the list of comments to post. For each returned {issue_number, body} payload, emit one add_comment safe output that includes BOTH fields verbatim — issue_number is REQUIRED. The safe-output validator rejects add_comment items that lack item_number / issue_number / pull_request_number (see error: Target is "*" but no item_number/...) and will fail the entire safe_outputs job, dropping the daily summary issue and its labels. Do not specify the repo — target-repo is pre-configured.
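The field mapping in the suggested wording can be written as a small Python sketch. In the workflow, the parent agent performs this step by calling the add_comment safe-output tool, not by running code. The function to_add_comment_items and the sample payloads are hypothetical. They only make the "copy issue_number through" rule explicit.

```python
import json


def to_add_comment_items(dispatcher_output: str) -> list:
    """Turn comment-dispatcher payloads into add_comment safe-output items.

    Each {issue_number, body} payload becomes one item with both fields
    copied verbatim; a payload without an integer issue_number is refused
    here instead of being emitted and rejected by the validator later.
    """
    items = []
    for payload in json.loads(dispatcher_output):
        number = payload.get("issue_number")
        if not isinstance(number, int) or isinstance(number, bool):
            raise ValueError(f"dispatcher payload lacks an integer issue_number: {payload!r}")
        items.append({"type": "add_comment",
                      "issue_number": number,
                      "body": payload["body"]})
    return items


# Hypothetical dispatcher output; the numbers and bodies are placeholders.
sample = ('[{"issue_number": 101, "body": "Thanks, looks good."},'
          ' {"issue_number": 102, "body": "Needs tests."}]')
items = to_add_comment_items(sample)
print([item["issue_number"] for item in items])  # [101, 102]
```

With this mapping, failing early is deliberate. A missing issue_number then surfaces as a clear error at the translation step, not as a rejection that takes the whole safe_outputs job down with it.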
Why this is the highest-impact fix in the window:
- Daily-scheduled workflow: the failure recurs every 24h until addressed. Every unfixed day adds another conclusion: failure and ~840k wasted tokens.
- Smallest meaningful change: one paragraph in one prompt, with no code or schema changes. The validator already enforces the rule correctly; only the prompt was permissive.
- Cascading recovery: fixing the first add_comment rejection lets the planned daily-summary create_issue and its add_labels go through. Both are currently dropped silently after the failure.
- Other candidates have higher coordination cost: v2 (drift) requires a multi-run study, v3 requires a network-config debate, v4 rests on a single cache_memory_miss signal, and v5 is fleet-wide and not yet actionable.
Validation Plan
- Pre-merge: recompile with agentic-workflows compile and confirm that contribution-check.lock.yml is updated.
- Next scheduled run (within 24h of merge): expect conclusion: success on Contribution Check, with the daily summary issue created and the lgtm / needs-work labels applied. Compare against today's failed run 26494267634.
- Success metric: three conditions should hold:
  - the safe_outputs job conclusion flips from failure to success;
  - agent_output.json shows every add_comment item carrying an integer issue_number;
  - the observability hotspot workflow=contribution-check failures=1 runs=1 disappears from the next AgentRx report.
- Regression watch: keep the violation rule safe_output_add_comment_requires_target in check.json's static set so AgentRx flags any recurrence in future runs.
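The agent_output.json part of the success metric can be checked mechanically. Here is a minimal Python sketch. It assumes the file stores emitted items under an "items" key, which this report does not confirm. add_comment_items_missing_target is a hypothetical helper, not the safe_output_add_comment_requires_target rule itself.

```python
import json


def add_comment_items_missing_target(agent_output: dict) -> list:
    """Return the indices of add_comment items without an integer issue_number.

    Assumes agent_output.json holds the emitted items under an "items" key;
    adjust the lookup if the real file is laid out differently.
    """
    missing = []
    for index, item in enumerate(agent_output.get("items", [])):
        if item.get("type") != "add_comment":
            continue
        number = item.get("issue_number")
        if not isinstance(number, int) or isinstance(number, bool):
            missing.append(index)
    return missing


# Illustrative file contents: today's rejected item plus a corrected one.
# In practice, load the real file with json.load(open("agent_output.json")).
agent_output = json.loads("""{"items": [
  {"type": "add_comment", "body": "...", "temporary_id": "aw_nCeJ1YQO"},
  {"type": "add_comment", "body": "...", "issue_number": 101}
]}""")
print(add_comment_items_missing_target(agent_output))  # [0]
```

An empty list means the run meets this part of the metric. A non-empty list points at the items that would fail it.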
References
- §26494267634 — failing Contribution Check run (today)
- §26451955941 — baseline successful Contribution Check run (yesterday)
- §26492354589 — PR Code Quality Reviewer (39-turn outlier; secondary v2 evidence)
Generated by ⚡ Daily AgentRx Trace Optimizer · opus47 15.8M