[agentrx-optimizer] Daily Workflow Optimization - 2026-05-27 #35151

@github-actions

Executive Summary

Analysis of the most recent 30 gh-aw fleet runs (24 completed + 6 in-progress, 3.1h wall-time, 33.4M tokens) surfaced one critical safe-output contract violation that failed the daily Contribution Check workflow even though the agent itself ran cleanly. Yesterday the same workflow concluded success after 31 turns; today none of the agent's work passed the safe-output validator, and the run ended with conclusion: failure. The blast radius is small (one workflow), but the workflow runs daily and the fix is a one-paragraph prompt clarification, which makes it easily the highest-impact change in the window.

AgentRx Evidence

  • Critical step: agent.emit(add_comment) in run §26494267634 (Contribution Check, message 1 of the safe-output stream)
  • Failure category: tool-invocation contract violation — missing required parameter
  • Frequency / impact: 1/1 scheduled occurrences in the 7-day window (100%); 840,632 tokens spent on an agent run whose downstream safe_outputs job rejected the first item and short-circuited the rest (the planned create_issue for the daily summary plus add_labels were never emitted to GitHub)
  • Representative run IDs: 26494267634 (today, failed), 26451955941 (yesterday baseline, success)
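The short-circuit described above can be modelled in a few lines. This is a minimal sketch, assuming strictly in-order processing that stops at the first rejected item; `process_in_order`, `needs_issue_number`, and the sample stream are hypothetical illustrations, not gh-aw's actual safe-output processor.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model: items are validated in order and the first rejection
# stops processing, so every later item is dropped without being published.
Validator = Callable[[dict], Optional[str]]


def process_in_order(
    items: List[dict], validate: Validator
) -> Tuple[List[str], Optional[str], List[str]]:
    """Return (published item types, first error or None, dropped item types)."""
    published: List[str] = []
    for index, item in enumerate(items):
        error = validate(item)
        if error is not None:
            # Everything after the rejected item is never emitted.
            dropped = [later["type"] for later in items[index + 1:]]
            return published, f"Message {index + 1} ({item['type']}) failed: {error}", dropped
        published.append(item["type"])
    return published, None, []


def needs_issue_number(item: dict) -> Optional[str]:
    # Simplified stand-in for the real target check.
    if item["type"] == "add_comment" and "issue_number" not in item:
        return "no issue_number"
    return None


# Hypothetical stand-in for today's stream: add_comment first, then the
# daily-summary create_issue and its add_labels.
stream = [
    {"type": "add_comment", "body": "..."},
    {"type": "create_issue", "title": "Daily summary"},
    {"type": "add_labels", "labels": ["lgtm"]},
]

published, error, dropped = process_in_order(stream, needs_issue_number)
print(published, error, dropped)
# [] Message 1 (add_comment) failed: no issue_number ['create_issue', 'add_labels']
```

Under this model, a single malformed first item is enough to suppress the create_issue and add_labels items that follow it.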

The validator error itself is unambiguous (extracted from workflow-logs/safe_outputs/8_Process Safe Outputs.txt):

Message 1 (add_comment) failed: Target is "*" but no item_number/issue_number/pull_request_number/pr_number/pr/pull_number specified in add_comment item

The agent emitted {"body": "...", "type": "add_comment", "temporary_id": "aw_nCeJ1YQO"} — no issue_number, even though the comment-dispatcher subagent at .github/workflows/contribution-check.md:265 is documented to return {"issue_number": <number>, "body": "<comment>"}. The issue_number was produced upstream but dropped when the parent agent translated dispatcher output into the safe-output emission.
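The validator rule behind this error can be sketched as a target check on add_comment items. The six field names come from the quoted error message; `check_add_comment` is a hypothetical re-implementation that models only the wildcard (`"*"`) target case, not the real gh-aw validator.

```python
from typing import Optional

# Field names taken from the validator error message quoted above.
TARGET_FIELDS = (
    "item_number", "issue_number", "pull_request_number",
    "pr_number", "pr", "pull_number",
)


def check_add_comment(item: dict, target: str = "*") -> Optional[str]:
    """Return an error string if a wildcard-target add_comment item has no explicit target."""
    if item.get("type") != "add_comment" or target != "*":
        return None  # only the wildcard-target case from the error is modelled here
    if any(item.get(name) is not None for name in TARGET_FIELDS):
        return None
    fields = "/".join(TARGET_FIELDS)
    return f'Target is "*" but no {fields} specified in add_comment item'


# The item the agent actually emitted (from the report).
emitted = {"body": "...", "type": "add_comment", "temporary_id": "aw_nCeJ1YQO"}
print(check_add_comment(emitted))
# Target is "*" but no item_number/issue_number/pull_request_number/pr_number/pr/pull_number specified in add_comment item

fixed = {**emitted, "issue_number": 123}  # hypothetical PR number
print(check_add_comment(fixed))  # None
```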

Labeled violations (from failure-pattern-classifier)

| ID | Severity | Workflow | Step | Pattern |
|----|----------|----------|------|---------|
| v1 | critical | Contribution Check | agent.emit(add_comment) | tool-invocation contract violation: missing required target field |
| v2 | high | PR Code Quality Reviewer | agent.behavior.turn_count | execution drift: 3→39 turn spread across 2 runs (13×, avg 21) |
| v3 | medium | GitHub Remote MCP Authentication Test | container.network.egress | firewall friction: 33% block rate (4/12) |
| v4 | medium | Step Name Alignment | session.cache | cache_memory_miss reported |
| v5 | info | (fleet-wide) | fleet.event_templates | 7 high-anomaly events (score > 0.6) across 30 runs |

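As a quick arithmetic check on the v2 and v3 rows, the spread ratio, average, and block rate follow directly from the quoted counts. This sketch assumes the v2 statistics come from exactly the two runs stated (3 and 39 turns); no other turn counts appear in the report.

```python
from statistics import mean

# v2: PR Code Quality Reviewer, assuming exactly the two runs stated.
turns = [3, 39]
spread_ratio = max(turns) / min(turns)   # 39 / 3
avg_turns = mean(turns)                  # (3 + 39) / 2

# v3: GitHub Remote MCP Authentication Test firewall counts.
blocked, total = 4, 12
block_rate = blocked / total

print(f"spread {spread_ratio:.0f}x, avg {avg_turns:.1f} turns")  # spread 13x, avg 21.0 turns
print(f"block rate {block_rate:.0%}")                            # block rate 33%
```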
AgentRx Artifacts

Note: AgentRx's LLM-driven stages (ir / static / dynamic / check / judge / report) require a copilot, azure, or trapi endpoint, and none of these is reachable inside the sandboxed agent firewall (the allow-list contains only api.anthropic.com, api.githubcopilot.com, o205451.ingest.us.sentry.io, and similar domains; see firewall_log.allowed_domains). The IR stage failed at agentrx/llm_clients/copilot_cli.py:171 with 'copilot' CLI not found on PATH. Per the workflow's documented guardrail ("If a later stage fails due to endpoint/auth constraints, continue with completed artifacts and still produce a grounded recommendation"), the findings below are sourced from the MCP audit data instead, specifically the per-workflow observability_insights and the audit output for run 26494267634.
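The documented guardrail amounts to a degrade-gracefully stage runner. A minimal sketch, assuming each stage is a callable that raises on an endpoint/auth failure; `EndpointUnavailable`, `run_pipeline`, and the fallback mapping are hypothetical names, not AgentRx's actual API.

```python
from typing import Callable, Dict, List


class EndpointUnavailable(RuntimeError):
    """Hypothetical error raised when a stage's LLM endpoint or CLI is unreachable."""


def run_pipeline(
    stages: Dict[str, Callable[[], dict]],
    fallbacks: Dict[str, Callable[[], dict]],
) -> Dict[str, dict]:
    """Run stages in order, keeping completed artifacts and using fallbacks on endpoint failures."""
    artifacts: Dict[str, dict] = {}
    blocked: List[str] = []
    for name, stage in stages.items():
        try:
            artifacts[name] = stage()
        except EndpointUnavailable as exc:
            if name in fallbacks:
                # e.g. check/judge rebuilt from MCP audit data
                artifacts[name] = {**fallbacks[name](), "source": "MCP-derived fallback"}
            else:
                blocked.append(f"{name}: {exc}")
    artifacts["_blocked"] = {"stages": blocked}
    return artifacts


def llm_stage() -> dict:
    # Simulates the observed failure for every LLM-backed stage.
    raise EndpointUnavailable("'copilot' CLI not found on PATH")


stages = {name: llm_stage for name in ("ir", "static", "dynamic", "check", "judge", "report")}
fallbacks = {
    "check": lambda: {"violations": 5},
    "judge": lambda: {"critical_step": "agent.emit(add_comment)"},
}
result = run_pipeline(stages, fallbacks)
print(sorted(result))  # ['_blocked', 'check', 'judge']
```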

| Artifact | Path | Source | Notes |
|----------|------|--------|-------|
| trajectory.json | /tmp/gh-aw/agent/agentrx/trajectory.json | trajectory-builder | 67 turns derived from 30 runs + 7 insights |
| state.json | runs/gh-aw-daily/state.json | AgentRx | completed_stages: [], domain auto-detected as flash |
| check.json | runs/gh-aw-daily/check.json | MCP-derived fallback | 5 violations (1 critical, 1 high, 2 medium, 1 info) |
| judge.json | runs/gh-aw-daily/judge.json | MCP-derived fallback | critical step + root-cause category + candidate fix |
| IR / static / dynamic / report | | LLM-blocked | 'copilot' CLI not found on PATH (endpoint unreachable in firewall) |

Top fleet-level signals from the MCP observability_insights list:

  • reliability (high): "Workflow contribution-check accounted for 1 failure(s) across 1 run(s), a 100% failure rate."
  • drift (medium): "Workflow pr-code-quality-reviewer varied from 3 to 39 turns across runs ... avg 21.0 turns."
  • network (medium): "github-remote-mcp-authentication-test had the highest firewall block pressure with 4 blocked request(s) out of 12 total (33%)."
  • tooling (medium): "step-name-alignment ... 1 missing data signal(s)" (cache_memory_miss).
  • reliability (high): "7 high-anomaly events across 30 runs ... score > 0.6, indicating unusual patterns relative to the learned templates."
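The high-anomaly count is a simple threshold filter over scored events. The sketch below assumes a hypothetical event shape with `run_id` and `score` fields; the report does not say how scores are computed against the learned templates, so only the `> 0.6` cut is modelled.

```python
from typing import Iterable, List

ANOMALY_THRESHOLD = 0.6  # from the "score > 0.6" wording in the insight


def high_anomaly_events(events: Iterable[dict], threshold: float = ANOMALY_THRESHOLD) -> List[dict]:
    """Keep events whose score is strictly greater than the threshold."""
    return [event for event in events if event["score"] > threshold]


# Hypothetical events: a score of exactly 0.6 does not count as high-anomaly.
events = [
    {"run_id": 1, "score": 0.6},
    {"run_id": 2, "score": 0.61},
    {"run_id": 3, "score": 0.2},
]
print([e["run_id"] for e in high_anomaly_events(events)])  # [2]
```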

Recommended Optimization

Tighten the contribution-check.md prompt so the parent agent passes issue_number through to the add_comment safe output.

Concretely, edit .github/workflows/contribution-check.md around line 181 (Step 2, "Post per-PR comments"), and re-compile the lock file. Suggested wording:

Use the comment-dispatcher agent on the verdict array (the JSON objects returned by the contribution-checker subagent in Step 1) to get the list of comments to post. For each returned {issue_number, body} payload, emit one add_comment safe output that includes BOTH fields verbatim — issue_number is REQUIRED. The safe-output validator rejects add_comment items that lack item_number / issue_number / pull_request_number (see error: Target is "*" but no item_number/...) and will fail the entire safe_outputs job, dropping the daily summary issue and its labels. Do not specify the repo — target-repo is pre-configured.
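The translation the suggested prompt asks for is a field-preserving map from dispatcher payloads to add_comment items. This is a sketch under the assumption that payloads arrive as `{issue_number, body}` dicts, as documented at contribution-check.md:265; `to_add_comment_items` is hypothetical, and the real agent emits items through the safe-output tool rather than a Python function.

```python
import json
from typing import Iterable, List


def to_add_comment_items(dispatcher_output: Iterable[dict]) -> List[dict]:
    """Map each {issue_number, body} payload to one add_comment item, copying both fields."""
    items = []
    for payload in dispatcher_output:
        number = payload.get("issue_number")
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(number, int) or isinstance(number, bool):
            # Fail early instead of emitting an item the validator will reject.
            raise ValueError(f"dispatcher payload lacks an integer issue_number: {payload!r}")
        items.append({"type": "add_comment", "issue_number": number, "body": payload["body"]})
    return items


dispatched = [{"issue_number": 101, "body": "Thanks! Looks good."}]  # hypothetical PR
print(json.dumps(to_add_comment_items(dispatched)))
# [{"type": "add_comment", "issue_number": 101, "body": "Thanks! Looks good."}]
```

Raising on a missing number is a design choice: it surfaces the problem inside the agent run instead of letting the validator reject the whole safe_outputs job later.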

Why this is the highest-impact fix in the window:

  1. Daily-scheduled workflow → the failure recurs every 24h until addressed; each unfixed day adds another conclusion: failure and ~840k wasted tokens.
  2. Smallest meaningful change — one paragraph in one prompt, no code changes, no schema changes. The validator already enforces the rule correctly; only the prompt was permissive.
  3. Cascading recovery — fixing the first add_comment rejection lets the planned daily-summary create_issue and its add_labels go through (currently both silently dropped after the failure).
  4. Other candidates have higher coordination cost: v2 (drift) requires a multi-run study, v3 (firewall friction) requires a network-config debate, v4 is a single cache_memory_miss signal, and v5 is fleet-wide and not yet actionable.

Validation Plan

  • Pre-merge: recompile with agentic-workflows compile and confirm that contribution-check.lock.yml is regenerated.
  • Next scheduled run (within 24h after merge): expect conclusion: success on the Contribution Check run, with the daily summary issue created and lgtm / needs-work labels applied. Compare against today's failed run 26494267634.
  • Success metric: the safe_outputs job conclusion flips from failure → success; agent_output.json shows every add_comment item carrying an integer issue_number; the observability hotspot workflow=contribution-check failures=1 runs=1 disappears from the next AgentRx report.
  • Regression watch: keep the violation rule safe_output_add_comment_requires_target in check.json's static set so AgentRx flags any recurrence in future runs.
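The agent_output.json part of the success metric can be checked mechanically after the next run. A minimal sketch, assuming the file has a top-level `items` list of safe-output objects shaped like the item quoted in the evidence section; `add_comment_violations` and `load_and_check` are hypothetical helpers.

```python
import json
from typing import List


def add_comment_violations(agent_output: dict) -> List[int]:
    """Return indexes of add_comment items lacking an integer issue_number."""
    bad = []
    for index, item in enumerate(agent_output.get("items", [])):
        if item.get("type") != "add_comment":
            continue
        number = item.get("issue_number")
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(number, int) or isinstance(number, bool):
            bad.append(index)
    return bad


def load_and_check(path: str) -> List[int]:
    """Load agent_output.json from disk (assumed layout) and run the check."""
    with open(path, encoding="utf-8") as handle:
        return add_comment_violations(json.load(handle))


# Today's emitted item, wrapped in the assumed layout.
today = {"items": [{"type": "add_comment", "body": "...", "temporary_id": "aw_nCeJ1YQO"}]}
print(add_comment_violations(today))  # [0]
```

An empty list from the next run would satisfy this part of the metric; a non-empty one points at the exact items to inspect.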

References

  • §26494267634 — failing Contribution Check run (today)
  • §26451955941 — baseline successful Contribution Check run (yesterday)
  • §26492354589 — PR Code Quality Reviewer (39-turn outlier; secondary v2 evidence)

Generated by ⚡ Daily AgentRx Trace Optimizer · opus47 15.8M · expires on Jun 3, 2026, 6:59 AM UTC
