[agentrx-optimizer] Daily Workflow Optimization - 2026-05-27 #35151

### Executive Summary

Analysis of the 30 most recent gh-aw fleet runs (24 completed + 6 in-progress, 3.1h wall-time, 33.4M tokens) surfaced one critical safe-output contract violation that failed the daily **Contribution Check** workflow even though the agent itself ran cleanly. Yesterday this same workflow finished `success` with 31 turns; today the safe-output validator rejected the agent's output, so none of its work reached GitHub and the run ended `conclusion: failure`. The blast radius is small (one workflow), but the workflow is daily-scheduled and the fix is a one-paragraph prompt clarification, which makes it the highest-impact change in the window.

### AgentRx Evidence

- **Critical step**: `agent.emit(add_comment)` in run [§26494267634](https://github.com/github/gh-aw/actions/runs/26494267634) (Contribution Check, agent index 1 of the safe-output stream)
- **Failure category**: tool-invocation contract violation — missing required parameter
- **Frequency / impact**: 1/1 Contribution Check runs in the analyzed 30-run sample (100%; yesterday's successful baseline falls outside that sample); 840,632 tokens spent on an agent run whose downstream `safe_outputs` job rejected the first item and short-circuited the rest (the planned `create_issue` for the daily summary and its `add_labels` never reached GitHub)
- **Representative run IDs**: 26494267634 (today, failed), 26451955941 (yesterday baseline, success)

The validator error itself is unambiguous (extracted from `workflow-logs/safe_outputs/8_Process Safe Outputs.txt`):

> `Message 1 (add_comment) failed: Target is "*" but no item_number/issue_number/pull_request_number/pr_number/pr/pull_number specified in add_comment item`

The agent emitted `{"body": "...", "type": "add_comment", "temporary_id": "aw_nCeJ1YQO"}` — no `issue_number`, even though the `comment-dispatcher` subagent at `.github/workflows/contribution-check.md:265` is documented to return `{"issue_number": <number>, "body": "<comment>"}`. The `issue_number` was produced upstream but dropped when the parent agent translated dispatcher output into the safe-output emission.
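
The dropped field can be illustrated with a minimal Python sketch. It assumes the dispatcher payload and the emitted item have exactly the shapes quoted above. The helper names `to_add_comment` and `has_target` and the issue number `123` are hypothetical, and the target-key list is copied from the validator error message rather than from the validator's source.

```python
from typing import Any

# Target keys named in the validator error message quoted above.
TARGET_KEYS = (
    "item_number", "issue_number", "pull_request_number",
    "pr_number", "pr", "pull_number",
)


def to_add_comment(payload: dict[str, Any]) -> dict[str, Any]:
    """Translate one dispatcher payload into an add_comment item (hypothetical helper)."""
    # Fail loudly here instead of letting the safe_outputs job reject the item later.
    if not isinstance(payload.get("issue_number"), int):
        raise ValueError("dispatcher payload is missing an integer issue_number")
    return {
        "type": "add_comment",
        "issue_number": payload["issue_number"],
        "body": payload["body"],
    }


def has_target(item: dict[str, Any]) -> bool:
    """Return True if the item names at least one target field."""
    return any(item.get(key) is not None for key in TARGET_KEYS)


# The item the agent actually emitted in the failing run (body elided as in the report).
dropped = {"body": "...", "type": "add_comment", "temporary_id": "aw_nCeJ1YQO"}
# The same comment with the dispatcher's issue_number carried through (123 is a placeholder).
fixed = to_add_comment({"issue_number": 123, "body": "..."})

print(has_target(dropped), has_target(fixed))  # prints: False True
```

The point of the sketch is only that the translation step must copy `issue_number` explicitly; in the real workflow this translation is done by the parent agent following the prompt, not by code.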

#### Labeled violations (from failure-pattern-classifier)

| ID | Severity | Workflow | Step | Pattern |
|----|----------|----------|------|---------|
| v1 | **critical** | Contribution Check | `agent.emit(add_comment)` | tool-invocation contract violation: missing required target field |
| v2 | high | PR Code Quality Reviewer | `agent.behavior.turn_count` | execution drift: 3→39 turn spread across 2 runs (13×, avg 21) |
| v3 | medium | GitHub Remote MCP Authentication Test | `container.network.egress` | firewall friction: 33% block rate (4/12) |
| v4 | medium | Step Name Alignment | `session.cache` | cache_memory_miss reported |
| v5 | info | (fleet-wide) | `fleet.event_templates` | 7 high-anomaly events (score > 0.6) across 30 runs |
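
The derived figures in rows v2 and v3 can be reproduced from the counts in the table. The following Python check is a sketch that assumes the two PR Code Quality Reviewer runs had exactly 3 and 39 turns, which is consistent with the reported average of 21.

```python
# Turn counts for the two PR Code Quality Reviewer runs (assumed to be the min and max).
turns = [3, 39]
spread_ratio = max(turns) / min(turns)  # how many times larger the longest run is
avg_turns = sum(turns) / len(turns)

# Firewall counts for GitHub Remote MCP Authentication Test.
blocked, total = 4, 12
block_rate = blocked / total

print(f"spread: {spread_ratio:.0f}x, avg: {avg_turns:.1f} turns")  # spread: 13x, avg: 21.0 turns
print(f"block rate: {block_rate:.0%}")  # block rate: 33%
```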

<details>
<summary>AgentRx Artifacts</summary>

**Note**: AgentRx's LLM-driven stages (ir / static / dynamic / check / judge / report) require the `copilot` / `azure` / `trapi` endpoint, none of which are reachable inside the sandboxed agent firewall (only `api.anthropic.com`, `api.githubcopilot.com`, `o205451.ingest.us.sentry.io`, etc. are on the allow-list — see `firewall_log.allowed_domains`). The IR stage failed at `agentrx/llm_clients/copilot_cli.py:171` with `'copilot' CLI not found on PATH`. Per the workflow's documented guardrail ("If a later stage fails due to endpoint/auth constraints, continue with completed artifacts and still produce a grounded recommendation"), the findings below are sourced from the MCP audit data instead — specifically the per-workflow `observability_insights` and the `audit` output for run 26494267634.

| Artifact | Path | Source | Notes |
|---|---|---|---|
| `trajectory.json` | `/tmp/gh-aw/agent/agentrx/trajectory.json` | trajectory-builder | 67 turns derived from 30 runs + 7 insights |
| `state.json` | `runs/gh-aw-daily/state.json` | AgentRx | `completed_stages: []`, domain auto-detected as `flash` |
| `check.json` | `runs/gh-aw-daily/check.json` | MCP-derived fallback | 5 violations (1 critical, 1 high, 2 medium, 1 info) |
| `judge.json` | `runs/gh-aw-daily/judge.json` | MCP-derived fallback | critical step + root-cause category + candidate fix |
| IR / static / dynamic / report | — | LLM-blocked | `'copilot' CLI not found on PATH` (endpoint unreachable in firewall) |

**Top fleet-level signals from the MCP `observability_insights` list**:

- **reliability (high)**: "Workflow contribution-check accounted for 1 failure(s) across 1 run(s), a 100% failure rate."
- **drift (medium)**: "Workflow pr-code-quality-reviewer varied from 3 to 39 turns across runs ... avg 21.0 turns."
- **network (medium)**: "github-remote-mcp-authentication-test had the highest firewall block pressure with 4 blocked request(s) out of 12 total (33%)."
- **tooling (medium)**: "step-name-alignment ... 1 missing data signal(s)" (cache_memory_miss).
- **reliability (high)**: "7 high-anomaly events across 30 runs ... score > 0.6, indicating unusual patterns relative to the learned templates."

</details>

### Recommended Optimization

**Tighten the `contribution-check.md` prompt so the parent agent passes `issue_number` through into the `add_comment` emit.**

Concretely, edit `.github/workflows/contribution-check.md` around line 181 (Step 2, "Post per-PR comments"), and re-compile the lock file. Suggested wording:

> Use the `comment-dispatcher` agent on the verdict array (the JSON objects returned by the contribution-checker subagent in Step 1) to get the list of comments to post. **For each returned `{issue_number, body}` payload, emit one `add_comment` safe output that includes BOTH fields verbatim — `issue_number` is REQUIRED. The safe-output validator rejects `add_comment` items that lack `item_number` / `issue_number` / `pull_request_number` (see error: `Target is "*" but no item_number/...`) and will fail the entire `safe_outputs` job, dropping the daily summary issue and its labels.** Do not specify the repo — `target-repo` is pre-configured.

Why this is the highest-impact fix in the window:

1. **Daily-scheduled workflow → the failure recurs every 24h until addressed**: every day spent unfixed is another `conclusion: failure` and ~840k wasted tokens.
2. **Smallest meaningful change** — one paragraph in one prompt, no code changes, no schema changes. The validator already enforces the rule correctly; only the prompt was permissive.
3. **Cascading recovery** — fixing the first `add_comment` rejection lets the planned daily-summary `create_issue` and its `add_labels` go through (currently both silently dropped after the failure).
4. **Other candidates have higher coordination cost**: v2 (drift) requires multi-run study, v3/v4 require network-config debate, v5 is fleet-wide and not actionable yet.

### Validation Plan

- **Pre-merge**: re-compile with `agentic-workflows compile` and confirm `contribution-check.lock.yml` updates.
- **Next scheduled run** (within 24h after merge): expect `conclusion: success`, with the daily summary issue created and the `lgtm` / `needs-work` labels applied. Compare against today's failed run 26494267634.
- **Success metric**: `safe_outputs` job `conclusion` flips from `failure` → `success`; `agent_output.json` shows every `add_comment` item carrying an integer `issue_number`; observability hotspot `workflow=contribution-check failures=1 runs=1` disappears from the next AgentRx report.
- **Regression watch**: keep the violation rule `safe_output_add_comment_requires_target` in `check.json`'s static set so AgentRx flags any recurrence in future runs.
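
The `agent_output.json` part of the success metric could be checked with a short script. This is a sketch under stated assumptions: it assumes the file holds a top-level `items` array whose entries look like the `add_comment` item quoted earlier, and the function name, the sample issue number `101`, and the `create_issue` fields are hypothetical.

```python
import json
from typing import Any


def comments_missing_issue_number(agent_output: dict[str, Any]) -> list[int]:
    """Return indexes of add_comment items without an integer issue_number."""
    missing = []
    # Assumption: safe-output items live under a top-level "items" key.
    for index, item in enumerate(agent_output.get("items", [])):
        if item.get("type") != "add_comment":
            continue
        number = item.get("issue_number")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(number, int) or isinstance(number, bool):
            missing.append(index)
    return missing


# Inline sample standing in for the real file; only the second item mirrors the failing run.
sample = json.loads("""
{"items": [
  {"type": "add_comment", "issue_number": 101, "body": "..."},
  {"type": "add_comment", "body": "...", "temporary_id": "aw_nCeJ1YQO"},
  {"type": "create_issue", "title": "...", "body": "..."}
]}
""")

print(comments_missing_issue_number(sample))  # prints [1]
```

An empty list on the next scheduled run would satisfy the "every `add_comment` item carrying an integer `issue_number`" criterion; any non-empty result points at the offending item indexes.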

### References

- [§26494267634](https://github.com/github/gh-aw/actions/runs/26494267634) — failing Contribution Check run (today)
- [§26451955941](https://github.com/github/gh-aw/actions/runs/26451955941) — baseline successful Contribution Check run (yesterday)
- [§26492354589](https://github.com/github/gh-aw/actions/runs/26492354589) — PR Code Quality Reviewer (39-turn outlier; secondary v2 evidence)







> Generated by [⚡ Daily AgentRx Trace Optimizer](https://github.com/github/gh-aw/actions/runs/26495548001) · opus47 15.8M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-agentrx-trace-optimizer%22&type=issues)
> - [x] expires on Jun 3, 2026, 6:59 AM UTC
