- Target Workflow: `security-guard`
- Source report: #3930
- Estimated cost per run (agent runs): ~$0.42
- Total tokens per run (agent runs): ~478K
- Cache read rate: ~85% (from effective vs actual token ratio)
- Cache write rate: N/A (`token_usage_summary` null; api-proxy caching not instrumented)
- LLM turns (agent runs): avg 7.5 (range: 4–11)
- Model: `claude-sonnet-4-5`
- Frequency: every PR, 23 runs in 7 days

Note: 9 of 23 runs (39%) short-circuit before the agent starts (no security-relevant files changed). Token stats above cover only the agent runs with token data (13 runs).
Current Configuration

| Setting | Value |
|---|---|
| Tools loaded | `github:` (toolsets: `pull_requests`, `repos`), ~6 tools |
| Tools actually used | `bash` (git/gh CLI), `Write`, `Read`, `safeoutputs`; MCP github tools not observed in usage data |
| Network groups | `github` only |
| Pre-agent steps | ✅ Yes: `check_security_relevance` job + PR diff fetch |
| Prompt body size | ~3,700 chars (~925 tokens) |
| Frontmatter / steps | ~2,970 chars (~740 tokens) |
| `max-turns` | 10 |
| Diff limit | 100 KB |
Key Finding: Agent Ignores Pre-fetched Diff
The `steps:` section pre-fetches up to 100 KB of PR diff and injects it into the prompt as `${{ steps.pr-diff.outputs.PR_FILES }}`. However, the tool usage data shows the agent still makes sequential `gh pr diff`, `git fetch`, `git diff`, and `gh api` calls, wasting 2–4 extra turns per run re-fetching data that is already in the prompt.
This is the primary driver of high turn counts (avg 7.5 vs an expected 2–3).
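One way to confirm this finding on new runs is to scan each run's tool-call names for the commands the prompt already forbids. Below is a minimal Python sketch. It assumes the usage data exposes tool calls as plain strings like `bash_gh pr diff`; the prefixes mirror the names quoted in this report, while the `count_redundant_fetches` helper and the sample run are hypothetical.

```python
import re
from collections import Counter

# Hypothetical patterns for bash tool calls that re-fetch the PR diff.
# Names in the usage data may be truncated, so only the command prefix is matched.
# "gh api" is broader than the prompt's `gh api .../files` rule because endpoints may be cut off.
REDUNDANT_PATTERNS = {
    "gh pr diff": re.compile(r"^bash_gh pr diff\b"),
    "git fetch": re.compile(r"^bash_git fetch\b"),
    "git diff": re.compile(r"^bash_git diff\b"),
    "gh api": re.compile(r"^bash_gh api\b"),
}


def count_redundant_fetches(tool_calls: list[str]) -> Counter:
    """Count tool calls that re-fetch data already present in the prompt."""
    counts: Counter = Counter()
    for name in tool_calls:
        for label, pattern in REDUNDANT_PATTERNS.items():
            if pattern.match(name):
                counts[label] += 1
                break  # each call is counted under at most one label
    return counts


# Hypothetical tool-call sequence for a single run.
sample_run = [
    "bash_gh pr diff",
    "bash_git fetch origin",
    "bash_git diff origin/main",
    "Read",
    "Write",
]
counts = count_redundant_fetches(sample_run)
print(sum(counts.values()), dict(counts))
```

A non-zero total on runs made after Rec #1 lands would indicate that the agent is still ignoring the instruction.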
Recommendations
1. Enforce Pre-fetched Data Usage — Add Anti-Redundancy Instruction
Estimated savings: ~150–250K tokens/run (~31–52%) · ~3–4 fewer turns/run
The prompt instructs "Use the pre-fetched diff below as your primary source of truth. Do NOT call `gh pr diff`...", but the agent regularly violates this instruction (tool usage shows `bash_gh pr diff`, `bash_git fetch origin mai...`, `bash_git diff origin/main...` across multiple runs).
Fix: Move the restriction earlier in the prompt and make it a hard constraint at the top of "Your Task", before the numbered list:
```markdown
## Your Task

> ⛔ **STOP: The full PR diff is pre-loaded at the bottom of this prompt under "Changed Files".
> Do NOT call `gh pr diff`, `git diff`, `git fetch`, or `gh api .../files`; those calls are
> redundant and waste turns. All the data you need is already here.**

Analyze PR #${{ github.event.pull_request.number }} ...
```
Also add a defensive step that writes the diff to a temp file so the agent can `cat` it without any API call:
```yaml
- name: Write diff to temp file
  id: write-diff
  run: |
    mkdir -p /tmp/gh-aw/agent
    printf '%s' "$PR_FILES" > /tmp/gh-aw/agent/pr-diff.txt
    echo "diff_path=/tmp/gh-aw/agent/pr-diff.txt" >> "$GITHUB_OUTPUT"
  env:
    # Pass the diff through env rather than inlining ${{ }} in `run`,
    # so diff content is never interpreted as shell code
    PR_FILES: ${{ steps.pr-diff.outputs.PR_FILES }}
```
Then update the prompt to reference `cat /tmp/gh-aw/agent/pr-diff.txt` rather than an interpolated variable.
2. Reduce max-turns from 10 to 5
Estimated savings: ~100–200K tokens/run (~20–40%) on high-turn runs
Turn distribution for runs with token data:
- 4 turns: 2 runs (avg 235K tokens)
- 6–7 turns: 5 runs (avg 428K tokens)
- 8+ turns: 6 runs (avg 600K tokens)
Runs hitting 8–11 turns cost ~2.5× as much as 4-turn runs (avg 600K vs 235K tokens). A security review of a PR diff should not require 11 turns; setting `max-turns: 5` caps runaway cases.
```yaml
engine:
  id: claude
  model: claude-sonnet-4-5
  max-turns: 5  # was 10
```
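The bucket averages in the turn distribution are enough to cross-check the headline figures. This short Python check uses only those numbers from the report. It reproduces the ~478K tokens/run average and shows that 8+ turn runs cost about 2.55× as much as 4-turn runs:

```python
# (bucket, runs, average tokens per run in thousands), from the turn distribution
BUCKETS = [
    ("4 turns", 2, 235),
    ("6-7 turns", 5, 428),
    ("8+ turns", 6, 600),
]


def weighted_average(buckets):
    """Return (total runs, run-weighted average tokens per run in K)."""
    total_runs = sum(runs for _, runs, _ in buckets)
    total_tokens = sum(runs * avg for _, runs, avg in buckets)
    return total_runs, total_tokens / total_runs


runs, avg_tokens = weighted_average(BUCKETS)
ratio = BUCKETS[2][2] / BUCKETS[0][2]  # 8+ turn average vs 4-turn average
print(runs, round(avg_tokens), round(ratio, 2))  # 13 478 2.55
```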
3. Remove Unused github: MCP Toolset
Estimated savings: ~3–6K tokens/turn (~6–12K tokens/run)
The MCP tool usage data shows only `safeoutputs` tools being called (`add_comment`, `add_labels`, `noop`). No `mcp__github__*` calls are observed across 50 runs. The agent uses bash `gh` CLI commands instead, so the MCP toolset is dead weight loaded into every turn's context.
```yaml
tools:
  # Remove entirely: the agent uses the bash gh CLI, not MCP tools
  # github:
  #   mode: gh-proxy
  #   toolsets: [pull_requests, repos]
```
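Before the toolset is deleted, you can group the same usage data by MCP server to confirm that nothing will regress. The following is a minimal sketch. It assumes tool calls follow Claude Code's `mcp__<server>__<tool>` naming, which is the `mcp__github__*` pattern this report searches for. The sample tool names are hypothetical.

```python
from collections import Counter


def calls_by_server(tool_calls: list[str]) -> Counter:
    """Group tool calls by MCP server; non-MCP tools are counted as "(built-in)"."""
    counts: Counter = Counter()
    for name in tool_calls:
        parts = name.split("__")
        if len(parts) >= 3 and parts[0] == "mcp":
            counts[parts[1]] += 1  # e.g. mcp__safeoutputs__noop -> "safeoutputs"
        else:
            counts["(built-in)"] += 1
    return counts


def safe_to_remove(counts: Counter, server: str = "github") -> bool:
    """A toolset is a removal candidate only if none of its tools were called."""
    return counts[server] == 0


# Hypothetical tool calls aggregated across runs.
observed = [
    "Bash",
    "Read",
    "mcp__safeoutputs__add_comment",
    "mcp__safeoutputs__add_labels",
    "mcp__safeoutputs__noop",
]
counts = calls_by_server(observed)
print(dict(counts), safe_to_remove(counts))
```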
4. Trim Verbose "Your Task" Instructions
Estimated savings: ~200–400 tokens/run (small, but compounds across turns)
The "Your Task" section has 6 detailed instructions, several of which are redundant with "Output Format". Trim to 3 key points:
```markdown
## Your Task

⛔ The PR diff is pre-loaded below; do NOT re-fetch it via `gh pr diff`, `git diff`, or `gh api`.

1. Read the pre-fetched diff under "Changed Files"
2. Batch any additional file reads in a single tool call
3. Report findings (≤ 150 words each, max 5) or call `safeoutputs noop` if clean
```
5. Reduce PR Diff Limit from 100 KB to 50 KB
Estimated savings: ~6K tokens/turn for large PRs (~30K tokens on multi-turn runs)
The 100 KB diff limit injects up to ~25K tokens into every turn of the context window. Security-critical files are rarely changed in bulk — 50 KB is sufficient for targeted security reviews.
```yaml
- name: Fetch PR changed files
  run: |
    DIFF_LIMIT=50000  # was 100000
    # ... (rest of the step unchanged)
```
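The ~25K-token figure comes from a rough heuristic of 4 bytes per token (100,000 / 4). The hypothetical Python sketch below applies the limit under two assumptions: `DIFF_LIMIT` counts bytes, and the cut must not split a multi-byte UTF-8 character. The real step may truncate differently.

```python
BYTES_PER_TOKEN = 4  # rough heuristic behind the report's "100 KB ~ 25K tokens" figure


def estimate_tokens(num_bytes: int) -> int:
    """Very rough token estimate for mostly-ASCII diff text."""
    return num_bytes // BYTES_PER_TOKEN


def truncate_diff(diff: str, limit_bytes: int) -> str:
    """Keep at most limit_bytes of UTF-8, dropping any partial trailing character."""
    raw = diff.encode("utf-8")
    if len(raw) <= limit_bytes:
        return diff
    return raw[:limit_bytes].decode("utf-8", errors="ignore")


old_cap, new_cap = estimate_tokens(100_000), estimate_tokens(50_000)
print(old_cap, new_cap, old_cap - new_cap)  # 25000 12500 12500
```

Under this heuristic, the saving applies only to the part of a diff above 50 KB. It tops out at ~12.5K tokens/turn for a diff that filled the old 100 KB cap.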
Cache Analysis
Cache data is unavailable (`token_usage_summary: null` for all runs) because the api-proxy sidecar does not currently record Anthropic's cache usage data. However, the effective vs billed token ratios give an estimate:

| Run | Tokens (billed) | Effective Tokens | Implied Cache Rate |
|---|---|---|---|
| 26489859337 | 449K | 2,627K | ~83% |
| 26489709490 | 438K | 3,101K | ~86% |
| 26489579226 | 236K | 2,424K | ~90% |
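The Implied Cache Rate column matches a simple reading: every effective token that was not billed counts as a cache hit, giving 1 − billed / effective. This interpretation is inferred from the numbers, not taken from a documented definition of "effective tokens". The Python check below reproduces all three rows:

```python
# (run id, billed tokens in K, effective tokens in K), from the cache table
RUNS = [
    ("26489859337", 449, 2627),
    ("26489709490", 438, 3101),
    ("26489579226", 236, 2424),
]


def implied_cache_rate(billed: float, effective: float) -> float:
    """Share of effective tokens that did not show up in the billed count."""
    return 1 - billed / effective


for run_id, billed, effective in RUNS:
    print(run_id, f"{implied_cache_rate(billed, effective):.0%}")
# 26489859337 83%
# 26489709490 86%
# 26489579226 90%
```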
Cache hit rate is high (~85–90%), meaning the static system prompt is being cached effectively within runs. The cost driver is new input tokens per turn (tool results, redundant diff re-reads), not cache misses.
Action: Enable `token_usage_summary` instrumentation in the api-proxy to get a precise cache write vs read breakdown.
Expected Impact

| Metric | Current | Projected | Savings |
|---|---|---|---|
| Total tokens/run | ~478K | ~200–280K | ~40–58% |
| Cost/run | ~$0.42 | ~$0.18–$0.25 | ~40–57% |
| LLM turns | avg 7.5 | avg 3–4 | ~47–60% fewer |
| Weekly cost (Security Guard) | ~$5.52 | ~$2.50–3.30 | ~$2.20–3.00 saved |
Largest single win: fixing redundant diff fetching (Rec #1) removes ~3–4 turns × ~65K tokens/turn ≈ 195–260K tokens (roughly 200K) per agent run.
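Each savings column applies 1 − projected / current to both ends of the projected range. The Rec #1 estimate multiplies the turns removed by tokens per turn (478K / 7.5 turns ≈ 64K, rounded to ~65K). A quick Python check of that arithmetic:

```python
def pct_saved(current: float, projected: float) -> float:
    """Percentage reduction from current to projected."""
    return 100 * (1 - projected / current)


# Current values and projected ranges from the Expected Impact table.
tokens = [round(pct_saved(478, p)) for p in (280, 200)]  # K tokens per run
cost = [round(pct_saved(0.42, p)) for p in (0.25, 0.18)]  # $ per run
turns = [round(pct_saved(7.5, p)) for p in (4, 3)]  # LLM turns per run
print(tokens, cost, turns)  # [41, 58] [40, 57] [47, 60]

# Rec #1: turns removed x tokens per turn
tokens_per_turn = 478 / 7.5
rec1_savings = [n * 65 for n in (3, 4)]  # K tokens per run
print(round(tokens_per_turn), rec1_savings)  # 64 [195, 260]
```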
Implementation Checklist
- [ ] Update `security-guard.md`
- [ ] Add `write-diff` pre-agent step to write diff to `/tmp/gh-aw/agent/pr-diff.txt`
- [ ] Set `max-turns: 5` (was 10)
- [ ] Remove `github:` MCP toolset (verify no regression)
- [ ] Run `gh aw compile .github/workflows/security-guard.md`
- [ ] Run `npx tsx scripts/ci/postprocess-smoke-workflows.ts`
- [ ] Enable `token_usage_summary` in api-proxy for cache instrumentation

Generated by Daily Claude Token Optimization Advisor · sonnet46 1.5M