⚡ Claude Token Optimization 2026-05-27 — security-guard #3932

## Target Workflow: `security-guard`

**Source report:** #3930
**Estimated cost per run (agent runs):** ~$0.42
**Total tokens per run (agent runs):** ~478K
**Cache read rate:** ~85% (implied by the effective vs billed token ratio; see Cache Analysis)
**Cache write rate:** N/A (token_usage_summary null — api-proxy caching not instrumented)
**LLM turns (agent runs):** avg 7.5 (range: 4–11)
**Model:** claude-sonnet-4-5
**Frequency:** Every PR — 23 runs in 7 days

> **Note:** 9 of 23 runs (39%) short-circuit before the agent starts (no security-relevant files changed). Token stats above cover only the 13 agent runs with token data.

## Current Configuration

| Setting | Value |
|---------|-------|
| Tools loaded | `github:` (toolsets: pull_requests, repos) — ~6 tools |
| Tools actually used | `bash` (git/gh CLI), `Write`, `Read`, `safeoutputs` — MCP github tools **not observed** in usage data |
| Network groups | `github` only |
| Pre-agent steps | ✅ Yes — `check_security_relevance` job + PR diff fetch |
| Prompt body size | ~3,700 chars (~925 tokens) |
| Frontmatter / steps | ~2,970 chars (~740 tokens) |
| `max-turns` | 10 |
| Diff limit | 100 KB |

## Key Finding: Agent Ignores Pre-fetched Diff

The `steps:` section pre-fetches up to 100 KB of PR diff and injects it into the prompt as `${{ steps.pr-diff.outputs.PR_FILES }}`. However, the tool usage data shows the agent still makes sequential `gh pr diff`, `git fetch`, `git diff`, and `gh api` calls — wasting 2–4 extra turns per run re-fetching data that's already in the prompt.

This is the primary driver of high turn counts (avg 7.5 vs an expected 2–3).
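To quantify the problem per run, the redundant calls can be counted directly from a run's bash command log. The following is a minimal sketch, not the advisor's actual method: it assumes the commands are available as plain strings, and `example_run` (including the PR number `123`) is hypothetical data.

```python
import re

# Command shapes that re-fetch data already present in the prompt.
# The list mirrors the calls named in this report; extend it as needed.
REDUNDANT_PATTERNS = [
    re.compile(r"^gh pr diff\b"),
    re.compile(r"^git fetch\b"),
    re.compile(r"^git diff\b"),
    re.compile(r"^gh api\b.*/files\b"),
]


def is_redundant_fetch(command: str) -> bool:
    """Return True if a bash command re-fetches the PR diff."""
    stripped = command.strip()
    return any(p.search(stripped) for p in REDUNDANT_PATTERNS)


def count_redundant_fetches(commands: list[str]) -> int:
    """Count redundant diff fetches in one run's bash commands."""
    return sum(is_redundant_fetch(c) for c in commands)


# Hypothetical command log for a single run.
example_run = [
    "gh pr diff 123",
    "git fetch origin main",
    "git diff origin/main...HEAD",
    "cat /tmp/gh-aw/agent/pr-diff.txt",
]
print(count_redundant_fetches(example_run))  # 3
```

A non-zero count on a run after the prompt change would indicate the instruction is still being ignored.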

## Recommendations

### 1. Enforce Pre-fetched Data Usage — Add Anti-Redundancy Instruction

**Estimated savings:** ~150–250K tokens/run (~35–55%) · ~3–4 fewer turns/run

The prompt instructs "Use the pre-fetched diff below as your primary source of truth. Do NOT call `gh pr diff`..." but the agent regularly violates this instruction (tool usage shows `bash_gh pr diff`, `bash_git fetch origin mai...`, `bash_git diff origin/main...` across multiple runs).

**Fix:** Move the restriction **earlier in the prompt** and make it a hard constraint at the top of "Your Task", before the numbered list:

```markdown
## Your Task

> ⛔ **STOP: The full PR diff is pre-loaded at the bottom of this prompt under "Changed Files".
> Do NOT call `gh pr diff`, `git diff`, `git fetch`, or `gh api .../files` — those calls are
> redundant and waste turns. All the data you need is already here.**

Analyze PR #${{ github.event.pull_request.number }} ...
```

Also add a defensive step that writes the diff to a temp file so the agent can `cat` it without any API call:

```yaml
- name: Write diff to temp file
  id: write-diff
  run: |
    mkdir -p /tmp/gh-aw/agent
    printf '%s' "$PR_FILES" > /tmp/gh-aw/agent/pr-diff.txt
    echo "diff_path=/tmp/gh-aw/agent/pr-diff.txt" >> "$GITHUB_OUTPUT"
  env:
    PR_FILES: ${{ steps.pr-diff.outputs.PR_FILES }}
```

Then update the prompt to reference `cat /tmp/gh-aw/agent/pr-diff.txt` rather than an interpolated variable.

### 2. Reduce `max-turns` from 10 to 5

**Estimated savings:** ~100–200K tokens/run (~20–40%) on high-turn runs

Turn distribution for runs with token data:
- 4 turns: 2 runs (avg 235K tokens)
- 6–7 turns: 5 runs (avg 428K tokens)
- 8+ turns: 6 runs (avg 600K tokens)

Runs that take 8–11 turns use about 2.5× the tokens of 4-turn runs (avg 600K vs 235K). A security review of a PR diff should not require 11 turns. Setting `max-turns: 5` caps runaway cases.

```yaml
engine:
  id: claude
  model: claude-sonnet-4-5
  max-turns: 5   # was 10
```
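The bucket averages in the turn distribution above can be cross-checked against the headline ~478K tokens/run figure. This sketch uses only the numbers listed in this section; the variable and function names are illustrative.

```python
# (label, number of runs, average tokens per run in thousands)
buckets = [
    ("4 turns", 2, 235),
    ("6-7 turns", 5, 428),
    ("8+ turns", 6, 600),
]


def weighted_average(rows):
    """Run-weighted mean of the per-bucket token averages."""
    total_runs = sum(runs for _, runs, _ in rows)
    total_tokens = sum(runs * avg for _, runs, avg in rows)
    return total_tokens / total_runs


overall = weighted_average(buckets)
ratio = buckets[2][2] / buckets[0][2]

print(f"overall avg: ~{overall:.0f}K tokens/run")  # overall avg: ~478K tokens/run
print(f"8+ turns vs 4 turns: {ratio:.2f}x")        # 8+ turns vs 4 turns: 2.55x
```

The weighted mean reproduces the reported per-run total, which suggests the three buckets cover all 13 runs with token data.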

### 3. Remove Unused `github:` MCP Toolset

**Estimated savings:** ~3–6K tokens/turn (~6–12K tokens/run)

The MCP tool usage data shows that only `safeoutputs` tools are called (`add_comment`, `add_labels`, `noop`). No `mcp__github__*` calls are observed in any analyzed run. The agent uses bash `gh` CLI commands instead, making the MCP toolset dead weight loaded into every turn's context.

```yaml
tools:
  # Remove entirely — agent uses bash gh CLI, not MCP tools
  # github:
  #   mode: gh-proxy
  #   toolsets: [pull_requests, repos]
```
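Before removing the toolset, it may be worth re-checking the usage data for any `mcp__github__*` call. The sketch below assumes tool-call names are available as a flat list of strings; the sample `calls` list is hypothetical, and only the `mcp__github__` prefix and the `bash` naming come from this report.

```python
from collections import Counter

MCP_GITHUB_PREFIX = "mcp__github__"


def tool_family_counts(tool_names):
    """Group tool-call names into coarse families and count them."""
    counts = Counter()
    for name in tool_names:
        if name.startswith(MCP_GITHUB_PREFIX):
            counts["mcp_github"] += 1
        elif name.startswith("bash"):
            counts["bash"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical tool-call names collected from recent runs.
calls = ["bash_gh pr diff", "Read", "add_comment", "bash_git status", "noop"]
counts = tool_family_counts(calls)

# The toolset is only safe to drop if no run ever called it.
safe_to_remove = counts["mcp_github"] == 0
print(dict(counts), safe_to_remove)
```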

### 4. Trim Verbose "Your Task" Instructions

**Estimated savings:** ~200–400 tokens/run (small but compound across turns)

The "Your Task" section has 6 detailed instructions, several redundant with "Output Format". Trim to 3 key points:

```markdown
## Your Task

⛔ The PR diff is pre-loaded below — do NOT re-fetch it via `gh pr diff`, `git diff`, or `gh api`.

1. Read the pre-fetched diff under "Changed Files"
2. Batch any additional file reads in a single tool call
3. Report findings (≤ 150 words each, max 5) or call `safeoutputs noop` if clean
```

### 5. Reduce PR Diff Limit from 100 KB to 50 KB

**Estimated savings:** ~6K tokens/turn for large PRs (~30K tokens on multi-turn runs)

The 100 KB diff limit injects up to ~25K tokens into every turn of the context window. Security-critical files are rarely changed in bulk — 50 KB is sufficient for targeted security reviews.

```yaml
- name: Fetch PR changed files
  run: |
    DIFF_LIMIT=50000   # was 100000
```
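The per-turn effect of the lower limit depends on how large the diff actually is. This sketch assumes roughly 4 characters per token, the same ratio implied by the prompt-size figures in the configuration table; the function names are illustrative.

```python
CHARS_PER_TOKEN = 4  # assumption: ~4 chars/token, as implied elsewhere in this report


def estimate_tokens(num_chars: int) -> int:
    """Rough token estimate for a block of text."""
    return num_chars // CHARS_PER_TOKEN


def tokens_saved_per_turn(diff_chars: int,
                          old_limit: int = 100_000,
                          new_limit: int = 50_000) -> int:
    """Tokens removed from each turn's context when the diff limit drops."""
    old = min(diff_chars, old_limit)
    new = min(diff_chars, new_limit)
    return estimate_tokens(old) - estimate_tokens(new)


print(estimate_tokens(100_000))        # 25000
print(tokens_saved_per_turn(30_000))   # 0
print(tokens_saved_per_turn(74_000))   # 6000
print(tokens_saved_per_turn(200_000))  # 12500
```

Under this assumption, PRs with diffs below 50 KB see no change, and the saving tops out at about 12.5K tokens per turn for diffs at or above the old limit.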

## Cache Analysis

Cache data is unavailable (`token_usage_summary: null` for all runs) — the api-proxy sidecar does not currently instrument Anthropic's cache headers. However, based on effective vs billed token ratios:

| Run | Tokens (billed) | Effective Tokens | Implied Cache Rate |
|-----|----------------:|-----------------:|-------------------:|
| 26489859337 | 449K | 2,627K | ~83% |
| 26489709490 | 438K | 3,101K | ~86% |
| 26489579226 | 236K | 2,424K | ~90% |

Cache hit rate is high (~83–90%), meaning the static system prompt is being cached effectively within runs. The cost driver is **new input tokens per turn** (tool results, redundant diff re-reads), not cache misses.

**Action:** Enable `token_usage_summary` instrumentation in the api-proxy to get precise cache write vs read breakdown.
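The implied cache rates in the table are consistent with computing `1 - billed / effective` for each run. That formula is an assumption inferred from the table values, not a documented api-proxy metric; the sketch below reproduces the three rows.

```python
# run id -> (billed tokens in thousands, effective tokens in thousands)
runs = {
    "26489859337": (449, 2627),
    "26489709490": (438, 3101),
    "26489579226": (236, 2424),
}


def implied_cache_rate(billed_k: float, effective_k: float) -> float:
    """Share of effective tokens that were not billed at the full rate."""
    return 1 - billed_k / effective_k


rates = {run_id: implied_cache_rate(b, e) for run_id, (b, e) in runs.items()}
for run_id, rate in rates.items():
    print(f"{run_id}: ~{rate * 100:.0f}%")
# 26489859337: ~83%
# 26489709490: ~86%
# 26489579226: ~90%
```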

## Expected Impact

| Metric | Current | Projected | Savings |
|--------|---------|-----------|---------|
| Total tokens/run | ~478K | ~200–280K | ~40–58% |
| Cost/run | ~$0.42 | ~$0.18–$0.25 | ~40–57% |
| LLM turns | avg 7.5 | avg 3–4 | ~47–60% fewer |
| Weekly cost (Security Guard) | ~$5.52 | ~$2.50–3.30 | ~$2.20–3.00 saved |

Largest single win: fixing redundant diff fetching (Rec #1) removes ~3–4 turns at ~65K tokens/turn, or roughly 200–260K tokens per agent run.
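The token and cost rows of the table, and the per-turn figure used for the largest win, can be recomputed from the current and projected values. This is a sketch using only numbers from this report; the helper name `pct_saved` is illustrative.

```python
def pct_saved(current: float, projected: float) -> float:
    """Percentage reduction from current to projected."""
    return (current - projected) / current * 100


AVG_TOKENS_K = 478  # tokens per agent run, in thousands
AVG_TURNS = 7.5

# Token and cost savings at the pessimistic and optimistic projections.
print(f"tokens: {pct_saved(478, 280):.0f}-{pct_saved(478, 200):.0f}%")   # tokens: 41-58%
print(f"cost:   {pct_saved(0.42, 0.25):.0f}-{pct_saved(0.42, 0.18):.0f}%")  # cost:   40-57%

# Average tokens per turn, then the saving from removing 3 or 4 turns.
tokens_per_turn_k = AVG_TOKENS_K / AVG_TURNS
print(f"per turn: ~{tokens_per_turn_k:.0f}K")  # per turn: ~64K
for turns_removed in (3, 4):
    saved_k = turns_removed * tokens_per_turn_k
    print(f"{turns_removed} turns removed: ~{saved_k:.0f}K tokens")
# 3 turns removed: ~191K tokens
# 4 turns removed: ~255K tokens
```

The report rounds the per-turn average up to ~65K, so its figures come out slightly higher than these.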

## Implementation Checklist

- [ ] Add ⛔ anti-redundancy guard at top of "Your Task" section in `security-guard.md`
- [ ] Add `write-diff` pre-agent step to write diff to `/tmp/gh-aw/agent/pr-diff.txt`
- [ ] Set `max-turns: 5` (was 10)
- [ ] Remove `github:` MCP toolset (verify no regression)
- [ ] Trim "Your Task" from 6 points to 3
- [ ] Reduce diff limit from 100 KB to 50 KB
- [ ] Recompile: `gh aw compile .github/workflows/security-guard.md`
- [ ] Post-process: `npx tsx scripts/ci/postprocess-smoke-workflows.ts`
- [ ] Verify CI passes on a PR with security-relevant changes
- [ ] Compare token usage on new run vs baseline (target: ≤ 280K tokens/run)
- [ ] Investigate enabling `token_usage_summary` in api-proxy for cache instrumentation




> Generated by [Daily Claude Token Optimization Advisor](https://github.com/github/gh-aw-firewall/actions/runs/26503204664) · sonnet46 1.5M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw-firewall+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw-firewall%2Fclaude-token-optimizer%22&type=issues)
