Merged
42 changes: 26 additions & 16 deletions .github/workflows/claude-token-optimizer.lock.yml


118 changes: 62 additions & 56 deletions .github/workflows/claude-token-optimizer.md
@@ -14,6 +14,8 @@ permissions:
actions: read
issues: read
pull-requests: read
imports:
- uses: shared/mcp/gh-aw.md
network:
allowed:
- github
@@ -23,11 +25,39 @@ tools:
bash: true
safe-outputs:
create-issue:
title-prefix: " Claude Token Optimization"
title-prefix: "⚡ Claude Token Optimization"
labels: [claude-token-optimization]
close-older-issues: true
timeout-minutes: 10
strict: true
steps:
- name: Download recent Claude workflow logs
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Comment on lines +35 to +36

Copilot AI Apr 10, 2026

The workflow imports shared/mcp/gh-aw.md, which already sets GH_TOKEN using the GH_AW_* fallback chain, but this step overrides GH_TOKEN to secrets.GITHUB_TOKEN. If the repo relies on a higher-scope token (or runs on GHE with different requirements), this can break gh aw logs. Consider using the same fallback expression here, or omit GH_TOKEN so the component's choice applies.

Suggested change
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
set -euo pipefail
mkdir -p /tmp/gh-aw/token-audit

echo "📥 Downloading Claude workflow logs (last 7 days)..."

LOGS_EXIT=0
gh aw logs \
--engine claude \
--start-date -7d \
--json \
-c 50 \
Copilot AI Apr 10, 2026

This step claims to download the last 7 days of runs, but -c 50 caps the results and can silently omit runs if more than 50 exist in the window. Consider removing the cap or making it explicit in the messaging/logic (and/or increasing the cap to match a 7-day lookback).

Suggested change
-c 50 \
> /tmp/gh-aw/token-audit/claude-logs.json || LOGS_EXIT=$?

if [ -s /tmp/gh-aw/token-audit/claude-logs.json ]; then
TOTAL=$(jq '.runs | length' /tmp/gh-aw/token-audit/claude-logs.json)
echo "✅ Downloaded $TOTAL Claude workflow runs (last 7 days)"
if [ "$LOGS_EXIT" -ne 0 ]; then
echo "⚠️ gh aw logs exited with code $LOGS_EXIT (partial results)"
fi
else
echo "❌ No log data downloaded (exit code $LOGS_EXIT)"
echo '{"runs":[],"summary":{}}' > /tmp/gh-aw/token-audit/claude-logs.json
fi
---

# Daily Claude Token Optimization Advisor
@@ -57,7 +87,7 @@ From the report's **Workflow Summary** table, identify the workflow with:

Extract these key metrics for the target workflow:
- Total tokens per run
- Cache hit rate (read and write separately Anthropic exposes both)
- Cache hit rate (read and write separately — Anthropic exposes both)
- Cache write rate
- Input/output ratio
- Number of LLM turns (request count)
@@ -81,53 +111,32 @@ cat "$WORKFLOW_FILE"
```

Analyze:
- **Tools loaded** List all tools in the `tools:` section. Flag any that may not be needed.
- **Network groups** List network groups in `network.allowed:`. Flag unused ones.
- **Prompt length** Estimate the markdown body size. Is it verbose?
- **Pre-agent steps** Does it use `steps:` to pre-compute deterministic work?
- **Post-agent steps** Does it use `post-steps:` for validation?
- **Tools loaded** — List all tools in the `tools:` section. Flag any that may not be needed.
- **Network groups** — List network groups in `network.allowed:`. Flag unused ones.
- **Prompt length** — Estimate the markdown body size. Is it verbose?
- **Pre-agent steps** — Does it use `steps:` to pre-compute deterministic work?
- **Post-agent steps** — Does it use `post-steps:` for validation?
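The tool-count part of this audit can be approximated mechanically. Below is a minimal sketch, assuming the `tools:` block is a top-level frontmatter key with two-space-indented entries and using ~600 tokens per tool schema (the midpoint of the 500-700 range cited in Step 5); the sample file is fabricated:

```shell
# Count entries under `tools:` in a workflow's frontmatter and estimate
# the per-turn schema overhead. The sample file and the ~600 tokens/tool
# figure are illustrative assumptions, not measured values.
SAMPLE_FILE=$(mktemp)
cat > "$SAMPLE_FILE" <<'EOF'
---
tools:
  bash: true
  github:
  web-fetch:
---
EOF
TOOL_COUNT=$(awk '/^tools:/{f=1;next} f&&/^[^ ]/{f=0} f&&/^  [a-z-]+:/{n++} END{print n+0}' "$SAMPLE_FILE")
echo "tools loaded: $TOOL_COUNT (~$((TOOL_COUNT * 600)) tokens of tool schema per turn)"
rm -f "$SAMPLE_FILE"
```

Run it against the real `$WORKFLOW_FILE` instead of the sample to get the actual count.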

## Step 4: Analyze Recent Run Artifacts
## Step 4: Analyze Recent Run Data

Download the most recent successful run's artifacts to understand actual tool usage:
The pre-agent step downloaded the last 7 days of Claude workflow logs to `/tmp/gh-aw/token-audit/claude-logs.json`. Filter this data for the target workflow:

```bash
# Find the latest successful run using the resolved workflow file
LOCK_FILE="$(basename "$WORKFLOW_FILE" .md).lock.yml"
RUN_ID=$(gh run list --repo "$GITHUB_REPOSITORY" \
--workflow "$LOCK_FILE" \
--status success --limit 1 \
--json databaseId --jq '.[0].databaseId')

if [ -z "$RUN_ID" ] || [ "$RUN_ID" = "null" ]; then
echo "No successful runs found for $LOCK_FILE — skipping artifact analysis"
else
# Download artifacts
TMPDIR=$(mktemp -d)
gh run download "$RUN_ID" --repo "$GITHUB_REPOSITORY" \
--name agent-artifacts --dir "$TMPDIR" 2>/dev/null || \
gh run download "$RUN_ID" --repo "$GITHUB_REPOSITORY" \
--name agent --dir "$TMPDIR" 2>/dev/null

# Check token usage
find "$TMPDIR" -name "token-usage.jsonl" -exec cat {} \;

# Check agent stdio log for tool calls (|| true to handle no matches)
find "$TMPDIR" -name "agent-stdio.log" -exec grep -h "^●" {} \; || true

# Check prompt size
find "$TMPDIR" -name "prompt.txt" -exec wc -c {} \;
fi
# Extract runs for the target workflow
jq --arg name "$WORKFLOW_NAME" \
  '[.runs[] | select(.workflow_name == $name)]' \
  /tmp/gh-aw/token-audit/claude-logs.json
```

From the artifacts, determine:
- **Which tools were actually called** vs which are loaded
- **How many LLM turns** were used
- **Per-turn token breakdown** (first turn is usually the most expensive)
- **Cache write vs cache read ratio** — Anthropic charges 12.5x more for cache writes than reads
- **Whether cache writes are amortized** — Are they reused across subsequent turns?
From the run data, determine:
- **Per-run token breakdown** (token_usage, estimated_cost per run)
- **Average turns** per run
- **Error/warning patterns**
- **Token usage summary** (per-model breakdown from `token_usage_summary` if available)
- **Cache write vs read ratio** — Anthropic charges 12.5x more for cache writes than reads

Also check the `tool_usage` and `mcp_tool_usage` fields in the JSON to identify which tools are actually being used vs loaded.
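To surface loaded-but-unused tools, those per-run objects can be flattened and filtered. A sketch against a fabricated log file; the `runs` and `tool_usage` field names mirror the ones mentioned above, but the exact JSON shape is an assumption:

```shell
# List tools that were actually invoked at least once across runs.
# /tmp/demo-logs.json is fabricated sample data; real runs live at
# /tmp/gh-aw/token-audit/claude-logs.json.
cat > /tmp/demo-logs.json <<'EOF'
{"runs":[
  {"workflow_name":"demo","tool_usage":{"bash":12,"github":0}},
  {"workflow_name":"demo","tool_usage":{"bash":3,"web-fetch":0}}
]}
EOF
USED=$(jq -r '[.runs[].tool_usage | to_entries[] | select(.value > 0) | .key] | unique | join(",")' /tmp/demo-logs.json)
echo "tools actually used: $USED"
```

Comparing this list against the frontmatter's `tools:` section gives the removal candidates for Step 5.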

Clean up: `rm -rf "$TMPDIR"`
Cleanup is not needed — the data is pre-downloaded to /tmp.

## Step 5: Generate Optimization Recommendations

@@ -137,18 +146,16 @@ Produce **specific, implementable recommendations** based on these patterns:
If many tools are loaded but few are used:
- List which tools to remove from `tools:` in the workflow `.md`
- Estimate token savings (each tool schema is ~500-700 tokens)
- Example: "Remove `agentic-workflows:`, `web-fetch:` saves ~15K tokens/turn"
- Example: "Remove `agentic-workflows:`, `web-fetch:` — saves ~15K tokens/turn"

### Pre-Agent Steps
If the workflow does deterministic work (API calls, file creation, data fetching) inside the agent:
- Identify which operations could move to `steps:` (pre-agent)
- Show example `steps:` configuration
- Example: "Move `gh pr list` to a pre-step, inject results via `${{ steps.X.outputs.Y }}`"
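Concretely, the deterministic fetch could look like this in the frontmatter. This is a hypothetical sketch: the step id, output name, and jq shaping are illustrative, not taken from a real workflow:

```yaml
steps:
  - name: Pre-fetch open PRs
    id: prs
    env:
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      # Compact the JSON so it fits on one output line
      echo "list=$(gh pr list --json number,title --limit 20 | jq -c .)" >> "$GITHUB_OUTPUT"
```

The prompt body can then reference `${{ steps.prs.outputs.list }}` instead of spending an LLM turn (and a tool schema) on the same API call.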

### Prompt Optimization
If the prompt is verbose or contains data the agent doesn't need:
- Suggest specific cuts or rewrites
- Example: "Replace 15-line test instructions with 3-line summary referencing pre-computed results"

### GitHub Toolset Restriction
If `github:` tools are loaded without `toolsets:` restriction:
@@ -163,8 +170,7 @@ If unused network groups are configured (e.g., `node`, `playwright`):
Anthropic charges significantly more for cache writes than reads:
- Cache write: $3.75/M tokens (Sonnet), cache read: $0.30/M tokens
- If cache writes are high but not reused across turns, the caching cost may exceed the benefit
- Check if `cache_write_tokens` from Turn 1 are reflected as `cache_read_tokens` in Turn 2+
- If prompts change substantially between turns, caching provides no benefit
- Check if prompts change substantially between turns
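The break-even arithmetic is worth making explicit. A sketch using the Sonnet prices quoted above plus an assumed ~$3.00/M uncached input price, which is not stated in this document:

```shell
# Cache write ($3.75/M) vs uncached input ($3.00/M, assumed) vs cache
# read ($0.30/M). Writing a prefix costs an extra $0.75/M up front and
# saves $2.70/M on each later read of that prefix.
BREAK_EVEN=$(awk 'BEGIN { write=3.75; input=3.00; read=0.30; printf "%.2f", (write-input)/(input-read) }')
echo "cache write breaks even after ~$BREAK_EVEN reuses of the cached prefix"
```

In other words, even a single reuse more than pays for the write; the caching cost only exceeds the benefit when prompts change so much between turns that the prefix is never read back.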

### Cache Read Optimization
If cache hit rate is low (<50%):
@@ -174,7 +180,7 @@ If cache hit rate is low (<50%):

## Step 6: Create the Optimization Issue

Create an issue with title: `YYYY-MM-DD <workflow-name>`
Create an issue with title: `YYYY-MM-DD — <workflow-name>`

Body structure:

@@ -248,11 +254,11 @@ Body structure:

## Important Guidelines

- **Be concrete** Every recommendation must include specific file changes, not just "reduce tools"
- **Estimate savings** Quantify each recommendation in tokens and percentage
- **Prioritize by impact** Order recommendations from highest to lowest token savings
- **Include implementation steps** Someone should be able to follow your recommendations without additional research
- **Reference the report** Link back to the source token usage report issue
- **One workflow per issue** Focus on the single most expensive workflow
- **Anthropic-specific insights** Leverage cache write data that Copilot workflows don't expose
- **Clean up** temporary files after analysis
- **Be concrete** — Every recommendation must include specific file changes, not just "reduce tools"
- **Estimate savings** — Quantify each recommendation in tokens and percentage
- **Prioritize by impact** — Order recommendations from highest to lowest token savings
- **Include implementation steps** — Someone should be able to follow your recommendations without additional research
- **Reference the report** — Link back to the source token usage report issue
- **One workflow per issue** — Focus on the single most expensive workflow
- **Anthropic-specific insights** — Leverage cache write data that Copilot workflows don't expose
- **Use pre-downloaded data** — All run data is at `/tmp/gh-aw/token-audit/claude-logs.json`. Do not download artifacts manually.