From 934b81c3259d9fc73c6fc98b3c0ff6aab2f6c666 Mon Sep 17 00:00:00 2001
From: Trevin Chow
Date: Thu, 26 Mar 2026 19:14:38 -0700
Subject: [PATCH 1/3] feat: add project-standards-reviewer as always-on ce:review persona

Adds a new always-on reviewer that audits diffs against the project's own CLAUDE.md and AGENTS.md standards -- frontmatter rules, reference inclusion, naming conventions, cross-platform portability, and tool selection policies.

Inspired by Anthropic's code-review command pattern where CLAUDE.md compliance is a first-class review lens. The orchestrator discovers standards file paths via glob and passes them to the reviewer, which reads only the sections relevant to the changed file types.

Also documents the "pass paths, not content" orchestration pattern as a learning in docs/solutions/ and a best practice in AGENTS.md.
---
 ...ths-not-content-to-subagents-2026-03-26.md | 79 ++++++++++++++++++
 plugins/compound-engineering/AGENTS.md        |  4 +
 plugins/compound-engineering/README.md        |  1 +
 .../review/project-standards-reviewer.md      | 80 +++++++++++++++++++
 .../skills/ce-review/SKILL.md                 | 18 ++++-
 .../ce-review/references/persona-catalog.md   |  7 +-
 6 files changed, 183 insertions(+), 6 deletions(-)
 create mode 100644 docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md
 create mode 100644 plugins/compound-engineering/agents/review/project-standards-reviewer.md

diff --git a/docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md b/docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md
new file mode 100644
index 000000000..3f2c1ff8f
--- /dev/null
+++ b/docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md
@@ -0,0 +1,79 @@
+---
+title: "Pass paths, not content, when dispatching sub-agents"
+problem_type: best_practice
+component: tooling
+root_cause: inadequate_documentation
+resolution_type: workflow_improvement
+severity: medium
+tags: [orchestration, subagent, token-efficiency, skill-design, multi-agent]
+date: 2026-03-26
+---
+
+## Problem
+
+When orchestrating sub-agents that need codebase reference material (config files, standards docs, etc.), passing full file contents in the sub-agent prompt bloats context and makes the orchestrator do expensive upfront work that may go unused.
+
+## Symptoms
+
+- Orchestrator skill reads multiple files, concatenates their contents into a block (e.g., `` with full CLAUDE.md/AGENTS.md content), and injects it into the sub-agent prompt
+- Sub-agent receives all content regardless of how much is relevant to its specific task
+- In repos with directory-scoped config files, the orchestrator must discover and read every file before invoking a single sub-agent
+- Sub-agent prompts grow linearly with the number of reference files, even when the agent needs only specific sections
+
+## What Didn't Work
+
+Having the orchestrator read all relevant file contents and pass them in a content block. This was the initial approach for the `project-standards-reviewer` agent in ce:review: Stage 3b collected all CLAUDE.md/AGENTS.md content into a `` block passed in the sub-agent prompt.
+
+Problems:
+- Orchestrator did expensive read work that may be partially wasted
+- Sub-agent prompt inflated with content it may not fully use
+- Scales poorly as the number of directory-scoped config files grows
+- Sub-agent loses agency to decide what's relevant
+
+## Solution
+
+Separate discovery (cheap) from reading (expensive). The orchestrator discovers file paths via glob or search, passes a path list, and the sub-agent reads only the files and sections it needs.
+
+**Pattern from Anthropic's code-review command:**
+
+> "Use another Haiku agent to give you a list of file paths to (but not the contents of) any relevant CLAUDE.md files from the codebase: the root CLAUDE.md file (if one exists), as well as any CLAUDE.md files in the directories whose files the pull request modified"
+
+The reviewing agents then receive those paths and read the files themselves.
+
+**How we applied it in ce:review:**
+
+1. Stage 3b: orchestrator globs for CLAUDE.md/AGENTS.md paths in changed directories, emits a `` block
+2. Sub-agent prompt: `project-standards-reviewer` reads the listed files itself, targeting sections relevant to the changed file types
+3. Standalone fallback: if no `` block is present, the agent discovers paths independently
+
+**General template:**
+
+```
+Orchestrator:
+1. Discover paths (glob/search) -> emit block
+2. Pass path list to sub-agent
+
+Sub-agent:
+1. If present, read listed files
+2. If absent, discover paths independently (standalone fallback)
+3. Read only sections relevant to the specific task
+```
+
+## Why This Works
+
+Discovery is cheap; reading and processing file contents is expensive. The sub-agent is closer to the task (it knows what it's reviewing) and is better positioned to decide which sections of which files are relevant. This is lazy evaluation applied to agent orchestration: don't pay the cost of reading until you know you need the content.
+
+## Prevention
+
+When designing orchestrator skills that invoke sub-agents needing repo reference material:
+
+1. **Default to path-passing.** Orchestrator discovers paths, sub-agent reads content.
+2. **Include a standalone fallback.** If the paths block is absent, the sub-agent discovers paths on its own. This enables both orchestrated and standalone invocation.
+3. **Content-passing is acceptable when:** the reference material is small, static, and guaranteed to be fully consumed by every invocation (e.g., a JSON schema under 50 lines that the sub-agent always needs in full).
+4. **Signal to refactor:** if you catch an orchestrator reading file contents before invoking sub-agents, treat it as a candidate for the path-passing pattern.
+
+## Related
+
+- `docs/solutions/skill-design/compound-refresh-skill-improvements.md` — establishes "no shell commands for file operations in subagents"; complementary pattern about letting sub-agents use appropriate tools rather than orchestrating reads on their behalf
+- `docs/solutions/skill-design/script-first-skill-architecture.md` — complementary pattern: scripts pre-process large datasets so orchestrators don't load raw data
+- `docs/solutions/agent-friendly-cli-principles.md` — Principle #7 (Bounded, High-Signal Responses) reinforces that agents pay real cost for extra output; paths are bounded, content is not
diff --git a/plugins/compound-engineering/AGENTS.md b/plugins/compound-engineering/AGENTS.md
index fe6d804b3..f8641ecb4 100644
--- a/plugins/compound-engineering/AGENTS.md
+++ b/plugins/compound-engineering/AGENTS.md
@@ -127,6 +127,10 @@ Why: shell-heavy exploration causes avoidable permission prompts in sub-agent wo
 - [ ] Do not encode shell recipes for routine exploration when native tools can do the job; encode intent and preferred tool classes instead
 - [ ] For shell-only workflows (e.g., `gh`, `git`, `bundle show`, project CLIs), explicit command examples are acceptable when they are simple, task-scoped, and not chained together
 
+### Passing Reference Material to Sub-Agents
+
+When a skill orchestrates sub-agents that need codebase reference material, prefer passing file paths over file contents. The sub-agent reads only what it needs. Content-passing is fine for small, static material consumed in full (e.g., a JSON schema under ~50 lines).
+
 ### Quick Validation Command
 
 ```bash
diff --git a/plugins/compound-engineering/README.md b/plugins/compound-engineering/README.md
index e466e486e..0ea79cb6d 100644
--- a/plugins/compound-engineering/README.md
+++ b/plugins/compound-engineering/README.md
@@ -126,6 +126,7 @@ Agents are specialized subagents invoked by skills — you typically don't call
 | `security-reviewer` | Exploitable vulnerabilities with confidence calibration |
 | `security-sentinel` | Security audits and vulnerability assessments |
 | `testing-reviewer` | Test coverage gaps, weak assertions |
+| `project-standards-reviewer` | CLAUDE.md and AGENTS.md compliance |
 
 ### Document Review
 
diff --git a/plugins/compound-engineering/agents/review/project-standards-reviewer.md b/plugins/compound-engineering/agents/review/project-standards-reviewer.md
new file mode 100644
index 000000000..6900dc46d
--- /dev/null
+++ b/plugins/compound-engineering/agents/review/project-standards-reviewer.md
@@ -0,0 +1,80 @@
+---
+name: project-standards-reviewer
+description: Always-on code-review persona. Audits changes against the project's own CLAUDE.md and AGENTS.md standards -- frontmatter rules, reference inclusion, naming conventions, cross-platform portability, and tool selection policies.
+model: inherit
+tools: Read, Grep, Glob, Bash
+color: blue
+
+---
+
+# Project Standards Reviewer
+
+You audit code changes against the project's own standards files -- CLAUDE.md, AGENTS.md, and any directory-scoped equivalents. Your job is to catch violations of rules the project has explicitly written down, not to invent new rules or apply generic best practices. Every finding you report must cite a specific rule from a specific standards file.
+
+## Standards discovery
+
+The orchestrator passes a `` block listing the file paths of all relevant CLAUDE.md and AGENTS.md files. These include root-level files plus any found in ancestor directories of changed files (a standards file in a parent directory governs everything below it). Read those files to obtain the review criteria.
+
+If no `` block is present (standalone usage), discover the paths yourself:
+
+1. Use the native file-search/glob tool to find all `CLAUDE.md` and `AGENTS.md` files in the repository.
+2. For each changed file, check its ancestor directories up to the repo root for standards files. A file like `plugins/compound-engineering/AGENTS.md` applies to all changes under `plugins/compound-engineering/`.
+3. Read each relevant standards file found.
+
+In either case, identify which sections apply to the file types in the diff. A skill compliance checklist does not apply to a TypeScript converter change. A commit convention section does not apply to a markdown content change. Match rules to the files they govern.
+
+## What you're hunting for
+
+- **YAML frontmatter violations** -- missing required fields (`name`, `description`), description values that don't follow the stated format ("what it does and when to use it"), names that don't match directory names. The standards files define what frontmatter must contain; check each changed skill or agent file against those requirements.
+
+- **Reference file inclusion mistakes** -- markdown links (`[file](./references/file.md)`) used for reference files where the standards require backtick paths or `@` inline inclusion. Backtick paths used for files the standards say should be `@`-inlined (small structural files under ~150 lines). `@` includes used for files the standards say should be backtick paths (large files, executable scripts). The standards file specifies which mode to use and why; cite the relevant rule.
+
+- **Broken cross-references** -- agent names that are not fully qualified (e.g., `learnings-researcher` instead of `compound-engineering:research:learnings-researcher`). Skill-to-skill references using slash syntax inside a SKILL.md where the standards say to use semantic wording. References to tools by platform-specific names without naming the capability class.
+
+- **Cross-platform portability violations** -- platform-specific tool names used without equivalents (e.g., `TodoWrite` instead of `TaskCreate`/`TaskUpdate`/`TaskList`). Slash references in pass-through SKILL.md files that won't be remapped. Assumptions about tool availability that break on other platforms.
+
+- **Tool selection violations in agent and skill content** -- shell commands (`find`, `ls`, `cat`, `head`, `tail`, `grep`, `rg`, `wc`, `tree`) instructed for routine file discovery, content search, or file reading where the standards require native tool usage. Chained shell commands (`&&`, `||`, `;`) or error suppression (`2>/dev/null`, `|| true`) where the standards say to use one simple command at a time.
+
+- **Naming and structure violations** -- files placed in the wrong directory category, component naming that doesn't match the stated convention, missing additions to README tables or counts when components are added or removed.
+
+- **Writing style violations** -- second person ("you should") where the standards require imperative/objective form. Hedge words in instructions (`might`, `could`, `consider`) that leave agent behavior undefined when the standards call for clear directives.
+
+- **Protected artifact violations** -- findings, suggestions, or instructions that recommend deleting or gitignoring files in paths the standards designate as protected (e.g., `docs/brainstorms/`, `docs/plans/`, `docs/solutions/`).
+
+## Confidence calibration
+
+Your confidence should be **high (0.80+)** when you can quote the specific rule from the standards file and point to the specific line in the diff that violates it. Both the rule and the violation are unambiguous.
+
+Your confidence should be **moderate (0.60-0.79)** when the rule exists in the standards file but applying it to this specific case requires judgment -- e.g., whether a skill description adequately "describes what it does and when to use it," or whether a file is small enough to qualify for `@` inclusion.
+
+Your confidence should be **low (below 0.60)** when the standards file is ambiguous about whether this constitutes a violation, or the rule might not apply to this file type. Suppress these.
+
+## What you don't flag
+
+- **Rules that don't apply to the changed file type.** Skill compliance checklist items are irrelevant when the diff is only TypeScript or test files. Commit conventions don't apply to markdown content changes. Match rules to what they govern.
+- **Violations that automated checks already catch.** If `bun test` validates YAML strict parsing, or a linter enforces formatting, skip it. Focus on semantic compliance that tools miss.
+- **Pre-existing violations in unchanged code.** If an existing SKILL.md already uses markdown links for references but the diff didn't touch those lines, mark it `pre_existing`. Only flag it as primary if the diff introduces or modifies the violation.
+- **Generic best practices not in any standards file.** You review against the project's written rules, not industry conventions. If the standards files don't mention it, you don't flag it.
+- **Opinions on the quality of the standards themselves.** The standards files are your criteria, not your review target. Do not suggest improvements to CLAUDE.md or AGENTS.md content.
+
+## Evidence requirements
+
+Every finding must include:
+
+1. The **exact quote or section reference** from the standards file that defines the rule being violated (e.g., "AGENTS.md, Skill Compliance Checklist: 'Do NOT use markdown links like `[filename.md](./references/filename.md)`'").
+2. The **specific line(s) in the diff** that violate the rule.
+
+A finding without both a cited rule and a cited violation is not a finding. Drop it.
+
+## Output format
+
+Return your findings as JSON matching the findings schema. No prose outside the JSON.
+
+```json
+{
+  "reviewer": "project-standards",
+  "findings": [],
+  "residual_risks": [],
+  "testing_gaps": []
+}
+```
diff --git a/plugins/compound-engineering/skills/ce-review/SKILL.md b/plugins/compound-engineering/skills/ce-review/SKILL.md
index a82f4f7ae..cbe71564b 100644
--- a/plugins/compound-engineering/skills/ce-review/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-review/SKILL.md
@@ -73,7 +73,7 @@ Routing rules:
 
 ## Reviewers
 
-13 reviewer personas in layered conditionals, plus CE-specific agents. See the persona catalog included below for the full catalog.
+14 reviewer personas in layered conditionals, plus CE-specific agents. See the persona catalog included below for the full catalog.
 
 **Always-on (every review):**
 
@@ -82,6 +82,7 @@ Routing rules:
 | `compound-engineering:review:correctness-reviewer` | Logic errors, edge cases, state bugs, error propagation |
 | `compound-engineering:review:testing-reviewer` | Coverage gaps, weak assertions, brittle tests |
 | `compound-engineering:review:maintainability-reviewer` | Coupling, complexity, naming, dead code, abstraction debt |
+| `compound-engineering:review:project-standards-reviewer` | CLAUDE.md and AGENTS.md compliance -- frontmatter, references, naming, portability |
 | `compound-engineering:review:agent-native-reviewer` | Verify new features are agent-accessible |
 | `compound-engineering:research:learnings-researcher` | Search docs/solutions/ for past issues related to this PR |
 
@@ -114,7 +115,7 @@ Routing rules:
 
 ## Review Scope
 
-Every review spawns all 3 always-on personas plus the 2 CE always-on agents, then adds whichever cross-cutting and stack-specific conditionals fit the diff. The model naturally right-sizes: a small config change triggers 0 conditionals = 5 reviewers. A Rails auth feature might trigger security + reliability + kieran-rails + dhh-rails = 9 reviewers.
+Every review spawns all 4 always-on personas plus the 2 CE always-on agents, then adds whichever cross-cutting and stack-specific conditionals fit the diff. The model naturally right-sizes: a small config change triggers 0 conditionals = 6 reviewers. A Rails auth feature might trigger security + reliability + kieran-rails + dhh-rails = 10 reviewers.
 
 ## Protected Artifacts
 
@@ -324,7 +325,7 @@ Pass this to every reviewer in their spawn prompt. Intent shapes *how hard each
 
 ### Stage 3: Select reviewers
 
-Read the diff and file list from Stage 1. The 3 always-on personas and 2 CE always-on agents are automatic. For each cross-cutting and stack-specific conditional persona in the persona catalog included below, decide whether the diff warrants it. This is agent judgment, not keyword matching.
+Read the diff and file list from Stage 1. The 4 always-on personas and 2 CE always-on agents are automatic. For each cross-cutting and stack-specific conditional persona in the persona catalog included below, decide whether the diff warrants it. This is agent judgment, not keyword matching.
 
 Stack-specific personas are additive. A Rails UI change may warrant `kieran-rails` plus `julik-frontend-races`; a TypeScript API diff may warrant `kieran-typescript` plus `api-contract` and `reliability`.
 
@@ -337,6 +338,7 @@ Review team:
 - correctness (always)
 - testing (always)
 - maintainability (always)
+- project-standards (always)
 - agent-native-reviewer (always)
 - learnings-researcher (always)
 - security -- new endpoint in routes.rb accepts user-provided redirect URL
@@ -348,6 +350,15 @@ Review team:
 
 This is progress reporting, not a blocking confirmation.
 
+### Stage 3b: Discover project standards paths
+
+Before spawning sub-agents, find the file paths (not contents) of all relevant standards files for the `project-standards` persona. Use the native file-search/glob tool to locate:
+
+1. Use the native file-search tool (e.g., Glob in Claude Code) to find all `**/CLAUDE.md` and `**/AGENTS.md` in the repo.
+2. Filter to those whose directory is an ancestor of at least one changed file. A standards file governs all files below it (e.g., `plugins/compound-engineering/AGENTS.md` applies to everything under `plugins/compound-engineering/`).
+
+Pass the resulting path list to the `project-standards` persona inside a `` block in its review context (see Stage 4). The persona reads the files itself, targeting only the sections relevant to the changed file types. This keeps the orchestrator's work cheap (path discovery only) and avoids bloating the subagent prompt with content the reviewer may not fully need.
+
 ### Stage 4: Spawn sub-agents
 
 Spawn each selected persona reviewer as a parallel sub-agent using the subagent template included below. Each persona sub-agent receives:
@@ -356,6 +367,7 @@ Spawn each selected persona reviewer as a parallel sub-agent using the subagent
 2. Shared diff-scope rules from the diff-scope reference included below
 3. The JSON output contract from the findings schema included below
 4. Review context: intent summary, file list, diff
+5. **For `project-standards` only:** the standards file path list from Stage 3b, wrapped in a `` block appended to the review context
 
 Persona sub-agents are **read-only**: they review and return structured JSON. They do not edit files or propose refactors.
 
diff --git a/plugins/compound-engineering/skills/ce-review/references/persona-catalog.md b/plugins/compound-engineering/skills/ce-review/references/persona-catalog.md
index be6dfdc27..9012f32e1 100644
--- a/plugins/compound-engineering/skills/ce-review/references/persona-catalog.md
+++ b/plugins/compound-engineering/skills/ce-review/references/persona-catalog.md
@@ -1,8 +1,8 @@
 # Persona Catalog
 
-13 reviewer personas organized into always-on, cross-cutting conditional, and stack-specific conditional layers, plus CE-specific agents. The orchestrator uses this catalog to select which reviewers to spawn for each review.
+14 reviewer personas organized into always-on, cross-cutting conditional, and stack-specific conditional layers, plus CE-specific agents. The orchestrator uses this catalog to select which reviewers to spawn for each review.
 
-## Always-on (3 personas + 2 CE agents)
+## Always-on (4 personas + 2 CE agents)
 
 Spawned on every review regardless of diff content.
 
@@ -13,6 +13,7 @@ Spawned on every review regardless of diff content.
 | `correctness` | `compound-engineering:review:correctness-reviewer` | Logic errors, edge cases, state bugs, error propagation, intent compliance |
 | `testing` | `compound-engineering:review:testing-reviewer` | Coverage gaps, weak assertions, brittle tests, missing edge case tests |
 | `maintainability` | `compound-engineering:review:maintainability-reviewer` | Coupling, complexity, naming, dead code, premature abstraction |
+| `project-standards` | `compound-engineering:review:project-standards-reviewer` | CLAUDE.md and AGENTS.md compliance -- frontmatter, references, naming, cross-platform portability, tool selection |
 
 **CE agents (unstructured output, synthesized separately):**
 
@@ -56,7 +57,7 @@ These CE-native agents provide specialized analysis beyond what the persona agen
 
 ## Selection rules
 
-1. **Always spawn all 3 always-on personas** plus the 2 CE always-on agents.
+1. **Always spawn all 4 always-on personas** plus the 2 CE always-on agents.
 2. **For each cross-cutting conditional persona**, the orchestrator reads the diff and decides whether the persona's domain is relevant. This is a judgment call, not a keyword match.
 3. **For each stack-specific conditional persona**, use file types and changed patterns as a starting point, then decide whether the diff actually introduces meaningful work for that reviewer. Do not spawn language-specific reviewers just because one config or generated file happens to match the extension.
 4. **For CE conditional agents**, spawn when the diff includes migration files (`db/migrate/*.rb`, `db/schema.rb`) or data backfill scripts.

From ba30663a8a2db2fc2a07d8503f6cd131127e6568 Mon Sep 17 00:00:00 2001
From: Trevin Chow
Date: Thu, 26 Mar 2026 19:17:19 -0700
Subject: [PATCH 2/3] fix: add instruction phrasing efficiency finding to learning doc

Empirical testing showed "find all X, then filter" produces 2 tool calls vs "for each item, walk and check" producing 14 in Claude Code. The right fix is writing the correct instruction in the skill itself, not adding meta-rules to AGENTS.md about how to phrase instructions.
---
 ...ass-paths-not-content-to-subagents-2026-03-26.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md b/docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md
index 3f2c1ff8f..6bb137141 100644
--- a/docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md
+++ b/docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md
@@ -72,6 +72,19 @@ When designing orchestrator skills that invoke sub-agents needing repo reference
 3. **Content-passing is acceptable when:** the reference material is small, static, and guaranteed to be fully consumed by every invocation (e.g., a JSON schema under 50 lines that the sub-agent always needs in full).
 4. **Signal to refactor:** if you catch an orchestrator reading file contents before invoking sub-agents, treat it as a candidate for the path-passing pattern.
 
+## Instruction phrasing matters more than meta-rules
+
+Empirical testing showed that how the skill phrases a search instruction has a dramatic effect on tool call count. For the same task (find ancestor CLAUDE.md/AGENTS.md files for changed paths):
+
+| Instruction phrasing | Claude Code tool calls | Codex shell commands |
+|---|---|---|
+| "for each changed file, walk its ancestor directories and check for X at each level" | 14 | 2 |
+| "find all X in the repo, then filter to ancestors of changed files" | 2 | 2 |
+
+The "per-item walk" phrasing caused Claude Code to glob each directory level individually. The "bulk find, then filter" phrasing produced two globs total. Codex was resilient to both phrasings (it wrote a Python script to batch the work either way).
+
+The takeaway: the most effective way to enforce efficient agent behavior is to write the correct instruction directly in the skill. Meta-rules in AGENTS.md about "how to phrase instructions efficiently" are too abstract to apply consistently — the person writing the skill won't remember them at the right moment. Instead, get the phrasing right in each skill where it matters, and document the pattern here for when someone asks why.
+
 ## Related
 
 - `docs/solutions/skill-design/compound-refresh-skill-improvements.md` — establishes "no shell commands for file operations in subagents"; complementary pattern about letting sub-agents use appropriate tools rather than orchestrating reads on their behalf

From 5fec81b789a65cf360cfd9cb1530497c560888f8 Mon Sep 17 00:00:00 2001
From: Trevin Chow
Date: Thu, 26 Mar 2026 19:18:06 -0700
Subject: [PATCH 3/3] fix: add CLI benchmarking technique for instruction efficiency

When in doubt about instruction phrasing efficiency, test with claude -p and codex exec to compare tool call counts across platforms before committing to a phrasing in high-frequency skills.
---
 ...pass-paths-not-content-to-subagents-2026-03-26.md | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md b/docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md
index 6bb137141..99a973ff2 100644
--- a/docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md
+++ b/docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md
@@ -83,7 +83,17 @@ Empirical testing showed that how the skill phrases a search instruction has a d
 
 The "per-item walk" phrasing caused Claude Code to glob each directory level individually. The "bulk find, then filter" phrasing produced two globs total. Codex was resilient to both phrasings (it wrote a Python script to batch the work either way).
 
-The takeaway: the most effective way to enforce efficient agent behavior is to write the correct instruction directly in the skill. Meta-rules in AGENTS.md about "how to phrase instructions efficiently" are too abstract to apply consistently — the person writing the skill won't remember them at the right moment. Instead, get the phrasing right in each skill where it matters, and document the pattern here for when someone asks why.
+When in doubt about whether an instruction phrasing is efficient, test it empirically before committing. Both `claude -p` and `codex exec` support JSON output that reveals tool call counts:
+
+```bash
+# Claude Code: stream-json + verbose shows each tool call
+claude -p "instruction here" --output-format stream-json --verbose 2>/dev/null > out.jsonl
+
+# Codex: --json shows command_execution events
+codex exec --json --full-auto "instruction here" > out.jsonl
+```
+
+This is worth doing for orchestration-heavy skills where instructions drive search or file discovery — a small phrasing change can produce a large difference in tool calls, latency, and token cost. Not every instruction needs benchmarking, but when the skill will run on every review or every plan, the cost compounds.
 
 ## Related