Skip to content

Commit 57bc6c1

Browse files
committed
feat(.agents): add review mode to CLI agents
- Rename *-tester agents to *-cli (claude-code-cli, codex-cli, gemini-cli, codebuff-local-cli) - Add mode param (test/review) to all 4 CLI agents - Add reviewFindings output schema for structured review results - Add reviewType param to codex-cli for /review questionnaire navigation - Update prompts with detailed review criteria (code organization, over-engineering, AI slop, systems thinking)
1 parent 68d9462 commit 57bc6c1

File tree

4 files changed

+630
-73
lines changed

4 files changed

+630
-73
lines changed
Lines changed: 159 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -1,13 +1,15 @@
11
import type { AgentDefinition } from './types/agent-definition'
22

33
const definition: AgentDefinition = {
4-
id: 'claude-code-tester',
5-
displayName: 'Claude Code Tester',
4+
id: 'claude-code-cli',
5+
displayName: 'Claude Code CLI',
66
model: 'anthropic/claude-opus-4.5',
77

8-
spawnerPrompt: `Expert at testing Claude Code CLI functionality using tmux.
8+
spawnerPrompt: `Expert at testing Claude Code CLI functionality using tmux, or performing code reviews via Claude Code.
99
10-
**What it does:** Spawns tmux sessions, sends input to Claude Code CLI, captures terminal output, and validates behavior.
10+
**Modes:**
11+
- \`test\` (default): Spawns tmux sessions, sends input to Claude Code CLI, captures terminal output, and validates behavior.
12+
- \`review\`: Uses Claude Code CLI to perform code reviews on specified files or directories.
1113
1214
**Paper trail:** Session logs are saved to \`debug/tmux-sessions/{session}/\`. Use \`read_files\` to view captures.
1315
@@ -20,7 +22,18 @@ const definition: AgentDefinition = {
2022
prompt: {
2123
type: 'string',
2224
description:
23-
'Description of what Claude Code functionality to test (e.g., "test that the help command displays correctly", "verify the CLI starts successfully")',
25+
'Description of what to do. For test mode: what CLI functionality to test. For review mode: what code to review and any specific concerns.',
26+
},
27+
params: {
28+
type: 'object',
29+
properties: {
30+
mode: {
31+
type: 'string',
32+
enum: ['test', 'review'],
33+
description:
34+
'Operation mode - "test" for CLI testing (default), "review" for code review via Claude Code',
35+
},
36+
},
2437
},
2538
},
2639

@@ -114,6 +127,38 @@ const definition: AgentDefinition = {
114127
description:
115128
'Paths to saved terminal captures for debugging - check debug/tmux-sessions/{session}/',
116129
},
130+
reviewFindings: {
131+
type: 'array',
132+
items: {
133+
type: 'object',
134+
properties: {
135+
file: {
136+
type: 'string',
137+
description: 'File path where the issue was found',
138+
},
139+
severity: {
140+
type: 'string',
141+
enum: ['critical', 'warning', 'suggestion', 'info'],
142+
description: 'Severity level of the finding',
143+
},
144+
line: {
145+
type: 'number',
146+
description: 'Line number (if applicable)',
147+
},
148+
finding: {
149+
type: 'string',
150+
description: 'Description of the issue or suggestion',
151+
},
152+
suggestion: {
153+
type: 'string',
154+
description: 'Suggested fix or improvement',
155+
},
156+
},
157+
required: ['file', 'severity', 'finding'],
158+
},
159+
description:
160+
'Code review findings (only populated in review mode)',
161+
},
117162
},
118163
required: [
119164
'overallStatus',
@@ -257,6 +302,14 @@ The capture path is printed to stderr. Both you and the parent agent can read th
257302

258303
instructionsPrompt: `Instructions:
259304
305+
Check the \`mode\` parameter to determine your operation:
306+
- If \`mode\` is "review" or the prompt mentions reviewing/analyzing code: follow **Review Mode** instructions
307+
- Otherwise: follow **Test Mode** instructions (default)
308+
309+
---
310+
311+
## Test Mode Instructions
312+
260313
1. **Use the helper scripts** in \`scripts/tmux/\` - they handle bracketed paste mode automatically
261314
262315
2. **Start a Claude Code test session** with permission bypass:
@@ -286,22 +339,110 @@ The capture path is printed to stderr. Both you and the parent agent can read th
286339
./scripts/tmux/tmux-cli.sh capture "$SESSION" --label "after-help-command" --wait 2
287340
\`\`\`
288341
289-
7. **Report results using set_output** - You MUST call set_output with structured results:
290-
- \`overallStatus\`: "success", "failure", or "partial"
291-
- \`summary\`: Brief description of what was tested
292-
- \`testResults\`: Array of test outcomes with testName, passed (boolean), details, capturedOutput
293-
- \`scriptIssues\`: Array of any problems with the helper scripts (IMPORTANT for the parent agent!)
294-
- \`captures\`: Array of capture paths with labels (e.g., {path: "debug/tmux-sessions/tui-test-123/capture-...", label: "after-help"})
342+
---
343+
344+
## Review Mode Instructions
345+
346+
In review mode, you send a detailed review prompt to Claude Code. The prompt MUST start with the word "review" and include specific areas of concern.
347+
348+
### What We're Looking For
349+
350+
The review should focus on these key areas:
351+
352+
1. **Code Organization Issues**
353+
- Poor file/module structure
354+
- Unclear separation of concerns
355+
- Functions/classes that do too many things
356+
- Missing or inconsistent abstractions
357+
358+
2. **Over-Engineering & Complexity**
359+
- Unnecessarily abstract or generic code
360+
- Premature optimization
361+
- Complex patterns where simple solutions would suffice
362+
- "Enterprise" patterns in small codebases
363+
364+
3. **AI-Generated Code Patterns ("AI Slop")**
365+
- Verbose, flowery language in comments ("It's important to note...", "Worth mentioning...")
366+
- Excessive disclaimers and hedging in documentation
367+
- Inconsistent coding style within the same file
368+
- Overly generic variable/function names
369+
- Redundant explanatory comments that just restate the code
370+
- Sudden shifts between formal and casual tone
371+
- Filler phrases that add no value
372+
373+
4. **Lack of Systems-Level Thinking**
374+
- Missing error handling strategy
375+
- No consideration for scaling or performance
376+
- Ignoring edge cases and failure modes
377+
- Lack of observability (logging, metrics, tracing)
378+
- Missing or incomplete type definitions
379+
380+
### Workflow
381+
382+
1. **Start Claude Code** with permission bypass:
383+
\`\`\`bash
384+
SESSION=$(./scripts/tmux/tmux-cli.sh start --command "claude --dangerously-skip-permissions")
385+
\`\`\`
386+
387+
2. **Wait for CLI to initialize**, then capture:
388+
\`\`\`bash
389+
sleep 3
390+
./scripts/tmux/tmux-cli.sh capture "$SESSION" --label "initial-state"
391+
\`\`\`
392+
393+
3. **Send a detailed review prompt** (MUST start with "review"):
394+
\`\`\`bash
395+
./scripts/tmux/tmux-cli.sh send "$SESSION" "Review [files/directories from prompt]. Look for:
396+
397+
1. CODE ORGANIZATION: Poor structure, unclear separation of concerns, functions doing too much
398+
2. OVER-ENGINEERING: Unnecessary abstractions, premature optimization, complex patterns where simple would work
399+
3. AI SLOP: Verbose comments ('it\\'s important to note'), excessive disclaimers, inconsistent style, generic names, redundant explanations
400+
4. SYSTEMS THINKING: Missing error handling strategy, no scaling consideration, ignored edge cases, lack of observability
401+
402+
For each issue found, specify the file, line number, what\\'s wrong, and how to fix it. Be direct and specific."
403+
\`\`\`
404+
405+
4. **Wait for and capture the review output** (reviews take longer):
406+
\`\`\`bash
407+
./scripts/tmux/tmux-cli.sh capture "$SESSION" --label "review-output" --wait 60
408+
\`\`\`
409+
410+
If the review is still in progress, wait and capture again:
411+
\`\`\`bash
412+
./scripts/tmux/tmux-cli.sh capture "$SESSION" --label "review-output-continued" --wait 30
413+
\`\`\`
414+
415+
5. **Parse the review output** and populate \`reviewFindings\` with:
416+
- \`file\`: Path to the file with the issue
417+
- \`severity\`: "critical", "warning", "suggestion", or "info"
418+
- \`line\`: Line number if mentioned
419+
- \`finding\`: Description of the issue
420+
- \`suggestion\`: How to fix it
421+
422+
6. **Clean up**:
423+
\`\`\`bash
424+
./scripts/tmux/tmux-cli.sh stop "$SESSION"
425+
\`\`\`
426+
427+
---
428+
429+
## Output (Both Modes)
295430
296-
8. **If a helper script doesn't work correctly**, report it in \`scriptIssues\` with:
297-
- \`script\`: Which script failed (e.g., "tmux-send.sh")
298-
- \`issue\`: What went wrong
299-
- \`errorOutput\`: The actual error message
300-
- \`suggestedFix\`: How the parent agent should fix the script
431+
**Report results using set_output** - You MUST call set_output with structured results:
432+
- \`overallStatus\`: "success", "failure", or "partial"
433+
- \`summary\`: Brief description of what was tested/reviewed
434+
- \`testResults\`: Array of test outcomes (for test mode)
435+
- \`scriptIssues\`: Array of any problems with the helper scripts
436+
- \`captures\`: Array of capture paths with labels
437+
- \`reviewFindings\`: Array of code review findings (for review mode)
301438
302-
The parent agent CAN edit the scripts - you cannot. Your job is to identify issues clearly.
439+
**If a helper script doesn't work correctly**, report it in \`scriptIssues\` with:
440+
- \`script\`: Which script failed
441+
- \`issue\`: What went wrong
442+
- \`errorOutput\`: The actual error message
443+
- \`suggestedFix\`: How the parent agent should fix the script
303444
304-
9. **Always include captures** in your output so the parent agent can see what you saw.
445+
**Always include captures** in your output so the parent agent can see what you saw.
305446
306447
For advanced options, run \`./scripts/tmux/tmux-cli.sh help\` or check individual scripts with \`--help\`.`,
307448
}

0 commit comments

Comments
 (0)