Commit a5eff49

ci: use claude /code-review skill to automate PR reviews
Wire up the claude-review workflow to invoke our own /code-review slash command via claude -p, posting the findings as a PR comment.
1 parent 37ae765 commit a5eff49

8 files changed

Lines changed: 1753 additions & 3 deletions

.claude/commands/code-review.md

Lines changed: 357 additions & 0 deletions
@@ -0,0 +1,357 @@
1+
---
2+
description: "Multi-agent code review orchestrator with Trail of Bits security analysis"
3+
argument-hint: "<owner/repo#PR_NUMBER> or <PR_NUMBER> [--focus=security|performance|architecture|all] [--security-depth=standard|deep|full]"
4+
allowed-tools:
5+
- Task
6+
- Bash
7+
- Read
8+
- Write
9+
- Edit
10+
- MultiEdit
11+
- Grep
12+
- Glob
13+
- WebFetch
14+
- WebSearch
15+
- TodoWrite
16+
---
17+
18+
# Multi-Agent Code Review Orchestrator
19+
20+
You are a review orchestrator. Your job is to dispatch specialized agents in
21+
parallel, collect their findings, and compile a unified review report.
22+
23+
**Target**: $ARGUMENTS
24+
25+
## Phase 1: Context Gathering
26+
27+
Parse arguments and collect PR context before dispatching agents.
28+
29+
1. **Parse arguments** to extract:
30+
- Repository owner and name (if provided as `owner/repo#PR`)
31+
- PR number
32+
- `--focus` area: `security`, `performance`, `architecture`, or `all` (default: `all`)
33+
- `--security-depth`: `standard`, `deep`, or `full` (default: `deep`)
34+
35+
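2. **Fetch PR metadata** using `gh pr view` (when the target was given as `owner/repo#PR`, add `--repo {owner}/{repo}` to this and the following `gh` commands):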
36+
```bash
37+
gh pr view {pr_number} --json number,title,author,baseRefName,headRefName,files,additions,deletions,body
38+
```
39+
40+
3. **Get the diff** for agent consumption:
41+
```bash
42+
gh pr diff {pr_number}
43+
```
44+
45+
4. **Get changed file list**:
46+
```bash
47+
gh pr view {pr_number} --json files --jq '.files[].path'
48+
```
49+
50+
5. **Classify changed files** by scanning filenames and diff content for:
51+
- **Crypto/Auth**: files touching keys, signatures, hashing, authentication
52+
- **Consensus/Protocol**: files referencing BIPs, BOLTs, validation rules, chain logic
53+
- **API/Config**: files defining public interfaces, configuration schemas, RPC endpoints
54+
- **Value Transfer**: files handling amounts, fees, balances, UTXOs, HTLCs, channels
55+
56+
Store these classifications -- they determine which conditional agents to launch.
57+
58+
6. **Create output directory**:
59+
```bash
60+
mkdir -p .reviews
61+
```
62+
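The argument parsing (step 1) and file classification (step 5) above could be sketched in shell roughly as follows. This is a minimal illustration, not part of the command itself: the function names `parse_review_args` and `classify_path`, the category labels, and the keyword patterns are all assumptions. The classifier looks at filenames only and the first matching category wins, whereas the step above also scans diff content and may assign several categories to one file.

```bash
#!/usr/bin/env bash
# Sketch only: function names, variables, and keyword patterns are assumptions.

# Parse "<owner/repo#PR>" or "<PR>" plus the optional flags.
# Sets REPO (empty when only a PR number is given), PR_NUMBER, FOCUS, SECURITY_DEPTH.
parse_review_args() {
  REPO=""
  PR_NUMBER=""
  FOCUS="all"            # documented default
  SECURITY_DEPTH="deep"  # documented default
  local arg
  for arg in "$@"; do
    case "$arg" in
      --focus=*)          FOCUS="${arg#--focus=}" ;;
      --security-depth=*) SECURITY_DEPTH="${arg#--security-depth=}" ;;
      */*"#"*)            REPO="${arg%%#*}"; PR_NUMBER="${arg##*#}" ;;
      *)                  PR_NUMBER="$arg" ;;
    esac
  done
  case "$PR_NUMBER" in
    ""|*[!0-9]*) echo "invalid PR number: '$PR_NUMBER'" >&2; return 1 ;;
  esac
  case "$FOCUS" in
    security|performance|architecture|all) ;;
    *) echo "invalid --focus: '$FOCUS'" >&2; return 1 ;;
  esac
  case "$SECURITY_DEPTH" in
    standard|deep|full) ;;
    *) echo "invalid --security-depth: '$SECURITY_DEPTH'" >&2; return 1 ;;
  esac
}

# Print one (hypothetical) category label for a path, using filename keywords only.
# The first matching pattern wins, so the order below encodes a priority.
classify_path() {
  local p
  p="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$p" in
    *htlc*|*utxo*|*fee*|*balance*|*amount*|*channel*) echo "value-transfer" ;;
    *sign*|*key*|*hash*|*auth*|*crypto*)              echo "crypto-auth" ;;
    *consensus*|*validat*|*chain*|*bip*|*bolt*)       echo "consensus-protocol" ;;
    *rpc*|*api*|*config*|*.proto)                     echo "api-config" ;;
    *)                                                echo "other" ;;
  esac
}

# Example run with made-up inputs.
parse_review_args "example-org/example-repo#42" --focus=security
echo "repo=$REPO pr=$PR_NUMBER focus=$FOCUS depth=$SECURITY_DEPTH"
for path in wallet/channel.go keychain/derivation.go rpcserver.go README.md; do
  printf '%s\t%s\n' "$path" "$(classify_path "$path")"
done
```

Substring matching of this kind is deliberately crude (a path such as `docs/design.md` would match `*sign*`), so a real classifier would likely need tighter patterns plus the diff-content scan described above.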
63+
## Phase 2: Parallel Agent Dispatch
64+
65+
Launch agents using the **Task tool**. All agents in a tier must be launched
66+
in a **single message** with multiple Task tool calls so they run in parallel.
67+
68+
### Tier 1: Always-Run Agents
69+
70+
Agent 1 (code-reviewer) ALWAYS runs. Agents 2 and 3 run only when
71+
`--security-depth` is `deep` or `full`; with `--security-depth=standard`, launch only Agent 1.
72+
73+
Launch all applicable Tier 1 agents in a **single message** (parallel dispatch):
74+
75+
**Agent 1: Code Quality Review** (`subagent_type: code-reviewer`)
76+
```
77+
Prompt: Review PR #{pr_number} in {owner}/{repo}.
78+
79+
Focus: {focus_area}
80+
81+
PR Title: {title}
82+
PR Description: {body}
83+
Base Branch: {base_branch}
84+
Changed Files: {file_list}
85+
86+
Perform your full review methodology (Phases 0 through 7) focusing on code
87+
quality, correctness, Go patterns, test quality, breaking changes, and
88+
maintainability. Do NOT spawn sub-agents for security analysis -- the
89+
orchestrator handles that separately.
90+
91+
Write your findings to the review file at:
92+
.reviews/{owner}_{repo}_PR_{pr_number}_review.md
93+
94+
Classify each finding with severity: Critical (C), High (H), Medium (M),
95+
Low (L), or Informational (I). Use format: {severity}-{number}
96+
(e.g., H-1, M-2, I-3).
97+
```
98+
99+
**Agent 2: Offensive Security Audit** (`subagent_type: security-auditor`)
100+
```
101+
Prompt: Perform a security audit of PR #{pr_number} in {owner}/{repo}.
102+
103+
PR diff:
104+
{diff_content}
105+
106+
Changed files: {file_list}
107+
108+
Focus on:
109+
- DoS vectors and resource exhaustion
110+
- Fund loss and value transfer bugs
111+
- Race conditions and concurrency issues
112+
- Panic conditions reachable from external input
113+
- Consensus implications (chain split, re-org safety)
114+
- P2P attack vectors (eclipse, Sybil, amplification)
115+
- Cryptographic misuse
116+
117+
Develop proof-of-concept exploits for any vulnerabilities found.
118+
Classify each finding: Critical (C), High (H), Medium (M), Low (L),
119+
Informational (I). Use format: {severity}-{number}.
120+
```
121+
122+
**Agent 3: Differential Security Review** (`subagent_type: general-purpose`)
123+
```
124+
Prompt: You are performing a Trail of Bits-style differential security
125+
review. Follow the methodology from the `differential-review` skill.
126+
127+
PR #{pr_number} in {owner}/{repo}.
128+
Base branch: {base_branch}
129+
130+
Changed files: {file_list}
131+
132+
Execute the differential-review workflow:
133+
1. Intake & Triage: Risk-classify each changed file
134+
2. Changed Code Analysis: Use git blame on removed/modified lines to
135+
understand history and detect regressions
136+
3. Test Coverage Analysis: Identify test gaps for modified code paths
137+
4. Blast Radius Analysis: Count transitive callers of changed functions
138+
to quantify impact
139+
5. Deep Context Analysis: Apply Five Whys to understand root cause of
140+
changes
141+
6. Adversarial Analysis: Model attacker scenarios for HIGH risk changes
142+
7. Report: Generate findings with severity classifications
143+
144+
Classify each finding: Critical (C), High (H), Medium (M), Low (L),
145+
Informational (I). Use format: {severity}-{number}.
146+
```
147+
148+
### Tier 2: Conditional Agents
149+
150+
Launch these based on Phase 1 file classifications. When `--security-depth=full`,
151+
launch ALL Tier 2 agents unconditionally. When `--security-depth=standard`, skip
152+
Tier 2 entirely. Otherwise, launch only agents whose trigger conditions are met.
153+
154+
Launch all applicable Tier 2 agents in a **single message** (parallel dispatch).
155+
156+
**Agent 4: Deep Function Analysis** (`subagent_type: audit-context-building:function-analyzer`)
157+
- **Trigger**: Changed files classified as Crypto/Auth, Consensus/Protocol,
158+
or Value Transfer; OR `--focus=security`; OR `--security-depth=full`
159+
```
160+
Prompt: Perform ultra-granular analysis of the critical functions modified
161+
in PR #{pr_number} in {owner}/{repo}.
162+
163+
Focus on these files (the highest-risk changed files):
164+
{critical_file_list}
165+
166+
Follow the audit-context-building methodology:
167+
- Line-by-line semantic analysis of each modified function
168+
- Apply First Principles, 5 Whys, and 5 Hows at micro scale
169+
- Map invariants, assumptions, and trust boundaries
170+
- Track cross-function data flows with full context propagation
171+
- Zero speculation: every claim must cite exact line numbers
172+
173+
For each function produce: Purpose, Inputs/Assumptions, Outputs/Effects,
174+
Block-by-Block Analysis, Cross-Function Dependencies, Risk Considerations.
175+
176+
Classify each finding: Critical (C), High (H), Medium (M), Low (L),
177+
Informational (I). Use format: {severity}-{number}.
178+
```
179+
180+
**Agent 5: Spec Compliance Check** (`subagent_type: spec-to-code-compliance:spec-compliance-checker`)
181+
- **Trigger**: Changed files classified as Consensus/Protocol (references
182+
BIPs, BOLTs, or protocol-level logic); OR `--focus=architecture`;
183+
OR `--security-depth=full`
184+
```
185+
Prompt: Verify specification-to-code compliance for PR #{pr_number}
186+
in {owner}/{repo}.
187+
188+
Changed files touching protocol code: {protocol_file_list}
189+
190+
Follow the spec-to-code-compliance methodology:
191+
1. Discover spec sources (BIPs, BOLTs, design docs in the repo)
192+
2. Extract spec intent into structured format
193+
3. Analyze code behavior line-by-line
194+
4. Map spec items to code with match types:
195+
full_match, partial_match, mismatch, missing_in_code,
196+
code_stronger_than_spec, code_weaker_than_spec
197+
5. Classify divergences by severity
198+
199+
Anti-hallucination: if spec is silent, classify as UNDOCUMENTED.
200+
If code adds behavior, classify as UNDOCUMENTED CODE PATH.
201+
```
202+
203+
**Agent 6: API Safety & Insecure Defaults** (`subagent_type: general-purpose`)
204+
- **Trigger**: Changed files classified as API/Config (introduces or modifies
205+
public interfaces, config schemas, RPC endpoints); OR `--security-depth=full`
206+
```
207+
Prompt: Analyze the API surfaces and configuration defaults in
208+
PR #{pr_number} in {owner}/{repo}.
209+
210+
Changed API/config files: {api_file_list}
211+
212+
Perform two analyses:
213+
214+
1. SHARP EDGES (from the sharp-edges skill):
215+
Model three adversaries against the changed APIs:
216+
- Scoundrel: Malicious developer trying to exploit the API
217+
- Lazy Developer: Copy-pasting examples without reading docs
218+
- Confused Developer: Swapping parameters or misunderstanding semantics
219+
Check for: Algorithm Selection issues, Dangerous Defaults, Primitive vs
220+
Semantic API confusion, Configuration Cliffs, Silent Failures, and
221+
Stringly-Typed Security patterns.
222+
223+
2. INSECURE DEFAULTS (from the insecure-defaults skill):
224+
Scan for: Hardcoded fallback secrets, default credentials, weak crypto
225+
defaults, permissive access control (CORS *, public by default), debug
226+
features left enabled, fail-open vs fail-secure behavior.
227+
228+
Classify each finding: Critical (C), High (H), Medium (M), Low (L),
229+
Informational (I). Use format: {severity}-{number}.
230+
```
231+
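The Tier 1 and Tier 2 launch rules can be read as a small decision function. The sketch below is one interpretation, under stated assumptions: `select_agents`, `_has_cat`, the category labels (`crypto-auth`, `consensus-protocol`, `value-transfer`, `api-config`), and the short agent names it prints are all hypothetical. It also assumes that `--security-depth=standard` suppresses Tier 2 entirely, as the fast path in the notes at the end of this file suggests.

```bash
#!/usr/bin/env bash
# Sketch only: names and category labels are assumptions, not part of the command.

# Succeeds when the space-separated list $1 contains the label $2.
_has_cat() {
  case " $1 " in
    *" $2 "*) return 0 ;;
  esac
  return 1
}

# Usage: select_agents DEPTH FOCUS [CATEGORY...]
# Prints one agent name per line, Tier 1 first.
select_agents() {
  local depth="$1" focus="$2"
  shift 2
  local cats="$*"

  # Tier 1: the code-reviewer always runs; standard depth stops here.
  echo "code-reviewer"
  if [ "$depth" = "standard" ]; then
    return 0
  fi
  echo "security-auditor"
  echo "differential-review"

  # Tier 2, Agent 4: sensitive code, a security focus, or full depth.
  if [ "$depth" = "full" ] || [ "$focus" = "security" ] ||
     _has_cat "$cats" crypto-auth || _has_cat "$cats" consensus-protocol ||
     _has_cat "$cats" value-transfer; then
    echo "function-analyzer"
  fi
  # Tier 2, Agent 5: protocol code, an architecture focus, or full depth.
  if [ "$depth" = "full" ] || [ "$focus" = "architecture" ] ||
     _has_cat "$cats" consensus-protocol; then
    echo "spec-compliance-checker"
  fi
  # Tier 2, Agent 6: API/config changes or full depth.
  if [ "$depth" = "full" ] || _has_cat "$cats" api-config; then
    echo "api-safety"
  fi
}

# Example: default depth, no specific focus, two categories detected in Phase 1.
select_agents deep all value-transfer api-config
```

Because every trigger depends only on Phase 1 output, the full agent list is known before any agent starts, which is what allows both tiers to be dispatched without waiting on each other.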
232+
## Phase 3: Result Compilation
233+
234+
After ALL agents complete, read their outputs and compile results.
235+
236+
### 3a. Collect Findings
237+
For each agent, extract:
238+
- Agent name and role
239+
- Finding count by severity
240+
- Individual findings with: ID, severity, title, description, file:line, fix
241+
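A per-severity finding count can be approximated by scanning an agent's output for finding IDs. The sketch below assumes each finding ID appears as a standalone token in the `{severity}-{number}` form (for example `H-1`); the function name `count_by_severity` and the sample findings are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: assumes finding IDs appear as standalone tokens such as H-1 or M-2.

# Reads agent output on stdin and prints "<severity> <count>" for C, H, M, L, I.
count_by_severity() {
  local ids sev
  # -o prints only the matches, -w requires whole-word matches,
  # and sort -u collapses repeated mentions of the same ID.
  ids="$(grep -owE '[CHMLI]-[0-9]+' | sort -u)"
  for sev in C H M L I; do
    printf '%s %s\n' "$sev" "$(printf '%s\n' "$ids" | grep -c "^$sev-")"
  done
}

# Example with invented findings; H-1 is mentioned twice but counted once.
count_by_severity <<'EOF'
### H-1: Unbounded allocation in message parser
### H-2: Missing lock around shared state
### M-1: Error ignored on close
See also H-1.
EOF
```

Counting unique IDs rather than raw matches keeps cross-references inside a report from inflating the totals.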
242+
### 3b. Deduplicate
243+
When multiple agents flag the same issue:
244+
- Keep the finding with the most detail (PoC exploit > description-only)
245+
- Note which agents agree (e.g., "Confirmed by: security-auditor, differential-review")
246+
- If agents disagree on severity, escalate to the higher severity and note both
247+
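The severity-escalation and agreement rules above can be sketched with `awk`. This is only an illustration under several assumptions: findings have already been normalized to tab-separated `location<TAB>severity<TAB>agent` records, two findings are treated as "the same issue" when their `file:line` location matches exactly, and the function name `dedupe_findings` is hypothetical. Choosing the most detailed write-up and recording the lower severity alongside the escalated one are left out.

```bash
#!/usr/bin/env bash
# Sketch only: the record layout and the "same location = same issue" rule are assumptions.

# Reads "location<TAB>severity<TAB>agent" lines on stdin.
# Prints one line per location with the highest severity and the agreeing agents.
dedupe_findings() {
  awk -F'\t' '
    BEGIN { rank["C"] = 5; rank["H"] = 4; rank["M"] = 3; rank["L"] = 2; rank["I"] = 1 }
    {
      key = $1; sev = $2; agent = $3
      if (!(key in best)) {
        order[++n] = key          # remember first-seen order for stable output
        best[key] = sev
        agents[key] = agent
      } else {
        if (rank[sev] > rank[best[key]]) best[key] = sev   # escalate to the higher severity
        agents[key] = agents[key] ", " agent
      }
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%s\t%s\tConfirmed by: %s\n", order[i], best[order[i]], agents[order[i]]
    }
  '
}

# Example with invented findings: two agents flag the same location at different severities.
printf '%s\t%s\t%s\n' \
  'wallet/send.go:88' H security-auditor \
  'wallet/send.go:88' M differential-review \
  'rpc/server.go:12'  L code-reviewer | dedupe_findings
```

Exact location matching will miss duplicates that agents report a few lines apart, so in practice the orchestrator still has to judge whether two findings describe the same issue.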
248+
### 3c. Cross-Reference
249+
Merge complementary findings into stronger combined findings:
250+
- security-auditor PoC exploit + differential-review blast radius = stronger finding
251+
- code-reviewer pattern violation + sharp-edges footgun analysis = richer context
252+
- function-analyzer invariant violation + spec-compliance divergence = spec bug
253+
254+
## Phase 4: Unified Report Generation
255+
256+
Write the final report to `.reviews/{owner}_{repo}_PR_{pr_number}_review.md`.
257+
258+
### Report Structure:
259+
260+
```markdown
261+
# Code Review: {owner}/{repo} PR #{pr_number}
262+
263+
**Title**: {pr_title}
264+
**Author**: {author}
265+
**Date**: {date}
266+
**Base Branch**: {base_branch}
267+
**Files Changed**: {count}
268+
**Lines**: +{additions} -{deletions}
269+
**Security Depth**: {standard|deep|full}
270+
**Agents Deployed**: {count}
271+
272+
---
273+
274+
## Agent Summary
275+
276+
| # | Agent | Role | Findings |
277+
|---|-------|------|----------|
278+
| 1 | code-reviewer | Code quality & patterns | C: {n}, H: {n}, M: {n}, L: {n}, I: {n} |
279+
| 2 | security-auditor | Offensive security | C: {n}, H: {n}, M: {n}, L: {n}, I: {n} |
280+
| 3 | differential-review (ToB) | Diff security & blast radius | C: {n}, H: {n}, M: {n}, L: {n}, I: {n} |
281+
| 4 | function-analyzer (ToB) | Deep function analysis | ... (if run) |
282+
| 5 | spec-compliance (ToB) | BIP/BOLT compliance | ... (if run) |
283+
| 6 | sharp-edges + insecure-defaults (ToB) | API safety | ... (if run) |
284+
285+
---
286+
287+
## Critical Findings
288+
{All C-severity findings, with source agent(s) tagged}
289+
290+
## High Findings
291+
{All H-severity findings}
292+
293+
## Medium Findings
294+
{All M-severity findings}
295+
296+
## Low Findings
297+
{All L-severity findings}
298+
299+
## Informational
300+
{All I-severity findings}
301+
302+
---
303+
304+
## Specialized Analysis
305+
306+
### BIP/BOLT Compliance (if spec-compliance ran)
307+
{Compliance matrix and divergence findings}
308+
309+
### API Safety Report (if sharp-edges ran)
310+
{Footgun analysis and insecure default findings}
311+
312+
### Property-Based Testing Recommendations
313+
{Suggested property tests based on code patterns observed}
314+
315+
---
316+
317+
## Quality Scorecard
318+
319+
| Aspect | Score | Notes |
320+
|--------|-------|-------|
321+
| Correctness | /10 | |
322+
| Security | /10 | Combined: code-reviewer + security-auditor + ToB |
323+
| Performance | /10 | |
324+
| Testing | /10 | |
325+
| Maintainability | /10 | |
326+
| Documentation | /10 | |
327+
| Design | /10 | |
328+
329+
**Overall Grade**: {F|D|C|B|A}
330+
331+
---
332+
333+
## Executive Summary
334+
335+
### Verdict: {REJECT | MAJOR_REWORK_REQUIRED | MINOR_FIXES_NEEDED | APPROVED_WITH_CONDITIONS | APPROVED}
336+
337+
### Blockers ({count})
338+
{List of must-fix items before merge}
339+
340+
### Recommended Next Steps
341+
1. {Most critical action}
342+
2. {Second priority}
343+
3. {Third priority}
344+
```
345+
346+
## Important Notes
347+
348+
- Always launch Tier 1 agents in a SINGLE message with multiple Task tool
349+
calls so they execute in parallel.
350+
- If Tier 2 agents are triggered, launch them in a SECOND parallel batch
351+
after determining triggers from Phase 1.
352+
- Do NOT wait for Tier 1 to complete before launching Tier 2 -- both tiers
353+
can run simultaneously if trigger conditions are known from Phase 1.
354+
- The code-reviewer agent writes its own review file to the same path as the
355+
unified report. Read its output before writing the report so its findings are incorporated, not overwritten.
356+
- When `--security-depth=standard`, skip all security agents and just run
357+
the code-reviewer alone. This is the fast path for low-risk PRs.
