| 1 | +--- |
| 2 | +description: "Multi-agent code review orchestrator with Trail of Bits security analysis" |
| 3 | +argument-hint: "<owner/repo#PR_NUMBER> or <PR_NUMBER> [--focus=security|performance|architecture|all] [--security-depth=standard|deep|full]" |
| 4 | +allowed-tools: |
| 5 | + - Task |
| 6 | + - Bash |
| 7 | + - Read |
| 8 | + - Write |
| 9 | + - Edit |
| 10 | + - MultiEdit |
| 11 | + - Grep |
| 12 | + - Glob |
| 13 | + - WebFetch |
| 14 | + - WebSearch |
| 15 | + - TodoWrite |
| 16 | +--- |
| 17 | + |
| 18 | +# Multi-Agent Code Review Orchestrator |
| 19 | + |
| 20 | +You are a review orchestrator. Your job is to dispatch specialized agents in |
| 21 | +parallel, collect their findings, and compile a unified review report. |
| 22 | + |
| 23 | +**Target**: $ARGUMENTS |
| 24 | + |
| 25 | +## Phase 1: Context Gathering |
| 26 | + |
| 27 | +Parse arguments and collect PR context before dispatching agents. |
| 28 | + |
| 29 | +1. **Parse arguments** to extract: |
| 30 | + - Repository owner and name (if provided as `owner/repo#PR`) |
| 31 | + - PR number |
| 32 | + - `--focus` area: `security`, `performance`, `architecture`, or `all` (default: `all`) |
| 33 | + - `--security-depth`: `standard`, `deep`, or `full` (default: `deep`) |
| 34 | + |
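| 35 | +2. **Fetch PR metadata** using `gh pr view`. If the target was given as `owner/repo#PR`, add `--repo {owner}/{repo}` to this and the following `gh` commands; otherwise they run against the current checkout: |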
| 36 | + ```bash |
| 37 | + gh pr view {pr_number} --json number,title,author,baseRefName,headRefName,files,additions,deletions,body |
| 38 | + ``` |
| 39 | + |
| 40 | +3. **Get the diff** for agent consumption: |
| 41 | + ```bash |
| 42 | + gh pr diff {pr_number} |
| 43 | + ``` |
| 44 | + |
| 45 | +4. **Get changed file list**: |
| 46 | + ```bash |
| 47 | + gh pr view {pr_number} --json files --jq '.files[].path' |
| 48 | + ``` |
| 49 | + |
| 50 | +5. **Classify changed files** by scanning filenames and diff content for: |
| 51 | + - **Crypto/Auth**: files touching keys, signatures, hashing, authentication |
| 52 | + - **Consensus/Protocol**: files referencing BIPs, BOLTs, validation rules, chain logic |
| 53 | + - **API/Config**: files defining public interfaces, configuration schemas, RPC endpoints |
| 54 | + - **Value Transfer**: files handling amounts, fees, balances, UTXOs, HTLCs, channels |
| 55 | + |
| 56 | + Store these classifications -- they determine which conditional agents to launch. |
| 57 | + |
| 58 | +6. **Create output directory**: |
| 59 | + ```bash |
| 60 | + mkdir -p .reviews |
| 61 | + ``` |
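The argument parsing in step 1 could be sketched as follows. This is a minimal illustration, not part of the command file: `ReviewTarget`, `parse_review_args`, and `gh_pr_view_argv` are hypothetical names, and the sketch assumes arguments are whitespace-separated with the target as the only positional token.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

FOCUS_AREAS = {"security", "performance", "architecture", "all"}
SECURITY_DEPTHS = {"standard", "deep", "full"}

# Accepts "owner/repo#123" or a bare "123".
_TARGET_RE = re.compile(r"^(?:(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+)#)?(?P<pr>\d+)$")


@dataclass
class ReviewTarget:
    owner: Optional[str]
    repo: Optional[str]
    pr_number: int
    focus: str = "all"            # default from the argument list above
    security_depth: str = "deep"  # default from the argument list above


def parse_review_args(arguments: str) -> ReviewTarget:
    """Split the raw argument string into a target plus the two options."""
    target = None
    focus, depth = "all", "deep"
    for token in arguments.split():
        if token.startswith("--focus="):
            focus = token.split("=", 1)[1]
        elif token.startswith("--security-depth="):
            depth = token.split("=", 1)[1]
        elif target is None:
            target = token
        else:
            raise ValueError(f"unexpected argument: {token}")
    if focus not in FOCUS_AREAS:
        raise ValueError(f"unknown focus area: {focus}")
    if depth not in SECURITY_DEPTHS:
        raise ValueError(f"unknown security depth: {depth}")
    match = _TARGET_RE.match(target or "")
    if match is None:
        raise ValueError(f"cannot parse PR target: {target!r}")
    return ReviewTarget(match["owner"], match["repo"], int(match["pr"]), focus, depth)


def gh_pr_view_argv(target: ReviewTarget, fields: List[str]) -> List[str]:
    """Build the `gh pr view` command line; pass --repo only when it was given."""
    argv = ["gh", "pr", "view", str(target.pr_number), "--json", ",".join(fields)]
    if target.owner and target.repo:
        argv += ["--repo", f"{target.owner}/{target.repo}"]
    return argv
```

When only a PR number is supplied, `owner` and `repo` stay `None` here; resolving them from the current checkout is left out of the sketch.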
| 62 | + |
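One rough way to picture the file classification in step 5 is a keyword scan over each path and its diff text. The keyword lists and category keys below are assumptions made for illustration, not values from the command file, and plain substring matching will over-match (for example, `key` also matches `keyboard`), so treat the result as a first-pass hint.

```python
from typing import Dict, Set, Tuple

# Hypothetical keyword lists; tune them for the repository under review.
CATEGORY_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "crypto_auth": ("key", "sign", "hash", "auth"),
    "consensus_protocol": ("bip", "bolt", "validat", "chain"),
    "api_config": ("rpc", "config", "api", "schema"),
    "value_transfer": ("amount", "fee", "balance", "utxo", "htlc", "channel"),
}


def classify_files(diff_by_file: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each category to the changed paths that mention one of its keywords."""
    result: Dict[str, Set[str]] = {category: set() for category in CATEGORY_KEYWORDS}
    for path, diff_text in diff_by_file.items():
        # Scan the filename and the diff content together, case-insensitively.
        haystack = f"{path}\n{diff_text}".lower()
        for category, keywords in CATEGORY_KEYWORDS.items():
            if any(keyword in haystack for keyword in keywords):
                result[category].add(path)
    return result
```

A file can land in several categories at once, which matches the way the Tier 2 triggers each look at a different category.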
| 63 | +## Phase 2: Parallel Agent Dispatch |
| 64 | + |
| 65 | +Launch agents using the **Task tool**. All agents in a tier must be launched |
| 66 | +in a **single message** with multiple Task tool calls so they run in parallel. |
| 67 | + |
| 68 | +### Tier 1: Always-Run Agents |
| 69 | + |
| 70 | +These agents run on every review when `--security-depth` is `deep` or `full`. |
| 71 | +If `--security-depth=standard`, launch only Agent 1 (code-reviewer). |
| 72 | + |
| 73 | +Launch all applicable Tier 1 agents in a **single message** (parallel dispatch): |
| 74 | + |
| 75 | +**Agent 1: Code Quality Review** (`subagent_type: code-reviewer`) |
| 76 | +``` |
| 77 | +Prompt: Review PR #{pr_number} in {owner}/{repo}. |
| 78 | + |
| 79 | +Focus: {focus_area} |
| 80 | + |
| 81 | +PR Title: {title} |
| 82 | +PR Description: {body} |
| 83 | +Base Branch: {base_branch} |
| 84 | +Changed Files: {file_list} |
| 85 | + |
| 86 | +Perform your full review methodology (Phases 0 through 7) focusing on code |
| 87 | +quality, correctness, Go patterns, test quality, breaking changes, and |
| 88 | +maintainability. Do NOT spawn sub-agents for security analysis -- the |
| 89 | +orchestrator handles that separately. |
| 90 | + |
| 91 | +Write your findings to the review file at: |
| 92 | +.reviews/{owner}_{repo}_PR_{pr_number}_review.md |
| 93 | + |
| 94 | +Classify each finding with severity: Critical (C), High (H), Medium (M), |
| 95 | +Low (L), or Informational (I). Use format: {severity}-{number} |
| 96 | +(e.g., H-1, M-2, I-3). |
| 97 | +``` |
| 98 | + |
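The prompt above mixes placeholders the orchestrator fills in (such as `{pr_number}`) with literal braces the agent should see unchanged (`{severity}-{number}`). A minimal sketch of a substitution that handles both, assuming a hypothetical helper named `fill_prompt`:

```python
import re
from typing import Mapping

# Matches {name} where name is lowercase letters and underscores.
_PLACEHOLDER_RE = re.compile(r"\{([a-z_]+)\}")


def fill_prompt(template: str, values: Mapping[str, str]) -> str:
    """Substitute {name} only when name is in values; leave other braces as-is."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)

    return _PLACEHOLDER_RE.sub(replace, template)
```

Plain `str.format` would raise `KeyError` on `{severity}` unless every brace pair were supplied or escaped, which is why the sketch substitutes only known names.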
| 99 | +**Agent 2: Offensive Security Audit** (`subagent_type: security-auditor`) |
| 100 | +``` |
| 101 | +Prompt: Perform a security audit of PR #{pr_number} in {owner}/{repo}. |
| 102 | + |
| 103 | +PR diff: |
| 104 | +{diff_content} |
| 105 | + |
| 106 | +Changed files: {file_list} |
| 107 | + |
| 108 | +Focus on: |
| 109 | +- DoS vectors and resource exhaustion |
| 110 | +- Fund loss and value transfer bugs |
| 111 | +- Race conditions and concurrency issues |
| 112 | +- Panic conditions reachable from external input |
| 113 | +- Consensus implications (chain split, re-org safety) |
| 114 | +- P2P attack vectors (eclipse, Sybil, amplification) |
| 115 | +- Cryptographic misuse |
| 116 | + |
| 117 | +Develop proof-of-concept exploits for any vulnerabilities found. |
| 118 | +Classify each finding: Critical (C), High (H), Medium (M), Low (L), |
| 119 | +Informational (I). Use format: {severity}-{number}. |
| 120 | +``` |
| 121 | + |
| 122 | +**Agent 3: Differential Security Review** (`subagent_type: general-purpose`) |
| 123 | +``` |
| 124 | +Prompt: You are performing a Trail of Bits-style differential security |
| 125 | +review. Follow the methodology from the `differential-review` skill. |
| 126 | + |
| 127 | +PR #{pr_number} in {owner}/{repo}. |
| 128 | +Base branch: {base_branch} |
| 129 | + |
| 130 | +Changed files: {file_list} |
| 131 | + |
| 132 | +Execute the differential-review workflow: |
| 133 | +1. Intake & Triage: Risk-classify each changed file |
| 134 | +2. Changed Code Analysis: Use git blame on removed/modified lines to |
| 135 | + understand history and detect regressions |
| 136 | +3. Test Coverage Analysis: Identify test gaps for modified code paths |
| 137 | +4. Blast Radius Analysis: Count transitive callers of changed functions |
| 138 | + to quantify impact |
| 139 | +5. Deep Context Analysis: Apply Five Whys to understand root cause of |
| 140 | + changes |
| 141 | +6. Adversarial Analysis: Model attacker scenarios for HIGH risk changes |
| 142 | +7. Report: Generate findings with severity classifications |
| 143 | + |
| 144 | +Classify each finding: Critical (C), High (H), Medium (M), Low (L), |
| 145 | +Informational (I). Use format: {severity}-{number}. |
| 146 | +``` |
| 147 | + |
| 148 | +### Tier 2: Conditional Agents |
| 149 | + |
| 150 | +Launch these based on Phase 1 file classifications. When `--security-depth=full`, |
| 151 | +launch ALL Tier 2 agents unconditionally. When `--security-depth=standard`, launch |
| 152 | +none of them. Otherwise, launch only the agents whose trigger conditions are met. |
| 153 | + |
| 154 | +Launch all applicable Tier 2 agents in a **single message** (parallel dispatch). |
| 155 | + |
| 156 | +**Agent 4: Deep Function Analysis** (`subagent_type: audit-context-building:function-analyzer`) |
| 157 | +- **Trigger**: Changed files classified as Crypto/Auth, Consensus/Protocol, |
| 158 | + or Value Transfer; OR `--focus=security`; OR `--security-depth=full` |
| 159 | +``` |
| 160 | +Prompt: Perform ultra-granular analysis of the critical functions modified |
| 161 | +in PR #{pr_number} in {owner}/{repo}. |
| 162 | + |
| 163 | +Focus on these files (the highest-risk changed files): |
| 164 | +{critical_file_list} |
| 165 | + |
| 166 | +Follow the audit-context-building methodology: |
| 167 | +- Line-by-line semantic analysis of each modified function |
| 168 | +- Apply First Principles, 5 Whys, and 5 Hows at micro scale |
| 169 | +- Map invariants, assumptions, and trust boundaries |
| 170 | +- Track cross-function data flows with full context propagation |
| 171 | +- Zero speculation: every claim must cite exact line numbers |
| 172 | + |
| 173 | +For each function produce: Purpose, Inputs/Assumptions, Outputs/Effects, |
| 174 | +Block-by-Block Analysis, Cross-Function Dependencies, Risk Considerations. |
| 175 | + |
| 176 | +Classify findings: Critical (C), High (H), Medium (M), Low (L), |
| 177 | +Informational (I). |
| 178 | +``` |
| 179 | + |
| 180 | +**Agent 5: Spec Compliance Check** (`subagent_type: spec-to-code-compliance:spec-compliance-checker`) |
| 181 | +- **Trigger**: Changed files classified as Consensus/Protocol (references |
| 182 | + BIPs, BOLTs, or protocol-level logic); OR `--focus=architecture`; |
| 183 | + OR `--security-depth=full` |
| 184 | +``` |
| 185 | +Prompt: Verify specification-to-code compliance for PR #{pr_number} |
| 186 | +in {owner}/{repo}. |
| 187 | + |
| 188 | +Changed files touching protocol code: {protocol_file_list} |
| 189 | + |
| 190 | +Follow the spec-to-code-compliance methodology: |
| 191 | +1. Discover spec sources (BIPs, BOLTs, design docs in the repo) |
| 192 | +2. Extract spec intent into structured format |
| 193 | +3. Analyze code behavior line-by-line |
| 194 | +4. Map spec items to code with match types: |
| 195 | + full_match, partial_match, mismatch, missing_in_code, |
| 196 | + code_stronger_than_spec, code_weaker_than_spec |
| 197 | +5. Classify divergences by severity |
| 198 | + |
| 199 | +Anti-hallucination: if spec is silent, classify as UNDOCUMENTED. |
| 200 | +If code adds behavior, classify as UNDOCUMENTED CODE PATH. |
| 201 | +``` |
| 202 | + |
| 203 | +**Agent 6: API Safety & Insecure Defaults** (`subagent_type: general-purpose`) |
| 204 | +- **Trigger**: Changed files classified as API/Config (introduces or modifies |
| 205 | + public interfaces, config schemas, RPC endpoints); OR `--security-depth=full` |
| 206 | +``` |
| 207 | +Prompt: Analyze the API surfaces and configuration defaults in |
| 208 | +PR #{pr_number} in {owner}/{repo}. |
| 209 | + |
| 210 | +Changed API/config files: {api_file_list} |
| 211 | + |
| 212 | +Perform two analyses: |
| 213 | + |
| 214 | +1. SHARP EDGES (from the sharp-edges skill): |
| 215 | + Model three adversaries against the changed APIs: |
| 216 | + - Scoundrel: Malicious developer trying to exploit the API |
| 217 | + - Lazy Developer: Copy-pasting examples without reading docs |
| 218 | + - Confused Developer: Swapping parameters or misunderstanding semantics |
| 219 | + Check for: Algorithm Selection issues, Dangerous Defaults, Primitive vs |
| 220 | + Semantic API confusion, Configuration Cliffs, Silent Failures, and |
| 221 | + Stringly-Typed Security patterns. |
| 222 | + |
| 223 | +2. INSECURE DEFAULTS (from the insecure-defaults skill): |
| 224 | + Scan for: Hardcoded fallback secrets, default credentials, weak crypto |
| 225 | + defaults, permissive access control (CORS *, public by default), debug |
| 226 | + features left enabled, fail-open vs fail-secure behavior. |
| 227 | + |
| 228 | +Classify each finding: Critical (C), High (H), Medium (M), Low (L), |
| 229 | +Informational (I). |
| 230 | +``` |
| 231 | + |
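The tier and trigger rules above reduce to a small decision function. The sketch below is one reading of those rules, not code from the command file: `select_agents` and the category keys (`crypto_auth`, `consensus_protocol`, `api_config`, `value_transfer`) are hypothetical names, and `standard` depth is taken to mean Agent 1 only, as the Important Notes describe.

```python
from typing import Dict, List, Set


def select_agents(depth: str, focus: str, classified: Dict[str, Set[str]]) -> List[int]:
    """Return the agent numbers (1 through 6) to launch for this review."""
    if depth == "standard":
        return [1]  # fast path: code-reviewer alone

    agents = [1, 2, 3]  # Tier 1 for deep and full
    full = depth == "full"

    # Agent 4: Crypto/Auth, Consensus/Protocol, or Value Transfer files changed.
    sensitive = any(
        classified.get(category)
        for category in ("crypto_auth", "consensus_protocol", "value_transfer")
    )
    if full or focus == "security" or sensitive:
        agents.append(4)

    # Agent 5: Consensus/Protocol files changed.
    if full or focus == "architecture" or classified.get("consensus_protocol"):
        agents.append(5)

    # Agent 6: API/Config files changed.
    if full or classified.get("api_config"):
        agents.append(6)

    return agents
```

Because every input comes from Phase 1, the whole list is known before any agent starts, which is what allows both tiers to be dispatched without waiting on each other.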
| 232 | +## Phase 3: Result Compilation |
| 233 | + |
| 234 | +After ALL agents complete, read their outputs and compile results. |
| 235 | + |
| 236 | +### 3a. Collect Findings |
| 237 | +For each agent, extract: |
| 238 | +- Agent name and role |
| 239 | +- Finding count by severity |
| 240 | +- Individual findings with: ID, severity, title, description, file:line, fix |
| 241 | + |
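The fields collected in 3a map naturally onto a small record type. A minimal sketch, assuming finding IDs follow the `{severity}-{number}` format requested in the agent prompts; `Finding` and `count_by_severity` are hypothetical names:

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable

SEVERITIES = "CHMLI"  # Critical, High, Medium, Low, Informational
_ID_RE = re.compile(r"^([CHMLI])-(\d+)$")


@dataclass
class Finding:
    agent: str        # e.g. "security-auditor"
    finding_id: str   # e.g. "H-1"
    title: str
    description: str
    location: str     # "file:line"
    fix: str

    @property
    def severity(self) -> str:
        """Derive the severity letter from the finding ID."""
        match = _ID_RE.match(self.finding_id)
        if match is None:
            raise ValueError(f"malformed finding ID: {self.finding_id!r}")
        return match.group(1)


def count_by_severity(findings: Iterable[Finding]) -> Dict[str, int]:
    """Count findings per severity, including zeros for unused levels."""
    counts = Counter(finding.severity for finding in findings)
    return {severity: counts.get(severity, 0) for severity in SEVERITIES}
```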
| 242 | +### 3b. Deduplicate |
| 243 | +When multiple agents flag the same issue: |
| 244 | +- Keep the finding with the most detail (PoC exploit > description-only) |
| 245 | +- Note which agents agree (e.g., "Confirmed by: security-auditor, differential-review") |
| 246 | +- If agents disagree on severity, escalate to the higher severity and note both |
| 247 | + |
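The deduplication rules can be sketched as a group-and-merge pass. Two assumptions are baked in and are not from the command file: "the same issue" is approximated by an identical `file:line` location, and "most detail" is approximated by preferring a PoC, then the longer description. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

SEVERITY_RANK = {"I": 0, "L": 1, "M": 2, "H": 3, "C": 4}


@dataclass
class Finding:
    agent: str
    severity: str      # one of C, H, M, L, I
    location: str      # "file:line"
    description: str
    has_poc: bool = False
    confirmed_by: List[str] = field(default_factory=list)
    severity_note: str = ""


def deduplicate(findings: Iterable[Finding]) -> List[Finding]:
    """Merge findings that share a location into one finding per location."""
    groups: Dict[str, List[Finding]] = {}
    for finding in findings:
        groups.setdefault(finding.location, []).append(finding)

    merged = []
    for group in groups.values():
        # Keep the most detailed report: a PoC beats description-only.
        best = max(group, key=lambda f: (f.has_poc, len(f.description)))
        # Escalate to the highest severity any agent assigned.
        top = max((f.severity for f in group), key=SEVERITY_RANK.__getitem__)
        kept = Finding(best.agent, top, best.location, best.description, best.has_poc)
        kept.confirmed_by = sorted({f.agent for f in group})
        rated = sorted({f.severity for f in group}, key=SEVERITY_RANK.__getitem__)
        if len(rated) > 1:
            # Record the disagreement so the report can note every rating.
            kept.severity_note = "Severity ratings: " + ", ".join(rated)
        merged.append(kept)
    return merged
```

Matching on location alone will miss duplicates reported a few lines apart and may merge unrelated issues on the same line, so a real pass would likely need a looser comparison.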
| 248 | +### 3c. Cross-Reference |
| 249 | +Merge complementary findings into stronger combined findings: |
| 250 | +- security-auditor PoC exploit + differential-review blast radius = stronger finding |
| 251 | +- code-reviewer pattern violation + sharp-edges footgun analysis = richer context |
| 252 | +- function-analyzer invariant violation + spec-compliance divergence = spec bug |
| 253 | + |
| 254 | +## Phase 4: Unified Report Generation |
| 255 | + |
| 256 | +Write the final report to `.reviews/{owner}_{repo}_PR_{pr_number}_review.md`. |
| 257 | + |
| 258 | +### Report Structure: |
| 259 | + |
| 260 | +```markdown |
| 261 | +# Code Review: {owner}/{repo} PR #{pr_number} |
| 262 | + |
| 263 | +**Title**: {pr_title} |
| 264 | +**Author**: {author} |
| 265 | +**Date**: {date} |
| 266 | +**Base Branch**: {base_branch} |
| 267 | +**Files Changed**: {count} |
| 268 | +**Lines**: +{additions} -{deletions} |
| 269 | +**Security Depth**: {standard|deep|full} |
| 270 | +**Agents Deployed**: {count} |
| 271 | + |
| 272 | +--- |
| 273 | + |
| 274 | +## Agent Summary |
| 275 | + |
| 276 | +| # | Agent | Role | Findings | |
| 277 | +|---|-------|------|----------| |
| 278 | +| 1 | code-reviewer | Code quality & patterns | C-{n}, H-{n}, M-{n}, L-{n}, I-{n} | |
| 279 | +| 2 | security-auditor | Offensive security | C-{n}, H-{n}, M-{n}, L-{n}, I-{n} | |
| 280 | +| 3 | differential-review (ToB) | Diff security & blast radius | C-{n}, H-{n}, M-{n}, L-{n}, I-{n} | |
| 281 | +| 4 | function-analyzer (ToB) | Deep function analysis | ... (if run) | |
| 282 | +| 5 | spec-compliance (ToB) | BIP/BOLT compliance | ... (if run) | |
| 283 | +| 6 | sharp-edges + insecure-defaults (ToB) | API safety | ... (if run) | |
| 284 | + |
| 285 | +--- |
| 286 | + |
| 287 | +## Critical Findings |
| 288 | +{All C-severity findings, with source agent(s) tagged} |
| 289 | + |
| 290 | +## High Findings |
| 291 | +{All H-severity findings} |
| 292 | + |
| 293 | +## Medium Findings |
| 294 | +{All M-severity findings} |
| 295 | + |
| 296 | +## Low Findings |
| 297 | +{All L-severity findings} |
| 298 | + |
| 299 | +## Informational |
| 300 | +{All I-severity findings} |
| 301 | + |
| 302 | +--- |
| 303 | + |
| 304 | +## Specialized Analysis |
| 305 | + |
| 306 | +### BIP/BOLT Compliance (if spec-compliance ran) |
| 307 | +{Compliance matrix and divergence findings} |
| 308 | + |
| 309 | +### API Safety Report (if sharp-edges ran) |
| 310 | +{Footgun analysis and insecure default findings} |
| 311 | + |
| 312 | +### Property-Based Testing Recommendations |
| 313 | +{Suggested property tests based on code patterns observed} |
| 314 | + |
| 315 | +--- |
| 316 | + |
| 317 | +## Quality Scorecard |
| 318 | + |
| 319 | +| Aspect | Score | Notes | |
| 320 | +|--------|-------|-------| |
| 321 | +| Correctness | /10 | | |
| 322 | +| Security | /10 | Combined: code-reviewer + security-auditor + ToB | |
| 323 | +| Performance | /10 | | |
| 324 | +| Testing | /10 | | |
| 325 | +| Maintainability | /10 | | |
| 326 | +| Documentation | /10 | | |
| 327 | +| Design | /10 | | |
| 328 | + |
| 329 | +**Overall Grade**: {F|D|C|B|A} |
| 330 | + |
| 331 | +--- |
| 332 | + |
| 333 | +## Executive Summary |
| 334 | + |
| 335 | +### Verdict: {REJECT | MAJOR_REWORK_REQUIRED | MINOR_FIXES_NEEDED | APPROVED_WITH_CONDITIONS | APPROVED} |
| 336 | + |
| 337 | +### Blockers ({count}) |
| 338 | +{List of must-fix items before merge} |
| 339 | + |
| 340 | +### Recommended Next Steps |
| 341 | +1. {Most critical action} |
| 342 | +2. {Second priority} |
| 343 | +3. {Third priority} |
| 344 | +``` |
| 345 | + |
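As an illustration of filling in the Agent Summary table, the sketch below renders one row per agent. It assumes `{n}` in the template stands for the number of findings at that severity; `summary_row` is a hypothetical helper.

```python
from typing import Dict

SEVERITIES = "CHMLI"  # column order used in the template row


def summary_row(index: int, agent: str, role: str, counts: Dict[str, int]) -> str:
    """Render one Agent Summary row, using 0 for severities with no findings."""
    cells = ", ".join(f"{severity}-{counts.get(severity, 0)}" for severity in SEVERITIES)
    return f"| {index} | {agent} | {role} | {cells} |"
```

For example, `summary_row(2, "security-auditor", "Offensive security", {"H": 1, "M": 2})` yields a row whose Findings cell reads `C-0, H-1, M-2, L-0, I-0`.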
| 346 | +## Important Notes |
| 347 | + |
| 348 | +- Always launch Tier 1 agents in a SINGLE message with multiple Task tool |
| 349 | + calls so they execute in parallel. |
| 350 | +- If Tier 2 agents are triggered, launch them in a SECOND parallel batch |
| 351 | + after determining triggers from Phase 1. |
| 352 | +- Do NOT wait for Tier 1 to complete before launching Tier 2 -- both tiers |
| 353 | + can run simultaneously if trigger conditions are known from Phase 1. |
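| 354 | +- The code-reviewer agent writes its own review file to the same path that Phase 4 uses for the unified report. Read its |
| 355 | + output after it completes, and before writing the unified report, so its findings are incorporated rather than overwritten. |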
| 356 | +- When `--security-depth=standard`, skip all security agents (Agents 2 through 6) and run |
| 357 | + only the code-reviewer. This is the fast path for low-risk PRs. |