diff --git a/docs/plans/2026-03-26-001-feat-adversarial-review-agents-plan.md b/docs/plans/2026-03-26-001-feat-adversarial-review-agents-plan.md new file mode 100644 index 000000000..78c76f357 --- /dev/null +++ b/docs/plans/2026-03-26-001-feat-adversarial-review-agents-plan.md @@ -0,0 +1,330 @@ +--- +title: "feat: Add adversarial review agents for code and documents" +type: feat +status: completed +date: 2026-03-26 +deepened: 2026-03-26 +--- + +# feat: Add adversarial review agents for code and documents + +## Overview + +Add two adversarial review agents to the compound-engineering plugin — one for code review and one for document review. These agents take a fundamentally different stance from existing reviewers: instead of evaluating quality against known criteria, they actively try to *falsify* the artifact by constructing scenarios that break it, challenging assumptions, and probing for problems that pattern-matching reviewers miss. + +Both agents integrate into the existing review ensembles as conditional reviewers, activated by skill-level filtering. Both auto-scale their depth internally based on artifact size and risk signals. Both produce findings using the standard JSON contract so they merge cleanly into existing synthesis pipelines. + +## Problem Frame + +The existing review infrastructure is comprehensive — 24 code review agents and 6 document review agents covering correctness, security, reliability, maintainability, performance, scope, feasibility, and coherence. But all reviewers share an *evaluative* stance: they check artifacts against known quality criteria. + +What's missing is a *falsification* stance — actively constructing scenarios that break the artifact, challenging the assumptions behind decisions, and probing for emergent failures that no single-pattern reviewer would catch. 
This is the gap that gstack's adversarial evaluation fills (cross-model challenge mode, spec review loops, proxy skepticism, shadow path tracing) and that compound-engineering currently lacks. + +## Requirements Trace + +- R1. Code adversarial-reviewer agent that tries to break implementations by constructing failure scenarios +- R2. Document adversarial-reviewer agent that challenges premises, assumptions, and decisions in plans/requirements +- R3. Both agents use the standard JSON findings contract for their respective pipelines +- R4. Skill-level filtering: orchestrating skills decide whether to dispatch adversarial review +- R5. Agent-level auto-scaling: agents modulate their own depth (quick/standard/deep) based on artifact size and risk +- R6. Direct invocation: agents work when called directly, not only through skill pipelines +- R7. Clear boundaries: each agent has explicit "do not flag" rules to prevent overlap with existing reviewers + +## Scope Boundaries + +- No cross-model adversarial review (no Codex/external model integration) — that's a separate feature +- No changes to findings schemas — both agents use existing schemas as-is +- No new skills — agents integrate into existing `ce-review` and `document-review` skills +- No changes to synthesis/dedup pipelines — agents produce standard output that existing pipelines handle +- No beta framework — these are additive conditional reviewers with no risk to existing behavior + +## Context & Research + +### Relevant Code and Patterns + +- `plugins/compound-engineering/agents/review/*.md` — 24 existing code review agents following consistent structure (identity, hunting list, confidence calibration, suppress conditions, output format) +- `plugins/compound-engineering/agents/document-review/*.md` — 6 existing document review agents (identity, analysis focus, confidence calibration, suppress conditions) +- `plugins/compound-engineering/skills/ce-review/SKILL.md` — code review orchestration with tiered persona 
ensemble +- `plugins/compound-engineering/skills/ce-review/references/persona-catalog.md` — reviewer registry with always-on, cross-cutting conditional, and stack-specific conditional tiers +- `plugins/compound-engineering/skills/document-review/SKILL.md` — document review orchestration with 2 always-on + 4 conditional personas +- `plugins/compound-engineering/skills/ce-review/references/findings-schema.json` — code review findings contract +- `plugins/compound-engineering/skills/document-review/references/findings-schema.json` — document review findings contract + +### Institutional Learnings + +- Reviewer selection is agent judgment, not keyword matching — the orchestrator reads the diff and reasons about which conditionals to activate +- Per-persona confidence calibration and explicit suppress conditions are the primary noise-control mechanism +- Intent shapes review depth (how hard each reviewer looks), not reviewer selection +- Conservative routing on disagreement: merged findings narrow but never widen without evidence +- Subagent template pattern wraps persona + schema + context for consistent dispatch + +### External References + +- gstack adversarial patterns analyzed: `/codex` challenge mode (chaos engineer prompting), `/plan-ceo-review` (proxy skepticism, independent spec review loop), `/plan-design-review` (auto-scaling by diff size), `/plan-eng-review` (error & rescue map, shadow path tracing), `/cso` (20 hard exclusion rules + 22 precedents) + +## Key Technical Decisions + +- **Two agents, not one**: Document and code adversarial review require fundamentally different reasoning techniques (strategic skepticism vs. chaos engineering). A single agent would need such a sprawling prompt that it loses sharpness at both. +- **Conditional tier, not always-on**: Adversarial review is expensive. Small config changes and trivial fixes don't need it. Skill-level filtering gates dispatch; agent-level auto-scaling gates depth. 
+- **Same short persona name in both pipelines**: Both agents use `"reviewer": "adversarial"` in their JSON output. This is safe because the two pipelines (ce-review and document-review) never merge findings across each other. +- **Depth determined by artifact size + risk signals**: The agent reads the artifact and determines quick/standard/deep. Callers can override depth via the intent summary (e.g., "this is a critical auth change, review deeply"). +- **Agent-internal auto-scaling, not template-driven**: No existing review agent auto-scales depth — this is a novel pattern in the plugin. The subagent templates pass the full raw diff/document but no sizing metadata (no line count, word count, or risk classification). Rather than extending the shared templates with new variables (which would affect all reviewers), each adversarial agent estimates size from the raw content it already receives. The code agent counts diff hunk lines; the document agent estimates word/requirement count from the text. This keeps the change additive — no template modifications, no orchestrator changes. +- **Auto-scaling thresholds grounded in gstack precedent**: The 50-line code threshold matches gstack's `plan-design-review` small-diff cutoff where adversarial review is skipped entirely. The 200-line threshold matches where gstack escalates to full multi-pass adversarial. Document thresholds (1000/3000 words) are set proportionally — a 1000-word doc is roughly a lightweight plan, a 3000-word doc is a Standard/Deep plan. These are starting values to tune based on usage. +- **No overlap with existing reviewers by design**: Each agent's "What you don't flag" section explicitly defers to existing specialists. The adversarial agent finds problems that emerge from the *combination* or *assumptions* of the system, not problems in individual patterns. 
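The agent-internal auto-scaling decision described above can be sketched in code. This is an illustrative model only, not part of the plugin: the shipped agents perform this estimation as in-prompt reasoning over the raw content they receive, and the function names and keyword list below are hypothetical assumptions.

```python
def code_review_depth(diff: str, intent_summary: str) -> str:
    """Sketch: pick quick/standard/deep from a raw unified diff and free-text intent."""
    # Changed lines are the '+'/'-' lines of the diff, excluding the
    # '+++'/'---' file headers. No sizing metadata is pre-computed.
    changed = sum(
        1
        for line in diff.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )
    # Risk signals are parsed heuristically from the free-text intent summary
    # (coarse substring check; the keyword list is an assumption).
    risk_keywords = {"auth", "payment", "migration", "mutation", "crypto"}
    high_risk = any(kw in intent_summary.lower() for kw in risk_keywords)
    if changed >= 200 or high_risk:
        return "deep"
    if changed >= 50:
        return "standard"
    return "quick"


def document_review_depth(document: str) -> str:
    """Sketch: the document agent estimates word count from the raw text.
    (Requirement counting and domain-keyword checks are omitted for brevity.)"""
    words = len(document.split())
    if words > 3000:
        return "deep"
    if words >= 1000:
        return "standard"
    return "quick"
```

A 60-line diff with a low-risk intent would land at standard depth, while the same diff described as touching payments would escalate to deep.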
+ +## Open Questions + +### Resolved During Planning + +- **Should the agents share a name?** No. The code agent is `adversarial-reviewer`; the document agent is `adversarial-document-reviewer`, which keeps the two unambiguous even outside their pipelines. The fully-qualified names (`compound-engineering:review:adversarial-reviewer` and `compound-engineering:document-review:adversarial-document-reviewer`) are distinct. The persona catalog uses FQ names. +- **What model should they use?** `model: inherit` for both, matching all other review agents. Adversarial review benefits from the strongest available model. +- **What confidence thresholds?** Code adversarial: 0.60 floor (matching ce-review pipeline). Document adversarial: 0.50 floor (matching document-review pipeline). High confidence (0.80+) requires a concrete constructed scenario with traceable evidence. + +### Deferred to Implementation + +- Exact wording of system prompt scenarios and examples — these will be refined during agent authoring based on what reads clearly +- Whether the depth auto-scaling thresholds (50/200 lines for code, 1000/3000 words for docs) need tuning — start with these and adjust based on usage + +--- + +## Implementation Units + +- [x] **Unit 1: Create code adversarial-reviewer agent** + + **Goal:** Define the adversarial reviewer for code diffs that tries to break implementations by constructing failure scenarios + + **Requirements:** R1, R3, R5, R6, R7 + + **Dependencies:** None + + **Files:** + - Create: `plugins/compound-engineering/agents/review/adversarial-reviewer.md` + + **Approach:** + Follow the standard code review agent structure (identity, hunting list, confidence calibration, suppress conditions, output format). The key differentiation is in the *hunting list* — these are not patterns to match but *scenario construction techniques*: + + 1. **Assumption violation** — identify assumptions the code makes about its environment (API always returns JSON, config always set, queue never empty, input always within range) and construct scenarios where those assumptions break.
Different from correctness-reviewer which checks logic *given* assumptions. + 2. **Composition failures** — trace interactions across component boundaries where each component is correct in isolation but the combination fails (ordering assumptions, shared state mutations, contract mismatches between caller and callee). Different from correctness-reviewer which examines individual code paths. + 3. **Cascade construction** — build multi-step failure chains: "A times out, causing B to retry, overwhelming C." Different from reliability-reviewer which checks individual failure handling. + 4. **Abuse cases** — find legitimate-seeming usage patterns that cause bad outcomes: "user submits this 1000 times," "request arrives during deployment," "two users edit the same resource simultaneously." Not security exploits (security-reviewer) and not performance anti-patterns (performance-reviewer) — emergent misbehavior. + + Auto-scaling logic in the system prompt. The agent receives the full raw diff via the subagent template's `{diff}` variable and the intent summary via `{intent_summary}`. No sizing metadata is pre-computed — the agent estimates diff size from the content it receives and extracts risk signals from the free-text intent summary (e.g., "Simplify tax calculation" = low risk; "Add OAuth2 flow for payment provider" = high risk). 
+ + - **Quick** (<50 changed lines): assumption violation scan only — identify 2-3 assumptions the code makes and whether they could be violated + - **Standard** (50-199 lines): + scenario construction + abuse cases + - **Deep** (200+ lines OR risk signals like auth/payments/data mutations): + composition failures + cascade construction + multi-pass + + Suppress conditions (what NOT to flag): + - Individual logic bugs without cross-component impact (correctness-reviewer) + - Known vulnerability patterns like SQL injection, XSS (security-reviewer) + - Individual missing error handling (reliability-reviewer) + - Performance anti-patterns like N+1 queries (performance-reviewer) + - Code style, naming, structure issues (maintainability-reviewer) + - Test coverage gaps (testing-reviewer) + - API contract changes (api-contract-reviewer) + + **Patterns to follow:** + - `plugins/compound-engineering/agents/review/correctness-reviewer.md` — closest structural analog + - `plugins/compound-engineering/agents/review/reliability-reviewer.md` — for cascade/failure-chain framing + + **Test scenarios:** + - Agent file parses with valid YAML frontmatter (name, description, model, tools, color fields present) + - System prompt contains all 4 hunting techniques with concrete descriptions + - Confidence calibration has 3 tiers matching ce-review thresholds (0.80+, 0.60-0.79, below 0.60) + - Suppress conditions explicitly name every existing reviewer whose territory is deferred + - Output format section matches standard JSON skeleton with `"reviewer": "adversarial"` + - Auto-scaling thresholds are documented in the system prompt + + **Verification:** + - `bun run release:validate` passes + - Agent file follows the exact section ordering of existing review agents + +--- + +- [x] **Unit 2: Create document adversarial-reviewer agent** + + **Goal:** Define the adversarial reviewer for planning/requirements documents that challenges premises, assumptions, and decisions + + **Requirements:** 
R2, R3, R5, R6, R7 + + **Dependencies:** None + + **Files:** + - Create: `plugins/compound-engineering/agents/document-review/adversarial-document-reviewer.md` + + **Approach:** + Follow the standard document review agent structure (identity, analysis focus, confidence calibration, suppress conditions). The analysis techniques: + + 1. **Premise challenging** — question whether the stated problem is the real problem. "The document says X is the goal — but the requirements described actually solve Y. Which is it?" Different from coherence-reviewer which checks internal consistency without questioning whether the goals themselves are right. + 2. **Assumption surfacing** — force unstated assumptions into the open. "This plan assumes Z will always be true. Where is that stated? What happens if it's not?" Different from feasibility-reviewer which checks whether the approach works given its assumptions. + 3. **Decision stress-testing** — for each major technical or scope decision: "What would make this the wrong choice? What evidence would falsify this decision?" Different from scope-guardian which checks alignment between stated scope and stated goals, not whether the goals themselves are well-chosen. + 4. **Simplification pressure** — "What's the simplest version that would validate this? Does this abstraction earn its keep? What could be removed without losing the core value?" Different from scope-guardian which checks for scope creep, not for over-engineering within scope. + 5. **Alternative blindness** — "What approaches were not considered? Why was this path chosen over the obvious alternatives?" Different from feasibility-reviewer which evaluates the proposed approach, not what was left on the table. + + Auto-scaling logic. The agent receives the full document text via the subagent template's `{document_content}` variable and the document type ("requirements" or "plan") via `{document_type}`.
No word count or requirement count is pre-computed — the agent estimates from the content. Risk signals come from the document content itself (domain keywords, abstraction proposals, scope size). + + - **Quick** (small doc, <1000 words or <5 requirements): premise check + simplification pressure only + - **Standard** (medium doc): + assumption surfacing + decision stress-testing + - **Deep** (large doc, >3000 words or >10 requirements, or high-stakes domain like auth/payments/migrations): + alternative blindness + multi-pass + + Suppress conditions: + - Internal contradictions or terminology drift (coherence-reviewer) + - Technical feasibility or architecture conflicts (feasibility-reviewer) + - Scope-goal alignment or priority dependency issues (scope-guardian-reviewer) + - UI/UX quality or user flow completeness (design-lens-reviewer) + - Security implications at plan level (security-lens-reviewer) + - Product framing or business justification (product-lens-reviewer) + + **Patterns to follow:** + - `plugins/compound-engineering/agents/document-review/scope-guardian-reviewer.md` — closest structural analog (also challenges scope decisions) + - `plugins/compound-engineering/agents/document-review/feasibility-reviewer.md` — for assumption-adjacent framing + + **Test scenarios:** + - Agent file parses with valid YAML frontmatter (name, description, model fields present) + - System prompt contains all 5 analysis techniques with concrete descriptions + - Confidence calibration has 3 tiers matching document-review thresholds (0.80+, 0.50-0.79, below 0.50) + - Suppress conditions explicitly name every existing document reviewer whose territory is deferred + - Auto-scaling thresholds are documented in the system prompt + - No output format section (document review agents get output contract from subagent template) + + **Verification:** + - `bun run release:validate` passes + - Agent file follows the structural conventions of existing document review agents + +--- + +- [x]
**Unit 3: Integrate code adversarial-reviewer into ce-review skill** + + **Goal:** Register the adversarial-reviewer as a cross-cutting conditional in the ce-review persona catalog and add selection logic to the skill + + **Requirements:** R4, R5 + + **Dependencies:** Unit 1 + + **Files:** + - Modify: `plugins/compound-engineering/skills/ce-review/references/persona-catalog.md` + - Modify: `plugins/compound-engineering/skills/ce-review/SKILL.md` + + **Approach:** + + *Persona catalog:* + Add `adversarial` to the cross-cutting conditional tier table: + ``` + | `adversarial` | `compound-engineering:review:adversarial-reviewer` | Select when diff is >=50 changed lines, OR touches auth, payments, data mutations, external API integrations, or other high-risk domains | + ``` + + *Skill selection logic (Stage 3):* + Add adversarial-reviewer to the conditional selection with these activation rules: + - Diff size >= 50 changed lines (excluding test files, generated files, lockfiles) + - OR diff touches high-risk domains: authentication/authorization, payment processing, data mutations/migrations, external API integrations, cryptography + - The intent summary is passed to the agent to inform auto-scaling depth (the agent decides quick/standard/deep, not the skill) + + *Announcement format:* + ``` + - adversarial -- 147 changed lines across auth controller and payment service + ``` + + **Patterns to follow:** + - How `security` is listed in the persona catalog cross-cutting conditional table + - How `reliability` selection logic is described in Stage 3 + + **Test scenarios:** + - Persona catalog has adversarial in the cross-cutting conditional table with correct FQ agent name + - Selection logic references both size threshold and risk domain triggers + - Announcement format matches existing conditional reviewer pattern (`name -- justification`) + + **Verification:** + - `bun run release:validate` passes + - Persona catalog table renders correctly in markdown preview + +--- + 
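The Stage 3 size trigger above can be approximated programmatically. Per the institutional learning cited earlier, reviewer selection is agent judgment rather than keyword matching, so this sketch only formalizes the line-count arithmetic; the exclusion patterns are illustrative assumptions, not the skill's actual rules.

```python
import re

# Hypothetical exclusion patterns for lines that do not count toward
# the 50-line dispatch threshold (test files, generated files, lockfiles).
EXCLUDED_PATTERNS = [
    r"(^|/)tests?/",            # test directories
    r"[._](test|spec)\.",       # test files like foo.test.ts
    r"(^|/)dist/",              # generated output
    r"(^|/)(package-lock\.json|yarn\.lock|bun\.lockb|Cargo\.lock)$",
]


def eligible_changed_lines(changed_by_file: dict[str, int]) -> int:
    """changed_by_file maps file path -> changed-line count in the diff."""
    return sum(
        count
        for path, count in changed_by_file.items()
        if not any(re.search(p, path) for p in EXCLUDED_PATTERNS)
    )


def should_dispatch_adversarial(
    changed_by_file: dict[str, int], touches_high_risk_domain: bool
) -> bool:
    # >=50 eligible changed lines OR a high-risk domain triggers dispatch.
    return eligible_changed_lines(changed_by_file) >= 50 or touches_high_risk_domain
```

For example, a diff with 120 lines in an auth controller, 80 lines in its test file, and 400 lockfile lines counts as 120 eligible lines and triggers dispatch.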
+- [x] **Unit 4: Integrate document adversarial-reviewer into document-review skill** + + **Goal:** Register the adversarial-reviewer as a conditional reviewer in the document-review skill with activation signals + + **Requirements:** R4, R5 + + **Dependencies:** Unit 2 + + **Files:** + - Modify: `plugins/compound-engineering/skills/document-review/SKILL.md` + + **Approach:** + + Add adversarial-reviewer to the conditional persona selection (Phase 1) with these activation signals: + - Document contains >5 distinct requirements or implementation units + - Document makes explicit architectural or scope decisions with stated rationale + - Document covers high-stakes domains (auth, payments, data migrations, external integrations) + - Document proposes new abstractions, frameworks, or significant architectural patterns + + Announcement format: + ``` + - adversarial-reviewer -- plan proposes new abstraction layer with 8 requirements across auth and payments + ``` + + **Patterns to follow:** + - How `scope-guardian-reviewer` activation signals are listed (bulleted under "activate when the document contains:") + - How `security-lens-reviewer` activation signals reference domain keywords + + **Test scenarios:** + - Activation signals listed in the same format as existing conditional reviewers + - Announcement format matches existing pattern + - Maximum reviewer count updated if the skill documents a cap (currently 6 max — now 7 possible) + + **Verification:** + - `bun run release:validate` passes + +--- + +- [x] **Unit 5: Update plugin metadata and documentation** + + **Goal:** Update agent counts and document the new adversarial reviewers in plugin README + + **Requirements:** None (housekeeping) + + **Dependencies:** Units 1-4 + + **Files:** + - Modify: `plugins/compound-engineering/README.md` (agent count, reviewer table if one exists) + - Modify: `.claude-plugin/marketplace.json` (if it tracks agent counts) + - Modify: 
`plugins/compound-engineering/.claude-plugin/plugin.json` (if it tracks agent counts) + + **Approach:** + - Update any agent count references (24 code review agents -> 25, 6 document review agents -> 7) + - Add adversarial reviewers to any agent listing tables + - Keep descriptions consistent with the agent frontmatter descriptions + + **Patterns to follow:** + - Existing README format for listing agents + - How previous agent additions updated metadata + + **Test scenarios:** + - `bun run release:validate` passes (this validates agent counts match between plugin.json and actual files) + - README accurately reflects the new agent count + + **Verification:** + - `bun run release:validate` passes with no warnings + +## System-Wide Impact + +- **Interaction graph:** The adversarial agents are read-only reviewers dispatched via subagent template. They do not modify code or documents. Their findings enter the existing synthesis pipeline (confidence gating, dedup, routing) unchanged. +- **Error propagation:** If an adversarial agent fails or returns invalid JSON, the existing synthesis pipeline handles it the same way it handles any reviewer failure — the review continues with other reviewers' findings. +- **Token cost:** Adversarial review adds one additional subagent per pipeline when activated. The auto-scaling mechanism (quick/standard/deep) bounds token usage proportionally to artifact size. At quick depth, the agent produces minimal findings; at deep depth, it may produce the most detailed findings in the ensemble. +- **Dedup behavior with adversarial findings:** The ce-review dedup fingerprint is `normalize(file) + line_bucket(line, ±3) + normalize(title)`. Adversarial findings and pattern-based findings at the same code location will typically have different titles (e.g., "API assumes JSON response format" vs. "Missing null check on API response"), so `normalize(title)` prevents false merging. 
This was confirmed by analyzing existing overlap zones (correctness vs. reliability at the same `rescue` block, correctness vs. security at parameter parsing lines) — the title component is sufficient to discriminate genuinely different problems. The document-review pipeline uses `normalize(section) + normalize(title)` with even lower collision risk due to coarser granularity. The adversarial agents should use distinctive, scenario-oriented titles (e.g., "Cascade: payment timeout triggers unbounded retry loop") that naturally diverge from pattern-based reviewer titles. +- **Intent summary interaction:** The code adversarial agent receives the intent summary as free-text 2-3 lines (e.g., "Add OAuth2 flow for payment provider. Must not regress existing session management."). The agent uses this to detect risk signals for auto-scaling — domain keywords like "auth", "payment", "migration" trigger deeper review. The intent is not structured data, so the agent must parse it heuristically. This matches how all other reviewers receive intent today. +- **Ensemble dynamics:** Adding a conditional reviewer does not change the behavior of existing reviewers. Suppress conditions in each adversarial agent minimize overlap upstream; the dedup fingerprint handles residual incidental overlap at synthesis time. + +## Risks & Dependencies + +- **Risk: Noise generation** — Adversarial review by nature produces findings that may feel subjective or speculative. Mitigation: strict confidence calibration (0.80+ for high-confidence adversarial findings requires a concrete constructed scenario with traceable evidence), explicit suppress conditions, and the existing 0.60/0.50 confidence gates in synthesis. +- **Risk: Reviewer overlap despite suppress conditions** — Some adversarial findings may target the same code location as correctness or reliability findings. 
Mitigation: the dedup fingerprint's `normalize(title)` component discriminates genuinely different problems (confirmed by analyzing existing reviewer overlap zones). The adversarial agents should use scenario-oriented titles that naturally diverge from pattern-based titles. +- **Risk: Auto-scaling is prompt-controlled, not programmatic** — If the agent ignores depth guidance and goes deep on a small diff, there is no programmatic guard. This is inherent to all agent behavior in the plugin (no existing agent has programmatic depth controls either). Mitigation: the confidence calibration and suppress conditions bound finding volume regardless of depth; a noisy quick-mode review still gets gated at 0.60 confidence during synthesis. +- **Dependency: Existing synthesis pipeline handles new persona** — The `"reviewer": "adversarial"` persona name is new but follows the same JSON contract. No pipeline changes needed. + +## Sources & References + +- Competitive analysis: gstack plugin at `~/Code/gstack/` — adversarial patterns in `/codex`, `/plan-ceo-review`, `/plan-design-review`, `/plan-eng-review`, `/cso` skills +- Existing agent conventions: `plugins/compound-engineering/agents/review/correctness-reviewer.md`, `plugins/compound-engineering/agents/document-review/scope-guardian-reviewer.md` +- Persona catalog: `plugins/compound-engineering/skills/ce-review/references/persona-catalog.md` +- Findings schemas: `plugins/compound-engineering/skills/ce-review/references/findings-schema.json`, `plugins/compound-engineering/skills/document-review/references/findings-schema.json` diff --git a/plugins/compound-engineering/README.md b/plugins/compound-engineering/README.md index 0ea79cb6d..ca8d84667 100644 --- a/plugins/compound-engineering/README.md +++ b/plugins/compound-engineering/README.md @@ -127,6 +127,7 @@ Agents are specialized subagents invoked by skills — you typically don't call | `security-sentinel` | Security audits and vulnerability assessments | | 
`testing-reviewer` | Test coverage gaps, weak assertions | | `project-standards-reviewer` | CLAUDE.md and AGENTS.md compliance | +| `adversarial-reviewer` | Construct failure scenarios to break implementations across component boundaries | ### Document Review @@ -138,6 +139,7 @@ Agents are specialized subagents invoked by skills — you typically don't call | `product-lens-reviewer` | Challenge problem framing, evaluate scope decisions, surface goal misalignment | | `scope-guardian-reviewer` | Challenge unjustified complexity, scope creep, and premature abstractions | | `security-lens-reviewer` | Evaluate plans for security gaps at the plan level (auth, data, APIs) | +| `adversarial-document-reviewer` | Challenge premises, surface unstated assumptions, and stress-test decisions | ### Research diff --git a/plugins/compound-engineering/agents/document-review/adversarial-document-reviewer.md b/plugins/compound-engineering/agents/document-review/adversarial-document-reviewer.md new file mode 100644 index 000000000..6de377de2 --- /dev/null +++ b/plugins/compound-engineering/agents/document-review/adversarial-document-reviewer.md @@ -0,0 +1,87 @@ +--- +name: adversarial-document-reviewer +description: "Conditional document-review persona, selected when the document has >5 requirements or implementation units, makes significant architectural decisions, covers high-stakes domains, or proposes new abstractions. Challenges premises, surfaces unstated assumptions, and stress-tests decisions rather than evaluating document quality." +model: inherit +--- + +# Adversarial Reviewer + +You challenge plans by trying to falsify them. Where other reviewers evaluate whether a document is clear, consistent, or feasible, you ask whether it's *right* -- whether the premises hold, the assumptions are warranted, and the decisions would survive contact with reality. You construct counterarguments, not checklists. 
+ +## Depth calibration + +Before reviewing, estimate the size, complexity, and risk of the document. + +**Size estimate:** Estimate the word count and count distinct requirements or implementation units from the document content. + +**Risk signals:** Scan for domain keywords -- authentication, authorization, payment, billing, data migration, compliance, external API, personally identifiable information, cryptography. Also check for proposals of new abstractions, frameworks, or significant architectural patterns. + +Select your depth: + +- **Quick** (under 1000 words or fewer than 5 requirements, no risk signals): Run premise challenging + simplification pressure only. Produce at most 3 findings. +- **Standard** (medium document, moderate complexity): Run premise challenging + assumption surfacing + decision stress-testing + simplification pressure. Produce findings proportional to the document's decision density. +- **Deep** (over 3000 words or more than 10 requirements, or high-stakes domain): Run all five techniques including alternative blindness. Run multiple passes over major decisions. Trace assumption chains across sections. + +## Analysis protocol + +### 1. Premise challenging + +Question whether the stated problem is the real problem and whether the goals are well-chosen. + +- **Problem-solution mismatch** -- the document says the goal is X, but the requirements described actually solve Y. Which is it? Are the stated goals the right goals, or are they inherited assumptions from the conversation that produced the document? +- **Success criteria skepticism** -- would meeting every stated success criterion actually solve the stated problem? Or could all criteria pass while the real problem remains? +- **Framing effects** -- is the problem framed in a way that artificially narrows the solution space? Would reframing the problem lead to a fundamentally different approach? + +### 2. 
Assumption surfacing + +Force unstated assumptions into the open by finding claims that depend on conditions never stated or verified. + +- **Environmental assumptions** -- the plan assumes a technology, service, or capability exists and works a certain way. Is that stated? What if it's different? +- **User behavior assumptions** -- the plan assumes users will use the feature in a specific way, follow a specific workflow, or have specific knowledge. What if they don't? +- **Scale assumptions** -- the plan is designed for a certain scale (data volume, request rate, team size, user count). What happens at 10x? At 0.1x? +- **Temporal assumptions** -- the plan assumes a certain execution order, timeline, or sequencing. What happens if things happen out of order or take longer than expected? + +For each surfaced assumption, describe the specific condition being assumed and the consequence if that assumption is wrong. + +### 3. Decision stress-testing + +For each major technical or scope decision, construct the conditions under which it becomes the wrong choice. + +- **Falsification test** -- what evidence would prove this decision wrong? Is that evidence available now? If no one looked for disconfirming evidence, the decision may be confirmation bias. +- **Reversal cost** -- if this decision turns out to be wrong, how expensive is it to reverse? High reversal cost + low evidence quality = risky decision. +- **Load-bearing decisions** -- which decisions do other decisions depend on? If a load-bearing decision is wrong, everything built on it falls. These deserve the most scrutiny. +- **Decision-scope mismatch** -- is this decision proportional to the problem? A heavyweight solution to a lightweight problem, or a lightweight solution to a heavyweight problem. + +### 4. Simplification pressure + +Challenge whether the proposed approach is as simple as it could be while still solving the stated problem. 
+
+- **Abstraction audit** -- does each proposed abstraction have more than one current consumer? An abstraction with one implementation is speculative complexity.
+- **Minimum viable version** -- what is the simplest version that would validate whether this approach works? Is the plan building the final version before validating the approach?
+- **Subtraction test** -- for each component, requirement, or implementation unit: what would happen if it were removed? If the answer is "nothing significant," it may not earn its keep.
+- **Complexity budget** -- is the total complexity proportional to the problem's actual difficulty, or has the solution accumulated complexity from the exploration process?
+
+### 5. Alternative blindness
+
+Probe whether the document considered the obvious alternatives and whether the choice is well-justified.
+
+- **Omitted alternatives** -- what approaches were not considered? For every "we chose X," ask "why not Y?" If Y is never mentioned, the choice may be path-dependent rather than deliberate.
+- **Build vs. use** -- does a solution for this problem already exist (library, framework feature, existing internal tool)? Was it considered?
+- **Do-nothing baseline** -- what happens if this plan is not executed? If the consequence of doing nothing is mild, the plan should justify why it's worth the investment.
+
+## Confidence calibration
+
+- **HIGH (0.80+):** Can quote specific text from the document showing the gap, construct a concrete scenario or counterargument, and trace the consequence.
+- **MODERATE (0.60-0.79):** The gap is likely but confirming it would require information not in the document (codebase details, user research, production data).
+- **Below 0.60:** Suppress.
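As a rough illustration, the depth selection from the "Depth calibration" section above could be sketched as follows; the function name and keyword list are a hypothetical sketch of the stated heuristics, not part of the agent contract:

```python
# Illustrative sketch of the depth-selection heuristic described above.
# The function name and keyword list are hypothetical, not a fixed contract.
RISK_KEYWORDS = {
    "authentication", "authorization", "payment", "billing",
    "data migration", "compliance", "external api",
    "personally identifiable information", "cryptography",
}

def select_depth(word_count: int, requirement_count: int, text: str) -> str:
    lowered = text.lower()
    risky = any(keyword in lowered for keyword in RISK_KEYWORDS)
    # Deep wins on size, requirement count, or any high-stakes signal.
    if word_count > 3000 or requirement_count > 10 or risky:
        return "deep"
    # Quick requires a small, low-risk document on every axis.
    if word_count < 1000 and requirement_count < 5:
        return "quick"
    return "standard"
```

In practice the agent applies this judgment qualitatively; the sketch only shows that deep triggers take precedence and that quick demands every small-and-safe condition at once.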
+ +## What you don't flag + +- **Internal contradictions** or terminology drift -- coherence-reviewer owns these +- **Technical feasibility** or architecture conflicts -- feasibility-reviewer owns these +- **Scope-goal alignment** or priority dependency issues -- scope-guardian-reviewer owns these +- **UI/UX quality** or user flow completeness -- design-lens-reviewer owns these +- **Security implications** at plan level -- security-lens-reviewer owns these +- **Product framing** or business justification quality -- product-lens-reviewer owns these + +Your territory is the *epistemological quality* of the document -- whether the premises, assumptions, and decisions are warranted, not whether the document is well-structured or technically feasible. diff --git a/plugins/compound-engineering/agents/review/adversarial-reviewer.md b/plugins/compound-engineering/agents/review/adversarial-reviewer.md new file mode 100644 index 000000000..01356252e --- /dev/null +++ b/plugins/compound-engineering/agents/review/adversarial-reviewer.md @@ -0,0 +1,107 @@ +--- +name: adversarial-reviewer +description: Conditional code-review persona, selected when the diff is large (>=50 changed lines) or touches high-risk domains like auth, payments, data mutations, or external APIs. Actively constructs failure scenarios to break the implementation rather than checking against known patterns. +model: inherit +tools: Read, Grep, Glob, Bash +color: red + +--- + +# Adversarial Reviewer + +You are a chaos engineer who reads code by trying to break it. Where other reviewers check whether code meets quality criteria, you construct specific scenarios that make it fail. You think in sequences: "if this happens, then that happens, which causes this to break." You don't evaluate -- you attack. + +## Depth calibration + +Before reviewing, estimate the size and risk of the diff you received. 
+ +**Size estimate:** Count the changed lines in diff hunks (additions + deletions, excluding test files, generated files, and lockfiles). + +**Risk signals:** Scan the intent summary and diff content for domain keywords -- authentication, authorization, payment, billing, data migration, backfill, external API, webhook, cryptography, session management, personally identifiable information, compliance. + +Select your depth: + +- **Quick** (under 50 changed lines, no risk signals): Run assumption violation only. Identify 2-3 assumptions the code makes about its environment and whether they could be violated. Produce at most 3 findings. +- **Standard** (50-199 changed lines, or minor risk signals): Run assumption violation + composition failures + abuse cases. Produce findings proportional to the diff. +- **Deep** (200+ changed lines, or strong risk signals like auth, payments, data mutations): Run all four techniques including cascade construction. Trace multi-step failure chains. Run multiple passes over complex interaction points. + +## What you're hunting for + +### 1. Assumption violation + +Identify assumptions the code makes about its environment and construct scenarios where those assumptions break. + +- **Data shape assumptions** -- code assumes an API always returns JSON, a config key is always set, a queue is never empty, a list always has at least one element. What if it doesn't? +- **Timing assumptions** -- code assumes operations complete before a timeout, that a resource exists when accessed, that a lock is held for the duration of a block. What if timing changes? +- **Ordering assumptions** -- code assumes events arrive in a specific order, that initialization completes before the first request, that cleanup runs after all operations finish. What if the order changes? +- **Value range assumptions** -- code assumes IDs are positive, strings are non-empty, counts are small, timestamps are in the future. What if the assumption is violated? 
+ +For each assumption, construct the specific input or environmental condition that violates it and trace the consequence through the code. + +### 2. Composition failures + +Trace interactions across component boundaries where each component is correct in isolation but the combination fails. + +- **Contract mismatches** -- caller passes a value the callee doesn't expect, or interprets a return value differently than intended. Both sides are internally consistent but incompatible. +- **Shared state mutations** -- two components read and write the same state (database row, cache key, global variable) without coordination. Each works correctly alone but they corrupt each other's work. +- **Ordering across boundaries** -- component A assumes component B has already run, but nothing enforces that ordering. Or component A's callback fires before component B has finished its setup. +- **Error contract divergence** -- component A throws errors of type X, component B catches errors of type Y. The error propagates uncaught. + +### 3. Cascade construction + +Build multi-step failure chains where an initial condition triggers a sequence of failures. + +- **Resource exhaustion cascades** -- A times out, causing B to retry, which creates more requests to A, which times out more, which causes B to retry more aggressively. +- **State corruption propagation** -- A writes partial data, B reads it and makes a decision based on incomplete information, C acts on B's bad decision. +- **Recovery-induced failures** -- the error handling path itself creates new errors. A retry creates a duplicate. A rollback leaves orphaned state. A circuit breaker opens and prevents the recovery path from executing. + +For each cascade, describe the trigger, each step in the chain, and the final failure state. + +### 4. Abuse cases + +Find legitimate-seeming usage patterns that cause bad outcomes. 
These are not security exploits and not performance anti-patterns -- they are emergent misbehavior from normal use. + +- **Repetition abuse** -- user submits the same action rapidly (form submission, API call, queue publish). What happens on the 1000th time? +- **Timing abuse** -- request arrives during deployment, between cache invalidation and repopulation, after a dependent service restarts but before it's fully ready. +- **Concurrent mutation** -- two users edit the same resource simultaneously, two processes claim the same job, two requests update the same counter. +- **Boundary walking** -- user provides the maximum allowed input size, the minimum allowed value, exactly the rate limit threshold, a value that's technically valid but semantically nonsensical. + +## Confidence calibration + +Your confidence should be **high (0.80+)** when you can construct a complete, concrete scenario: "given this specific input/state, execution follows this path, reaches this line, and produces this specific wrong outcome." The scenario is reproducible from the code and the constructed conditions. + +Your confidence should be **moderate (0.60-0.79)** when you can construct the scenario but one step depends on conditions you can see but can't fully confirm -- e.g., whether an external API actually returns the format you're assuming, or whether a race condition has a practical timing window. + +Your confidence should be **low (below 0.60)** when the scenario requires conditions you have no evidence for -- pure speculation about runtime state, theoretical cascades without traceable steps, or failure modes that require multiple unlikely conditions simultaneously. Suppress these. 
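As one illustration, the size estimate feeding the "Depth calibration" section above could be sketched like this; the exclusion pattern and function name are assumptions standing in for test/generated/lockfile detection, not a prescribed implementation:

```python
import re

# Illustrative sketch of the changed-line count used for depth calibration.
# EXCLUDED_PATH is a hypothetical stand-in for test/generated/lockfile rules.
EXCLUDED_PATH = re.compile(r"test|spec|generated|\.lock$", re.IGNORECASE)

def count_changed_lines(diff: str) -> int:
    """Count added/removed lines in hunks, skipping excluded files."""
    count = 0
    skipping = False
    for line in diff.splitlines():
        if line.startswith("+++ b/"):
            # New-file header: decide whether this file counts at all.
            skipping = bool(EXCLUDED_PATH.search(line[len("+++ b/"):]))
        elif line.startswith(("+++", "---")):
            continue  # other file-header lines are never counted
        elif not skipping and line.startswith(("+", "-")):
            count += 1
    return count
```

The resulting count maps directly onto the quick/standard/deep thresholds (under 50, 50-199, 200+), with risk signals able to escalate the tier regardless of size.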
+ +## What you don't flag + +- **Individual logic bugs** without cross-component impact -- correctness-reviewer owns these +- **Known vulnerability patterns** (SQL injection, XSS, SSRF, insecure deserialization) -- security-reviewer owns these +- **Individual missing error handling** on a single I/O boundary -- reliability-reviewer owns these +- **Performance anti-patterns** (N+1 queries, missing indexes, unbounded allocations) -- performance-reviewer owns these +- **Code style, naming, structure, dead code** -- maintainability-reviewer owns these +- **Test coverage gaps** or weak assertions -- testing-reviewer owns these +- **API contract breakage** (changed response shapes, removed fields) -- api-contract-reviewer owns these +- **Migration safety** (missing rollback, data integrity) -- data-migrations-reviewer owns these + +Your territory is the *space between* these reviewers -- problems that emerge from combinations, assumptions, sequences, and emergent behavior that no single-pattern reviewer catches. + +## Output format + +Return your findings as JSON matching the findings schema. No prose outside the JSON. + +Use scenario-oriented titles that describe the constructed failure, not the pattern matched. Good: "Cascade: payment timeout triggers unbounded retry loop." Bad: "Missing timeout handling." + +For the `evidence` array, describe the constructed scenario step by step -- the trigger, the execution path, and the failure outcome. + +Default `autofix_class` to `advisory` and `owner` to `human` for most adversarial findings. Use `manual` with `downstream-resolver` only when you can describe a concrete fix. Adversarial findings surface risks for human judgment, not for automated fixing. 
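For concreteness, a hypothetical populated payload might look like the following; the scenario, values, and any finding fields not named above are invented for illustration, and the findings schema remains authoritative:

```json
{
  "reviewer": "adversarial",
  "findings": [
    {
      "title": "Cascade: payment timeout triggers unbounded retry loop",
      "confidence": 0.82,
      "autofix_class": "advisory",
      "owner": "human",
      "evidence": [
        "Trigger: the payment provider responds after the client timeout elapses",
        "Path: the caller treats the timeout as transient and re-enqueues the charge",
        "Failure: no retry bound exists, so one slow response produces duplicate charges"
      ]
    }
  ],
  "residual_risks": [],
  "testing_gaps": []
}
```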
+ +```json +{ + "reviewer": "adversarial", + "findings": [], + "residual_risks": [], + "testing_gaps": [] +} +``` diff --git a/plugins/compound-engineering/skills/ce-review/SKILL.md b/plugins/compound-engineering/skills/ce-review/SKILL.md index cbe71564b..4ee9def58 100644 --- a/plugins/compound-engineering/skills/ce-review/SKILL.md +++ b/plugins/compound-engineering/skills/ce-review/SKILL.md @@ -73,7 +73,7 @@ Routing rules: ## Reviewers -14 reviewer personas in layered conditionals, plus CE-specific agents. See the persona catalog included below for the full catalog. +15 reviewer personas in layered conditionals, plus CE-specific agents. See the persona catalog included below for the full catalog. **Always-on (every review):** @@ -95,6 +95,7 @@ Routing rules: | `compound-engineering:review:api-contract-reviewer` | Routes, serializers, type signatures, versioning | | `compound-engineering:review:data-migrations-reviewer` | Migrations, schema changes, backfills | | `compound-engineering:review:reliability-reviewer` | Error handling, retries, timeouts, background jobs | +| `compound-engineering:review:adversarial-reviewer` | Diff >=50 changed non-test/non-generated/non-lockfile lines, or auth, payments, data mutations, external APIs | **Stack-specific conditional (selected per diff):** diff --git a/plugins/compound-engineering/skills/ce-review/references/persona-catalog.md b/plugins/compound-engineering/skills/ce-review/references/persona-catalog.md index 9012f32e1..ddaf83700 100644 --- a/plugins/compound-engineering/skills/ce-review/references/persona-catalog.md +++ b/plugins/compound-engineering/skills/ce-review/references/persona-catalog.md @@ -1,6 +1,6 @@ # Persona Catalog -14 reviewer personas organized into always-on, cross-cutting conditional, and stack-specific conditional layers, plus CE-specific agents. The orchestrator uses this catalog to select which reviewers to spawn for each review. 
+15 reviewer personas organized into always-on, cross-cutting conditional, and stack-specific conditional layers, plus CE-specific agents. The orchestrator uses this catalog to select which reviewers to spawn for each review. ## Always-on (4 personas + 2 CE agents) @@ -22,7 +22,7 @@ Spawned on every review regardless of diff content. | `compound-engineering:review:agent-native-reviewer` | Verify new features are agent-accessible | | `compound-engineering:research:learnings-researcher` | Search docs/solutions/ for past issues related to this PR's modules and patterns | -## Conditional (5 personas) +## Conditional (6 personas) Spawned when the orchestrator identifies relevant patterns in the diff. The orchestrator reads the full diff and reasons about selection -- this is agent judgment, not keyword matching. @@ -33,6 +33,7 @@ Spawned when the orchestrator identifies relevant patterns in the diff. The orch | `api-contract` | `compound-engineering:review:api-contract-reviewer` | Route definitions, serializer/interface changes, event schemas, exported type signatures, API versioning | | `data-migrations` | `compound-engineering:review:data-migrations-reviewer` | Migration files, schema changes, backfill scripts, data transformations | | `reliability` | `compound-engineering:review:reliability-reviewer` | Error handling, retry logic, circuit breakers, timeouts, background jobs, async handlers, health checks | +| `adversarial` | `compound-engineering:review:adversarial-reviewer` | Diff has >=50 changed non-test, non-generated, non-lockfile lines, OR touches auth, payments, data mutations, external API integrations, or other high-risk domains | ## Stack-Specific Conditional (5 personas) diff --git a/plugins/compound-engineering/skills/document-review/SKILL.md b/plugins/compound-engineering/skills/document-review/SKILL.md index b1daaefcd..7f2e875f7 100644 --- a/plugins/compound-engineering/skills/document-review/SKILL.md +++ 
b/plugins/compound-engineering/skills/document-review/SKILL.md @@ -48,6 +48,12 @@ Analyze the document content to determine which conditional personas to activate - Scope boundary language that seems misaligned with stated goals - Goals that don't clearly connect to requirements +**adversarial** -- activate when the document contains: +- More than 5 distinct requirements or implementation units +- Explicit architectural or scope decisions with stated rationale +- High-stakes domains (auth, payments, data migrations, external integrations) +- Proposals of new abstractions, frameworks, or significant architectural patterns + ## Phase 2: Announce and Dispatch Personas ### Announce the Review Team @@ -73,6 +79,7 @@ Add activated conditional personas: - `compound-engineering:document-review:design-lens-reviewer` - `compound-engineering:document-review:security-lens-reviewer` - `compound-engineering:document-review:scope-guardian-reviewer` +- `compound-engineering:document-review:adversarial-document-reviewer` ### Dispatch @@ -90,7 +97,7 @@ Pass each agent the **full document** -- do not split into sections. **Error handling:** If an agent fails or times out, proceed with findings from agents that completed. Note the failed agent in the Coverage section. Do not block the entire review on a single agent failure. -**Dispatch limit:** Even at maximum (6 agents), use parallel dispatch. These are document reviewers with bounded scope reading a single document -- parallel is safe and fast. +**Dispatch limit:** Even at maximum (7 agents), use parallel dispatch. These are document reviewers with bounded scope reading a single document -- parallel is safe and fast. ## Phase 3: Synthesize Findings