diff --git a/apps/claude-code/pr-review/CONTEXT.md b/apps/claude-code/pr-review/CONTEXT.md index 35266a2..1b896d6 100644 --- a/apps/claude-code/pr-review/CONTEXT.md +++ b/apps/claude-code/pr-review/CONTEXT.md @@ -123,6 +123,30 @@ A Thread Classification state. No action taken; the issue still exists in the ne **obsolete**: A Thread Classification state. The relevant code was deleted or moved; the comment no longer applies. +### Platform-failure handling + +**Notice**: +A user-facing message emitted by an orchestration agent when a Review operation completed in a non-OK Notice Tier. Carries `severity` (`info` or `warning`), `kind` (a small enum identifying the failed operation), and a one-line `message`. Notices are merged across agents by the orchestrator, rendered in the Review Summary, included in the end-of-run Trailer, and (for Pre-PR mode) printed in the Claude interface before findings. +_Avoid_: warning, error, log line + +**Notice Tier**: +A four-state classification of every Review operation outcome: **OK**, **EMPTY-BY-DESIGN**, **DEGRADED**, **ABORTED**. The tier choice IS the gating decision — there is no fifth "ask the user" tier. Failure modes that tempt one are reclassified as ABORTED. + +**OK**: +A Notice Tier. The operation completed with a non-empty result. No Notice emitted. + +**EMPTY-BY-DESIGN**: +A Notice Tier. The operation completed with an empty result that is a legitimate domain state (no work-items linked, no Confluence pages, no prior threads). Currently emits an `info` Notice only for the Doc Context family; other empty states are inherent to the Review type and stay silent. + +**DEGRADED**: +A Notice Tier. The operation failed but the Review can still complete with reduced coverage. Emits a `warning` Notice; the Review still posts. + +**ABORTED**: +A Notice Tier. The operation failed and continuing would corrupt cross-run state (Bot Signature drift, Summary thread desync, mode misdetection). The run stops before the Review Summary is composed; the failure goes to stderr plus the end-of-run Trailer. + +**Trailer**: +A single end-of-run line printed by the orchestrator to the Claude interface, regardless of mode or success state. Carries findings count by severity, Notice counts by severity, and (for ADO modes) the PR URL. Designed for AFK skim: the invoker sees outcome status without opening the PR. + ## Relationships - A **Review** produces one **Review Summary**, zero or more **Inline Comments**, and zero or more **General Comments** @@ -137,6 +161,7 @@ A Thread Classification state. The relevant code was deleted or moved; the comme - The **ADO Fetcher** is invoked by first-review and re-review modes; **Pre-PR mode** skips it entirely and goes directly to Review Aspect agents - The **Re-review Coordinator** is invoked only when the mode is re-review; first-review and pre-PR modes never load it - The **ADO Writer** is invoked by first-review and re-review modes; **Pre-PR mode** does not write back to ADO +- Every operation in an orchestration agent terminates in one of the four **Notice Tiers**. **DEGRADED** and **EMPTY-BY-DESIGN**-with-message operations emit a **Notice** that flows from the agent's structured result block, through the orchestrator's merge step, into the **Review Summary** (for ADO modes) or the printed pre-findings block (for **Pre-PR mode**). The end-of-run **Trailer** carries Notice counts so the invoker sees them without opening the PR. ## Example dialogue diff --git a/docs/inbox/pr-review-ado-error-hardening-pass.md b/docs/inbox/pr-review-ado-error-hardening-pass.md deleted file mode 100644 index f271aaf..0000000 --- a/docs/inbox/pr-review-ado-error-hardening-pass.md +++ /dev/null @@ -1,36 +0,0 @@ ---- -title: pr-review ADO error-hardening pass -created: 2026-05-13 ---- - -**Status:** needs-triage -**Category:** enhancement - -> _This was generated by AI during triage of PR #29._ - -PR #29 (pr-review orchestrator split) addressed the highest-stakes silent-failure paths in the ADO Writer and the orchestrator's mode-detection step (H1–H5). A second wave of silent-failure findings from the same review was deferred because the changes span multiple agents and helpers and feel like a feature in themselves — "ADO write reliability" — rather than a fit for the orchestrator-split PR. Capture them here so they aren't lost. - -## Items to harden - -- **ADO Fetcher Step 2 (iterations fetch).** `ITERATIONS_JSON=$(az devops invoke ...)` does not capture the exit code; an `az` failure produces an empty variable, which the subsequent Node parse silently coerces to `LATEST_ITERATION_ID=''` and `LATEST_COMMIT_SHA=''`. Every comment is then signed `Iteration ` (empty) and re-review detection breaks forever afterward because no `PRIOR_ITERATION_ID` can match `""`. Capture the exit code, branch on it, and abort the whole agent with a clear error rather than emitting a signature-less review. -- **ADO Fetcher Step 5 (work-item fetch).** `WI_RESPONSE=$(az devops invoke ... 2>/dev/null) || WI_RESPONSE=""` conflates legitimate "no work items linked" with auth expiry, project rename, 5xx, extension uninstall, network partition. The Doc Context Orchestrator then runs without business context, silently. Capture stderr, log it, and emit a distinct sentinel (e.g. `WORK_ITEM_IDS_ERROR`) so the orchestrator can surface to the user. -- **ADO Fetcher Step 4 (diff-range fallback).** When the prior commit can't be fetched, the agent silently falls back to the full diff. The Re-review Coordinator then classifies prior threads against the wider hunk set and may flip threads from `obsolete` to `pending` or vice versa. Either propagate a `DIFF_RANGE: full|incremental` field through `ADO_FETCHER_RESULT` so the Coordinator can react, or skip classification when the fallback fires. -- **`parseWorkItemIds` discriminated union.** The helper currently maps null/undefined responses to `[]` — the JSDoc even codifies this as intentional. Replace with `{ ok: true, ids } | { ok: false, reason }` so callers must distinguish "no work items" from "fetch failed". -- **`parseAdoWriterResult` discriminated union.** Returns `null` when the result block is missing. The orchestrator has no documented branch for "writer returned null block" — it presumably treats it as 0/empty and proceeds to report success. Either throw on missing block or add a `{ status: 'missing' | 'parsed' }` variant. -- **`parseIterations` empty-input default.** Returns `{ latestIterationId: 1, latestCommitSha: '' }` when `value` is empty — but `iterationId=1` is explicitly the iteration the project never uses (per plugin CLAUDE.md). If a fetch failure produces an empty value array, the agent happily emits `Iteration 1` signatures, silently violating the rule. Either throw, or return a discriminated union. -- **Pre-PR default-branch detection.** `DEFAULT_BRANCH=$(git remote show origin 2>/dev/null | grep 'HEAD branch' | awk '{print $NF}' || echo "main")` silently falls back to `main`. The repo's own Gitflow uses `develop` — so a transient `git remote show` failure would compute the pre-PR diff against the wrong base and review hundreds of unrelated commits with no warning. Surface the fallback with a stderr warning. -- **Inline POST `*.err` files cleanup.** `Step 4 — Clean up` in the Writer runs `rm -f` unconditionally — destroying the only persistent record of stderr from failed inline POSTs. Keep them on failure (`if FINDINGS_POSTED < EXPECTED`) or stream their contents to stderr before cleanup. -- **Re-review Coordinator PATCH-to-fixed catch-all.** The Node catch block only special-cases `409` → continue; everything else (401, 403, 404, 5xx, network) becomes a 200-character "PATCH warning" in stdout that no automated layer inspects. Threads stay active when the user expects them fixed, and the next re-review re-replies to them. Distinguish recoverable (409) from unrecoverable and propagate the latter. -- **Re-review Coordinator per-finding match parse.** `MATCH=$(... 2>/dev/null || echo "")` — on Node parse error, CLASSIFICATION and THREAD_ID both end up empty strings, dispatch falls through to "no match → add to freshFindings", and the reviewer sees the same finding posted twice. Capture the exit code separately and abort/log instead of silently downgrading. -- **`pre-pr.mjs` parseChangedFilesFromDiff edge case.** `[]` is returned for both "empty input" and "diff with no `diff --git` headers" — the latter is suspicious (a real diff always has those headers). Pre-PR mode then prints `✅ Pre-PR review complete — no issues found.` even when the actual reason was a broken pipeline. Log a debug line when non-empty input produces zero files, or have `buildPrePrContext` throw on suspicious shapes. - -## What grilling needs to resolve - -- Is "harden every silent-failure path" a single feature, or should it be split (Writer / Fetcher / Coordinator / Pre-PR)? -- Where is the line between "abort and surface to user" and "log and continue"? Different items currently lean different ways. -- Adding `try/catch` + structured exit codes around every Node heredoc balloons the agent prompts — is that acceptable, or do we extract more helpers (the `mode-detection.mjs` precedent) so the bash side gets simpler? -- Are the discriminated-union refactors of `parseWorkItemIds` / `parseIterations` / `parseAdoWriterResult` a breaking change to the helper API? If they have any external consumers (they shouldn't, but verify), call it out. - -## Source - -PR #29 silent-failure-hunter review. The fixed subset (H1–H5) is documented in `apps/claude-code/pr-review/CHANGELOG.md` under [Unreleased] → Fixed. diff --git a/docs/issues/pr-review-ado-fetcher-reliability/01-end-to-end-notice-pipeline.md b/docs/issues/pr-review-ado-fetcher-reliability/01-end-to-end-notice-pipeline.md new file mode 100644 index 0000000..3c1c342 --- /dev/null +++ b/docs/issues/pr-review-ado-fetcher-reliability/01-end-to-end-notice-pipeline.md @@ -0,0 +1,51 @@ +# A1. End-to-end Notice pipeline via Doc-Context info Notice + ADR-0014 + +**Status:** ready-for-agent +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-ado-fetcher-reliability/PRD.md` + +## What to build + +Deliver the foundational Notice pipeline end-to-end with one concrete emission site: the Doc-Context `info` Notice when the PR has no linked work items. + +Implementation cuts through every layer: + +- **New helper** `scripts/ado/notices.mjs` — pure functions `createNotice`, `mergeNotices` (dedupe by `kind`), `formatNoticesAsSummaryBlock`, `formatNoticesAsPrePrPreamble`, `formatTrailer`. With unit tests. +- **ADO Fetcher prompt** — `ADO_FETCHER_RESULT_START/END` block gains a `NOTICES: [...]` field. When `WORK_ITEM_IDS=[]`, the Fetcher appends an `info` Notice (`kind: doc-context`, message: "Reviewed without business context — no work items linked to this PR."). +- **Orchestrator** — parses `NOTICES` from the Fetcher result, merges via the new helper (no-op at this stage but the merge wiring must exist), passes a merged `NOTICES_JSON` to the ADO Writer prompt. +- **ADO Writer prompt** — accepts the new `NOTICES_JSON` input; renders a `## Notices` block (with `ℹ️` for `info`, `⚠` for `warning`) above the severity-grouped findings in the Summary content. +- **Trailer** — orchestrator prints a mandatory end-of-run line in the Claude interface for every run (ADO modes, Pre-PR mode, aborted). Carries findings counts, notice counts, and (for ADO modes) the PR URL. +- **ADR-0014** — new `apps/claude-code/pr-review/docs/adr/0014-notice-tier-doctrine-and-failure-classification-helpers.md`. Records the four-tier doctrine (OK / EMPTY-BY-DESIGN / DEGRADED / ABORTED), the no-fifth-ASK-tier rule, and the JS-helper-layer refinement to ADR 0013. +- **CHANGELOG** — `[Unreleased]` Added entries for the new helper and ADR; Changed entries for the Fetcher result block, the orchestrator merge wiring, and the Summary rendering. +- **`commands/review-pr.md`** stays ≤ 200 lines. + +End-to-end demoable: invoke `/pr-review:review-pr` against an ADO PR with no linked work items. The Summary opens with `ℹ️ Reviewed without business context — no work items linked to this PR.` followed by the findings. The Claude interface ends with `✅ Review posted: findings · 0 warning notices · 1 info notice → `. + +## Acceptance criteria + +- [ ] `scripts/ado/notices.mjs` exists with all five exported functions and passes `pnpm --filter pr-review test`. +- [ ] `ADO_FETCHER_RESULT_START/END` block emits a `NOTICES` field (JSON array; empty array if no notices). +- [ ] Orchestrator parses, merges, and passes `NOTICES` to the ADO Writer prompt. +- [ ] ADO Writer Summary content renders a `## Notices` block above findings when `NOTICES` is non-empty; no block when empty. +- [ ] Doc-Context info Notice appears on a real ADO PR with no linked work items. +- [ ] End-of-run Trailer line is printed in the Claude interface for every run (success, abort, pre-PR). +- [ ] ADR-0014 exists at the documented path. +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, and `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +None — can start immediately. + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q2 four-tier doctrine (OK / EMPTY-BY-DESIGN / DEGRADED / ABORTED, no fifth ASK tier), Q3 Doc-Context info-Notice carve-out, Q4 Option A Notice flow (per-agent `NOTICES` array, orchestrator merges with `kind`-based dedup, Writer renders `## Notices` block above findings), and the mandatory Trailer convention. ADR-0014 content is fully specified in PRD A's "Implementation Decisions → Helper layer". No outstanding questions; ready for an AFK agent to implement. diff --git a/docs/issues/pr-review-ado-fetcher-reliability/02-classify-http-error-and-work-items.md b/docs/issues/pr-review-ado-fetcher-reliability/02-classify-http-error-and-work-items.md new file mode 100644 index 0000000..08f417f --- /dev/null +++ b/docs/issues/pr-review-ado-fetcher-reliability/02-classify-http-error-and-work-items.md @@ -0,0 +1,49 @@ +# A2. `classify-http-error` + `fetch-work-items` refactor → DEGRADED tier + ADR-0015 + +**Status:** ready-for-agent +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-ado-fetcher-reliability/PRD.md` + +## What to build + +Wire the DEGRADED tier end-to-end by introducing the canonical HTTP-tier mapping and applying it to the ADO Fetcher's work-item fetch. + +Implementation cuts through every layer: + +- **New helper** `scripts/ado/classify-http-error.mjs` — pure function `({ status, body, exitCode }) → { tier: 'ok' | 'degraded' | 'aborted', kind, message }`. Encodes the canonical mapping (200/201/404/409 → ok; 401/403 → aborted; 5xx/network/4xx → degraded). With unit tests covering every row of the mapping table from PRD A's Implementation Decisions, plus malformed-body and network-exit-code paths. +- **New helper** `scripts/ado/fetch-work-items.mjs` — wraps the work-items fetch (currently inline in the Fetcher prompt). Returns `{ ok: true, ids } | { ok: false, reason, message }`. Subsumes today's `parseWorkItemIds`. Empty array on successful fetch → `{ ok: true, ids: [] }` (legitimate EMPTY-BY-DESIGN). Auth/5xx/network → `{ ok: false }` with a kind and message ready for Notice creation. With unit tests covering the discriminated-union behaviour. +- **ADO Fetcher prompt** — Step 5 (work-item fetch) refactored to `await import(...)` the new helper. On `{ ok: false }`, emit a DEGRADED Notice (`kind: work-items`, message from the helper) into the Fetcher's `NOTICES` array (the channel wired by A1). The Doc-Context info Notice from A1 still fires when `{ ok: true, ids: [] }`. +- **ADR-0015** — new `apps/claude-code/pr-review/docs/adr/0015-canonical-http-tier-mapping.md`. Records the HTTP-tier mapping, the 401/403 abort rule, and the no-retries-in-v1 stance. +- **CHANGELOG** — `[Unreleased]` Added entries for the two new helpers and ADR-0015; Changed entry for the Fetcher prompt's work-item step. + +Subsumption: today's `parseWorkItemIds` is removed (subsumed into `fetch-work-items.mjs`), and its existing test file is replaced by the new helper's tests. + +End-to-end demoable: invoke `/pr-review:review-pr` against an ADO PR while the local Azure CLI is unauthenticated (e.g. revoke the token). The Summary renders `## Notices` containing `⚠ work-items: Failed to fetch linked work items (auth error). Review proceeded without business context.` The Trailer reports `· 1 warning notice`. Restore auth and the same PR shows the Doc-Context info Notice from A1 (or no Notice at all if work items are populated). + +## Acceptance criteria + +- [ ] `scripts/ado/classify-http-error.mjs` exists with full HTTP-tier-mapping unit tests (≥ 10 cases). +- [ ] `scripts/ado/fetch-work-items.mjs` exists with discriminated-union return shape and unit tests; subsumes `parseWorkItemIds`. +- [ ] Old `parseWorkItemIds` exports and tests are removed. +- [ ] ADO Fetcher prompt's work-item step calls the new helper via `await import(...)`. +- [ ] A simulated work-item-fetch failure (e.g. wrong PROJECT name, revoked auth) produces a DEGRADED Notice with `kind: work-items` in the Summary. +- [ ] ADR-0015 exists at the documented path. +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, `pnpm --filter pr-review test`, `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +`docs/issues/pr-review-ado-fetcher-reliability/01-end-to-end-notice-pipeline.md` + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q7 canonical HTTP-tier mapping (200/201/404/409 → OK; 401/403 → ABORTED; 5xx/network/4xx → DEGRADED; no retries in v1). Helper-API discriminated-union refactor for `parseWorkItemIds` verified breaking-change-free (zero consumers outside the plugin, confirmed by `grep` across `apps/`, `packages/`, `docs/`). ADR-0015 content is fully specified in PRD A's "Implementation Decisions → Canonical HTTP-tier mapping". No outstanding questions. diff --git a/docs/issues/pr-review-ado-fetcher-reliability/03-fetch-iterations-aborted-tier.md b/docs/issues/pr-review-ado-fetcher-reliability/03-fetch-iterations-aborted-tier.md new file mode 100644 index 0000000..127e5b0 --- /dev/null +++ b/docs/issues/pr-review-ado-fetcher-reliability/03-fetch-iterations-aborted-tier.md @@ -0,0 +1,49 @@ +# A3. `fetch-iterations` refactor → ABORTED tier + +**Status:** ready-for-agent +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-ado-fetcher-reliability/PRD.md` + +## What to build + +Wire the ABORTED tier end-to-end by refactoring the iterations fetch and removing the `iterationId=1` default that today silently violates CLAUDE.md. + +Implementation cuts through every layer: + +- **New helper** `scripts/ado/fetch-iterations.mjs` — wraps the iterations fetch and parse. Returns `{ ok: true, latestIterationId, latestCommitSha } | { ok: false, reason: 'empty-iterations' | 'auth' | 'transient' | 'malformed', message }`. Uses `classify-http-error` (from A2) for HTTP failures. Empty `value` array on a real PR → `{ ok: false, reason: 'empty-iterations' }`. 401/403 on the fetch → `{ ok: false, reason: 'auth' }`. With unit tests covering each branch. +- **ADO Fetcher prompt** — Step 2 (iterations fetch) refactored to `await import(...)` the new helper. On `{ ok: false }` of any kind, the prompt exits non-zero with a clear stderr message ("ERROR: . Try `az devops login` to re-authenticate." for auth; "ERROR: iterations endpoint returned empty value array. Cannot sign Review with a valid Iteration ID." for empty). +- **Orchestrator** — recognises a Fetcher non-zero exit and prints the Trailer aborted line in the Claude interface: `❌ Review aborted: `. No Summary is composed (the run never reaches the Writer). +- **Removed:** today's `parseIterations` helper exports and tests are subsumed into `fetch-iterations.mjs`. +- **CHANGELOG** — `[Unreleased]` Added entry for the new helper; Changed entry for the Fetcher prompt's iterations step; Fixed entry calling out the `iterationId=1` default removal. + +Subsumption: the existing `parseIterations` test fixtures move into the new helper's tests. The empty-input case is reclassified from "returns iterationId=1" to "returns `{ ok: false, reason: 'empty-iterations' }`" per the grilling decision. + +End-to-end demoable: invoke `/pr-review:review-pr` against a PR while the local `az devops login` is expired. The Claude interface ends with `❌ Review aborted: auth — Failed to fetch iterations (HTTP 401). Try \`az devops login\` to re-authenticate.` No Summary is posted. Restore auth and the same PR signs comments with the correct latest iteration. + +## Acceptance criteria + +- [ ] `scripts/ado/fetch-iterations.mjs` exists with discriminated-union return shape and unit tests (≥ 6 cases including empty value, 401, 5xx, malformed JSON, single iteration, multiple iterations). +- [ ] Old `parseIterations` exports and tests are removed. +- [ ] ADO Fetcher prompt's iterations step calls the new helper via `await import(...)`. +- [ ] A simulated empty-iterations response causes the run to ABORT with a clear stderr message and a Trailer aborted line. +- [ ] A simulated 401 on the iterations endpoint causes the same abort with `reason: auth` in the Trailer. +- [ ] No comment is ever signed with `Iteration ` (empty) or `Iteration 1` due to the empty-default path. +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, `pnpm --filter pr-review test`, `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +`docs/issues/pr-review-ado-fetcher-reliability/02-classify-http-error-and-work-items.md` + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q5 (empty-iterations reclassification — `value: []` on a real PR is ABORTED, including for merged PRs per the explicit "record-keeping reviews are not supported on PRs with missing iteration history" caveat). The `iterationId=1` default that currently violates CLAUDE.md is removed. `parseIterations` is subsumed; verified breaking-change-free. No outstanding questions. diff --git a/docs/issues/pr-review-ado-fetcher-reliability/04-diff-range-sentinel.md b/docs/issues/pr-review-ado-fetcher-reliability/04-diff-range-sentinel.md new file mode 100644 index 0000000..3e0d69a --- /dev/null +++ b/docs/issues/pr-review-ado-fetcher-reliability/04-diff-range-sentinel.md @@ -0,0 +1,45 @@ +# A4. `DIFF_RANGE` sentinel + ADR-0004 amendment + +**Status:** ready-for-agent +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-ado-fetcher-reliability/PRD.md` + +## What to build + +Emit the `DIFF_RANGE` sentinel and the corresponding Notice when the Fetcher's existing diff-range fallback fires, and amend ADR 0004 in-place with the γ-downgrade rule that PRD B's Coordinator will consume. + +Implementation cuts through every layer: + +- **ADO Fetcher prompt** — Step 4 (raw diff) updated to emit `DIFF_RANGE: full | incremental` as a new field in the `ADO_FETCHER_RESULT_START/END` block. The value reflects which diff range was actually computed: `incremental` when the prior iteration's commit was reachable and the diff ran against `${PRIOR_COMMIT_SHA}..${LATEST_COMMIT_SHA}`; `full` when any fallback fired and the diff ran against `origin/${TARGET_BRANCH}...HEAD`. When `full`, the prompt also appends a DEGRADED Notice (`kind: diff-range`, message: "Incremental diff unavailable — Coordinator will classify against the full PR diff with conservative downgrades.") to the Fetcher's `NOTICES` array. +- **Orchestrator** — parses the new `DIFF_RANGE` field alongside the other Fetcher result fields. PRD A does not yet consume the value; PRD B (issue B3) will. +- **ADR 0004 amendment** — `apps/claude-code/pr-review/docs/adr/0004-incremental-diff-baseline.md` gets a new "Degraded baseline" subsection (in-place, not a separate ADR) documenting the rule: when `DIFF_RANGE=full`, the Coordinator MAY classify against the full diff but MUST downgrade `addressed` / `obsolete` outputs to `pending` and emit a DEGRADED Notice. Status of ADR 0004 stays `Accepted`; the amendment is additive. +- **CHANGELOG** — `[Unreleased]` Changed entry for the Fetcher result-block extension; Fixed entry for the diff-range fallback no longer being silent. + +End-to-end demoable: invoke `/pr-review:review-pr` against a PR where the prior iteration's commit has been force-pushed away (so the Fetcher's `git fetch origin "$PRIOR_COMMIT_SHA"` fails). The Summary opens with `⚠ diff-range: Incremental diff unavailable — Coordinator will classify against the full PR diff with conservative downgrades.` The Trailer reports `· 1 warning notice`. (Without PRD B's B3 landed, the Coordinator does not yet downgrade — that's B3's verification surface.) + +## Acceptance criteria + +- [ ] `ADO_FETCHER_RESULT_START/END` block emits a `DIFF_RANGE: full | incremental` field. +- [ ] When the diff-range fallback fires, the Fetcher's `NOTICES` array contains a `warning`-severity entry with `kind: diff-range`. +- [ ] When the incremental diff succeeds, `DIFF_RANGE=incremental` and no diff-range Notice is emitted. +- [ ] Orchestrator parses the new field (does not yet consume it — PRD B will). +- [ ] ADR 0004 has the "Degraded baseline" subsection appended in-place. +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, `pnpm --filter pr-review test`, `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +`docs/issues/pr-review-ado-fetcher-reliability/01-end-to-end-notice-pipeline.md` + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q6 (sentinel naming `DIFF_RANGE: full | incremental` chosen over the boolean alternative for forward-compat with future range types; in-place amendment to ADR 0004 rather than a new ADR-0015a — the amendment is additive). Option γ (the γ-downgrade rule) is implemented in PRD B issue B3, not here; A4 only emits the sentinel and Notice. No outstanding questions. diff --git a/docs/issues/pr-review-ado-fetcher-reliability/PRD.md b/docs/issues/pr-review-ado-fetcher-reliability/PRD.md new file mode 100644 index 0000000..8456eff --- /dev/null +++ b/docs/issues/pr-review-ado-fetcher-reliability/PRD.md @@ -0,0 +1,206 @@ +# PRD: pr-review — ADO Fetcher reliability + +**Status:** needs-triage +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` + +--- + +## Problem Statement + +When the ADO Fetcher's Azure DevOps reads fail — iterations endpoint down, work-item fetch denied by auth, prior commit missing for an incremental diff — the failures are currently invisible. The bot keeps running and produces output that looks like a normal Review, but signed `Iteration ` (empty), or with `WORK_ITEM_IDS=[]` (indistinguishable from "no work items linked"), or classifying prior threads against the wrong diff range. The reviewer reading the PR has no way to tell that the Review was produced on degraded inputs; the next re-review can be corrupted permanently because Bot Signature drift breaks re-review detection. This is the most consequential class of silent failure surfaced by the PR #29 review. + +## Solution + +Introduce a four-state Notice Tier doctrine across the plugin and apply it first to the ADO Fetcher. Every Fetcher read terminates in one of four tiers — OK, EMPTY-BY-DESIGN, DEGRADED, ABORTED — and emits a structured Notice when the tier is non-OK (with one carve-out: EMPTY-BY-DESIGN is silent except for the Doc Context family). Notices flow from the Fetcher's structured result block through the orchestrator into the Review Summary, where the reviewer sees them. A mandatory end-of-run Trailer line printed in the Claude interface also reports notice counts, so the user invoking the command sees outcome status without opening the PR. + +Failure classification moves to pure JavaScript helpers under `scripts/ado/`, refining the architecture documented in ADR 0013. Three new deep helpers (`classify-http-error`, `notices`, plus per-fetch wrappers) replace the inline bash-and-Node heredocs that today implicitly swallow exit codes. The discriminated-union return shape distinguishes EMPTY-BY-DESIGN from DEGRADED at the helper API level, removing the conflation that today defaults `parseIterations([])` to `{ latestIterationId: 1 }` (silently violating CLAUDE.md's "iterationId=1 is never used" rule). + +The diff-range fallback that the Fetcher already performs (when the prior iteration's commit is unreachable, falling back to the full PR diff) gets a `DIFF_RANGE: full | incremental` sentinel field in `ADO_FETCHER_RESULT` and a DEGRADED Notice. PRD B will consume the sentinel; PRD A only emits it. + +## User Stories + +1. As a PR reviewer, I want a banner at the top of the Review Summary listing any platform failures that occurred during the Review, so that I can tell whether the bot's findings are based on complete or degraded context. +2. As a developer invoking `/pr-review:review-pr`, I want a single end-of-run Trailer line in my Claude interface reporting findings counts, notice counts, and the PR URL, so that I can scan outcome status across many AFK invocations without opening each PR. +3. As a PR reviewer, I want to know when the Review was produced without business context (no work items linked, or work-item fetch failed), so that I can decide whether to re-run with a linked work item or accept the review-without-context. +4. As a Plugin maintainer, I want the Bot Signature to never carry an empty or fabricated Iteration ID, so that re-review detection on the next run is not silently corrupted. +5. As a Plugin maintainer, I want auth or permission failures on the Azure DevOps iterations endpoint to abort the run with a clear stderr message naming `az devops login` as the remedy, so that the user is not left wondering why subsequent re-reviews behave oddly. +6. As a PR reviewer in re-review mode, I want the bot to tell me when it classified prior threads against the full PR diff instead of the incremental diff, so that I can interpret an unexpected `pending` verdict as conservative rather than definitive. +7. As a Plugin maintainer, I want failure classification logic to live in pure JS helpers with unit tests, rather than in bash-and-Node heredocs inside agent prompts, so that I can verify the doctrine is applied consistently without running an end-to-end ADO smoke test. +8. As a developer reading the codebase, I want every ADO write call site to consult one canonical helper that maps HTTP status codes to Notice Tiers, so that 401 means the same thing in every code path and a future contributor cannot accidentally invent a divergent mapping. +9. As a developer running `/pr-review:review-pr` in Pre-PR mode, I want any Fetcher-related infrastructure changes to be invisible to me, because Pre-PR mode does not run the Fetcher. +10. As a Plugin maintainer, I want a Notice that is emitted by multiple agents for the same root cause (e.g. Fetcher and Doc Context Orchestrator both noticing a Confluence outage) to be deduplicated in the orchestrator's merge step, so that the Summary does not list the same problem twice. +11. As a developer maintaining a Re-review on an ADO PR that was merged before the Review completed, I want the Fetcher to still return a usable iteration list (because comments are still useful as a review record), so that the merged-but-reviewable workflow ADR 0013 acknowledges keeps working. +12. As a Plugin maintainer, I want the discriminated-union refactor of `parseIterations` and `parseWorkItemIds` to be purely internal, so that no other plugin or release-tool depends on the old return shape. +13. As a developer running Pre-PR mode, I want the Doc Context EMPTY-BY-DESIGN informational Notice to be emitted only for the Doc Context family (no linked work items), so that other inherently-empty states (first-review having no prior threads, a clean PR having no findings) do not pollute the Summary with redundant `ℹ️` lines. +14. As a Plugin maintainer, I want the ADRs that record the new doctrine (helper-layer split from ADR 0013, canonical HTTP-tier mapping, γ-downgrade rule for diff-range) to be in place before PRD B's consumers start arriving, so that PRD B can reference them rather than re-litigate the decisions. +15. As a CI engineer, I want every new deep helper module to come with `node:test` unit tests in the prior-art style of `packages/release-tools/scripts/verify-changelog.test.mjs`, so that the helpers can be verified without an ADO PR and without Azure CLI installed. + +## Implementation Decisions + +### Notice Tier doctrine + +A four-state classification of every Review operation outcome — **OK**, **EMPTY-BY-DESIGN**, **DEGRADED**, **ABORTED** — captured in `CONTEXT.md` under "Platform-failure handling". The tier choice is the gating decision; there is no fifth ASK tier. AFK invocations never block on user input. Failure modes that tempt an ASK tier are reclassified as ABORTED. + +EMPTY-BY-DESIGN is silent for most states. The Doc Context family is the one exception: when `WORK_ITEM_IDS=[]` the orchestrator emits an `info`-severity Notice in the Summary, because the reviewer cannot tell from the PR alone whether the bot considered linked business context. + +### Notice flow + +Each orchestration agent emits a `NOTICES` JSON array as a new field in its structured result block. The orchestrator parses, merges (with `kind`-based deduplication), and passes the merged array to the ADO Writer alongside `FINDINGS`. The ADO Writer renders a `## Notices` block above the findings in the Review Summary content. The heading stays bare (no emoji) so a mixed `info` + `warning` Notices list does not require the heading emoji to misrepresent one of the tiers; each list item carries its own per-Notice emoji prefix (`ℹ️` for `info`, `⚠` for `warning`). + +Notice shape: `{ severity: "info" | "warning", kind: , message: string }`. `kind` is a small enum (`doc-context`, `diff-range`, `work-items`, `iterations`, `default-branch`, `partial-run-check`, `thread-match`, `thread-classify`, `inline-post`, `summary-post`, `patch-to-fixed`, `diff-parse`); rejected: free-form strings, severity-coded numerics. ABORTED never reaches the Notice channel — its surface is stderr + the Trailer. + +### End-of-run Trailer + +The orchestrator prints a mandatory single-line Trailer to the Claude interface at end-of-run, regardless of mode or outcome: + +- ADO modes: `✅ Review posted: findings ( critical, important) · warning notices · info notices → ` +- Pre-PR mode: `✅ Pre-PR review complete: findings ( critical, important) · warning notices` +- Aborted: `❌ Review aborted: ` + +Designed for AFK skim: the invoker sees outcome status without opening the PR. Same `NOTICES` array drives both the Summary rendering and the Trailer counts. + +### Helper layer (ADR 0014) + +Failure classification moves from inline bash-and-Node heredocs to pure JS helpers under `scripts/ado/`. Agent prompts shrink to "import, call, branch on `result.ok`". This refines ADR 0013 — orchestration still lives in agent prompts, but **failure classification** lives in helpers. + +New helper modules: + +- **`scripts/ado/classify-http-error.mjs`** — pure function taking an HTTP status code, response body excerpt, and process exit code. Returns `{ tier: 'ok' | 'degraded' | 'aborted', kind, message }`. Encodes the canonical HTTP-tier mapping. Consumed by PRD B too. +- **`scripts/ado/notices.mjs`** — pure helpers `createNotice`, `mergeNotices` (dedupe by `kind`), `formatNoticesAsSummaryBlock`, `formatNoticesAsPrePrPreamble`, `formatTrailer`. +- **`scripts/ado/fetch-iterations.mjs`** — wraps the iterations fetch and parse; returns `{ ok: true, latestIterationId, latestCommitSha } | { ok: false, reason }`. Subsumes the existing `parseIterations` helper, refactored to the discriminated-union shape. Empty `value` array on a real PR → `{ ok: false, reason: 'empty-iterations' }` → ABORTED. +- **`scripts/ado/fetch-work-items.mjs`** — wraps the work-items fetch and parse; returns `{ ok: true, ids } | { ok: false, reason }`. Subsumes `parseWorkItemIds`. Empty array (legitimate "no work items linked") → `{ ok: true, ids: [] }`; fetch failure (auth, 5xx, network) → `{ ok: false }`. + +### Canonical HTTP-tier mapping (ADR 0015) + +| HTTP outcome | Tier | Notes | +| --------------------- | -------- | ---------------------------------------------------------- | +| 200 / 201 | OK | No Notice. | +| 404 | OK | Domain "the thing is already gone." | +| 409 | OK | Domain "state already changed." | +| 401 | ABORTED | Token expired or revoked; all subsequent writes will fail. | +| 403 | ABORTED | Permission revoked; same. | +| 5xx | DEGRADED | Transient backend; emit Notice; continue. | +| Other 4xx (400 / 422) | DEGRADED | Malformed request bug; Notice includes body excerpt. | +| Network error | DEGRADED | Treat as 5xx. | + +No retries in v1. Retries add latency, complexity, and a new failure mode (retry storm). The doctrine produces correct behaviour without them; retries can be added later behind the same Notice surface. + +### DIFF_RANGE sentinel and ADR 0004 amendment + +The ADO Fetcher's existing fallback from incremental to full diff (when the prior iteration's commit is unreachable) is currently silent. PRD A introduces: + +- A new `DIFF_RANGE: full | incremental` line in `ADO_FETCHER_RESULT_START/END`. +- A DEGRADED Notice (`kind: diff-range`, message: "Incremental diff unavailable — Coordinator will classify against the full PR diff with conservative downgrades.") when the fallback fires. + +The PRD B Coordinator changes (γ-downgrade rule that remaps `addressed` / `obsolete` to `pending` when `DIFF_RANGE=full`) consume this sentinel. PRD A only emits it. + +ADR 0004 ("incremental diff baseline") is amended in-place with a "Degraded baseline" subsection covering this rule. + +### Agent and orchestrator changes + +- **`.agents/ado-fetcher.md`** — three inline bash heredocs (Steps 2, 4a/work-items, 4-diff) replaced with `await import` calls to the three new helpers. `ADO_FETCHER_RESULT` output block grows two fields: `DIFF_RANGE` and `NOTICES`. +- **`.agents/ado-writer.md`** — accepts a new `NOTICES` input; renders the `## Notices` block above the existing severity-grouped findings in the Summary content. No changes to write call sites in PRD A (those land in PRD B). +- **`commands/review-pr.md`** — parses `NOTICES` and `DIFF_RANGE` from `ADO_FETCHER_RESULT`; merges Notices via the `notices` helper; passes merged Notices to the ADO Writer prompt; emits Doc-Context EMPTY-BY-DESIGN info Notice when `WORK_ITEM_IDS=[]`; prints the mandatory end-of-run Trailer line. The 200-line cap from PRD-orchestrator-split is preserved by leaning on the new helpers (the bash side becomes uniform `if [ "$RESULT_OK" != "true" ]; then ...`). + +### Existing helpers, breaking-change check + +`parseIterations`, `parseWorkItemIds`, and any other affected helpers are verified to have zero consumers outside the `pr-review` plugin (`grep` across `apps/`, `packages/`, `docs/` returns no matches outside `apps/claude-code/pr-review/`). The discriminated-union refactor is therefore safe to land without a deprecation period. + +## Testing Decisions + +### What makes a good test + +Tests assert the external behaviour of each helper given controlled inputs — no implementation-detail inspection, no internal-branching tests. Inputs are plain JavaScript objects or short JSON fixtures. A test reads as a sentence: "given an HTTP 401, classifyHttpError returns the aborted tier." `node:test` built-in, `node:assert/strict`, no external deps. + +### Modules under test + +**New deep helpers (full unit-test coverage):** + +- `scripts/ado/classify-http-error.mjs` — one test per row of the canonical mapping (200, 201, 401, 403, 404, 409, 5xx, 400, 422, network/exit-code paths) plus the case where the body excerpt is malformed JSON. +- `scripts/ado/notices.mjs` — `createNotice` shape, `mergeNotices` dedup behaviour across multiple sources, the three `format…` renderers producing expected markdown / line shapes, `formatTrailer` for first-review / re-review / pre-pr / aborted modes. +- `scripts/ado/fetch-iterations.mjs` — happy path with one iteration, multiple iterations (returns the max), empty `value` array → `{ ok: false, reason: 'empty-iterations' }`, missing `value` key, malformed JSON, an ADO error response. +- `scripts/ado/fetch-work-items.mjs` — empty PR-work-item links → `{ ok: true, ids: [] }`, populated links, dedup of duplicate IDs (existing parseWorkItemIds invariant), null/missing response → `{ ok: false }`, ADO error response. + +The existing test files for `parseIterations` and `parseWorkItemIds` are subsumed — the fetch helpers replace them and inherit their fixtures. + +### Modules NOT under test in PRD A + +- Agent prompt content (`.agents/*.md`, `commands/review-pr.md`): no new string-match assertions. The existing pattern is flagged as brittle in `docs/inbox/pr-review-prompt-content-tests-brittleness.md` and behaviour is verified by integration smoke test against a real ADO PR after merge, per ADR 0013's testing posture. + +### Prior art + +`packages/release-tools/scripts/verify-changelog.test.mjs`, `packages/release-tools/scripts/bump-version.test.mjs`, and `apps/claude-code/pr-review/tests/parse-diff-hunks.test.mjs` (added in PR #29). Same style throughout. + +## Out of Scope + +- Coordinator and Writer changes (DIFF_RANGE consumption, γ-downgrade rule applied, HTTP-tier mapping applied to every write call site, `*.err` retention policy, `parseAdoWriterResult` discriminated-union refactor) — those land in **PRD B**. +- Pre-PR mode changes (`parseChangedFilesFromDiff` suspicious-shape Notice, default-branch fallback chain + Notice) — those land in **PRD B**. +- The integration smoke test against a real ADO PR — verification is manual, post-merge. +- Retries on transient HTTP errors — out of scope per the doctrine. Re-evaluate if 5xx Notices prove painful in practice. +- A canonical thread shape spanning ADO and GitHub — deferred per ADR 0013 until a second platform consumer exists. +- Lifting any helper from `scripts/ado/` to `pr-review-toolkit` — none of these helpers are platform-shared yet. + +## Further Notes + +**ADR 0014** (`apps/claude-code/pr-review/docs/adr/0014-failure-classification-helpers.md`) records the helper-layer refinement to ADR 0013. + +**ADR 0015** (`apps/claude-code/pr-review/docs/adr/0015-canonical-http-tier-mapping.md`) records the HTTP-tier mapping, the 401/403 abort rule, and the no-retries-in-v1 stance. + +**ADR 0004** (`apps/claude-code/pr-review/docs/adr/0004-incremental-diff-baseline.md`) is amended in-place with a "Degraded baseline" subsection covering the γ-downgrade rule that PRD B will implement on the consumer side. + +**`CONTEXT.md`** is already updated with the new terms (Notice, Notice Tier and its four states, Trailer). + +**Source:** the deferred items from the PR #29 multi-agent review, grilled against the domain doctrine over the conversation captured in this session. The originating inbox file (`docs/inbox/pr-review-ado-error-hardening-pass.md`) is removed once PRD A and PRD B are published. + +--- + +## Agent Brief + +> _This was generated by AI during triage._ + +**Category:** enhancement +**Summary:** Apply the four-tier Notice doctrine to the ADO Fetcher. Introduce three new deep helpers in `scripts/ado/` (`classify-http-error`, `notices`, plus `fetch-iterations` and `fetch-work-items` as discriminated-union refactors of the existing parsers). The Fetcher emits a `NOTICES` array and a `DIFF_RANGE` sentinel in its structured result block; the orchestrator merges Notices, passes them to the ADO Writer, prints a mandatory end-of-run Trailer line. The ADO Writer renders a `## Notices` block above findings in the Review Summary. + +**Current behavior:** +ADO Fetcher reads silently swallow exit codes. An iterations-fetch failure produces `LATEST_ITERATION_ID=''`, drifting the Bot Signature to `Iteration ` (empty) and breaking re-review detection forever afterward. A work-item-fetch failure is indistinguishable from "no work items linked" — both produce `WORK_ITEM_IDS=[]`. A diff-range fallback to the full PR diff happens silently, causing the Coordinator to classify prior threads against the wrong range. None of these surface to the reviewer or the invoker. + +**Desired behavior:** +Every Fetcher operation terminates in one of four Notice Tiers (OK, EMPTY-BY-DESIGN, DEGRADED, ABORTED). Tier choice is the gating decision — no user prompts, AFK-friendly. Failures route to: + +- **ABORTED** for state-corrupting failures (empty iterations on a real PR, 401/403 on iteration fetch). Process exits non-zero with stderr message + Trailer aborted line. +- **DEGRADED** for failures the Review can still complete around (work-item fetch failed, diff-range fallback to full diff, 5xx on any read). Emits a `warning` Notice surfaced in the Review Summary. +- **EMPTY-BY-DESIGN** for legitimate empty states. Silent except for the Doc Context family (`WORK_ITEM_IDS=[]` → `info` Notice in the Summary). +- **OK** for normal completion. + +The four new helpers under `scripts/ado/` own the classification logic. Agent prompts shrink to `await import` + branch on `result.ok`. The Bot Signature is never signed with an empty Iteration ID again. + +**Key interfaces:** + +- `classifyHttpError({ status, body, exitCode }) → { tier, kind, message }` — canonical HTTP-tier mapping; consumed by PRD B. +- `createNotice / mergeNotices / formatNoticesAsSummaryBlock / formatNoticesAsPrePrPreamble / formatTrailer` from `scripts/ado/notices.mjs`. +- `fetchIterations(...) → { ok: true, latestIterationId, latestCommitSha } | { ok: false, reason }`. +- `fetchWorkItems(...) → { ok: true, ids } | { ok: false, reason }`. +- `ADO_FETCHER_RESULT` grows `DIFF_RANGE: full | incremental` and `NOTICES: [{severity, kind, message}, …]` fields. + +**Acceptance criteria:** + +- [ ] The four new helpers under `scripts/ado/` exist and pass their unit tests (`pnpm --filter pr-review test`). +- [ ] `parseIterations` / `parseWorkItemIds` are gone — the new fetch helpers fully subsume them; no consumer outside `pr-review` is broken (verified by `grep`). +- [ ] An iterations fetch that returns `value: []` aborts the run with a clear stderr message and a Trailer `❌ Review aborted: empty-iterations — …` line. +- [ ] A work-item fetch that fails with auth/5xx/network emits a DEGRADED Notice (`kind: work-items`); a work-item fetch that returns an empty list emits an `info` Notice (`kind: doc-context`). +- [ ] A diff-range fallback emits `DIFF_RANGE: full` in `ADO_FETCHER_RESULT` and a DEGRADED Notice (`kind: diff-range`). +- [ ] The orchestrator merges Notices, dedupes by `kind`, and passes them to the ADO Writer. +- [ ] The ADO Writer renders a `## Notices` block above findings in first-review and re-review Summaries. +- [ ] Every successful run ends with a Trailer line in the Claude interface listing findings, notices, and the PR URL (ADO modes) or finding counts (Pre-PR mode). +- [ ] `commands/review-pr.md` remains ≤ 200 lines. +- [ ] ADR 0014 (helper layer), ADR 0015 (HTTP-tier mapping), and the in-place ADR 0004 amendment exist. +- [ ] `pnpm test` passes; `pnpm format` produces no diff; `pnpm check` reports zero warnings. + +**Out of scope:** + +- Coordinator and Writer changes — PRD B. +- Pre-PR mode changes — PRD B. +- Retries on transient HTTP errors. +- Integration smoke test (manual, post-merge). +- Lifting helpers to `pr-review-toolkit`. diff --git a/docs/issues/pr-review-platform-failure-handling/01-writer-http-tier-mapping.md b/docs/issues/pr-review-platform-failure-handling/01-writer-http-tier-mapping.md new file mode 100644 index 0000000..881c51e --- /dev/null +++ b/docs/issues/pr-review-platform-failure-handling/01-writer-http-tier-mapping.md @@ -0,0 +1,57 @@ +# B1. `parse-write-response` helper + ADO Writer applies HTTP-tier mapping to all writes + `*.err` streaming + +**Status:** ready-for-agent +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-platform-failure-handling/PRD.md` + +## What to build + +Route every ADO write call site in the ADO Writer through one canonical helper. Apply the HTTP-tier mapping consistently. Fix the H1 retroactive auth gap and the `*.err` retention policy. + +Implementation cuts through every layer: + +- **New helper** `scripts/ado/parse-write-response.mjs` — pure function `({ httpExit, responseText, errStream }) → { ok: true, id } | { ok: false, tier, kind, message }`. Composes `classify-http-error` (from A2) with response-`id` parsing. Used by every ADO write call site. With unit tests covering happy path, 200/201 with valid `id`, 401, 403, 404, 409, 5xx, network exit-code, malformed JSON body, missing `id` field on 200 response. +- **ADO Writer prompt** — every `az devops invoke` POST/PATCH call site routed through the new helper: + - inline POST (Step 1) — including the threadContext-fallback path + - summary POST (Step 2 first-review) + - delta reply POST (Step 2 re-review) + - completion marker POST (Step 3) +- **Tier handling per call site:** + - `ok: true` → record the `id`, increment counters, continue (today's H1 behaviour, now formalised through the helper). + - `ok: false, tier: 'aborted'` (401/403) → emit stderr message ("ERROR: . Try `az devops login` to re-authenticate.") and exit non-zero. The orchestrator surfaces the abort in the Trailer. + - `ok: false, tier: 'degraded'` (5xx/network/4xx) → push a Notice (`kind: inline-post | summary-post | patch-to-fixed`-equivalent for delta/completion marker) to the Writer's `NOTICES` array, continue to next call site. +- **`*.err` retention policy** — at the moment of failure, stream the contents of the per-call `*.err` file to the Writer's stderr (so the failure text is adjacent to the Notice that references it). Cleanup step at the end of the Writer is unconditional — no retention based on counts. +- **Writer result block** — `ADO_WRITER_RESULT_START/END` gains a `NOTICES: [...]` array so the orchestrator can merge Writer-emitted notices with Fetcher-emitted notices for the Summary. +- **Orchestrator** — merges the Writer's `NOTICES` into the combined array passed to subsequent rendering steps if needed; the Trailer notice counts already reflect all merged notices per A1. +- **CHANGELOG** — `[Unreleased]` Added entry for `parse-write-response.mjs`; Changed entries for the Writer call sites; Fixed entry retroactively covering the H1 inline-POST auth gap and the `*.err` streaming policy. + +End-to-end demoable: invoke `/pr-review:review-pr` against a PR while the local `az devops login` token is revoked. The Claude interface ends with `❌ Review aborted: auth — ` after the first failing write. Restore auth, simulate a 5xx (e.g. malformed REPO_ID), and the Summary renders `## Notices` with `⚠ inline-post: Failed to post inline comment at /src/foo.ts:42 (HTTP 503).` plus the `*.err` content visible in stderr above the Notice. + +## Acceptance criteria + +- [ ] `scripts/ado/parse-write-response.mjs` exists with full unit-test coverage (≥ 10 cases). +- [ ] Every `az devops invoke` POST/PATCH in `.agents/ado-writer.md` routes through the new helper. +- [ ] 401 or 403 from any write call aborts the Writer with a clear stderr message; the orchestrator's Trailer line reads `❌ Review aborted: auth — ...`. +- [ ] 5xx, network, and other-4xx from any write call emits a DEGRADED Notice; the Writer continues to the next call site. +- [ ] `*.err` file content is streamed to stderr at the moment of failure; cleanup at the end is unconditional. +- [ ] `ADO_WRITER_RESULT_START/END` emits a `NOTICES` array. +- [ ] The H1 inline-POST path (from PR #29) inherits the canonical mapping — auth failures no longer log-and-continue. +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, `pnpm --filter pr-review test`, `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +`docs/issues/pr-review-ado-fetcher-reliability/02-classify-http-error-and-work-items.md` + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q7 canonical HTTP-tier mapping applied to every Writer call site; Q8(b) `*.err` retention policy (stream-to-stderr at moment of failure, unconditional cleanup, rejected the conditional-retention alternative). H1's per-finding LOG-AND-CONTINUE pattern (from PR #29) is preserved for the per-thread cases but extended with the 401/403 → ABORT escalation — the inbox flagged this consistency gap and grilling confirmed the escalation. No outstanding questions. diff --git a/docs/issues/pr-review-platform-failure-handling/02-parse-ado-writer-result-discriminated-union.md b/docs/issues/pr-review-platform-failure-handling/02-parse-ado-writer-result-discriminated-union.md new file mode 100644 index 0000000..fded2f3 --- /dev/null +++ b/docs/issues/pr-review-platform-failure-handling/02-parse-ado-writer-result-discriminated-union.md @@ -0,0 +1,46 @@ +# B2. `parseAdoWriterResult` discriminated-union refactor + +**Status:** ready-for-agent +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-platform-failure-handling/PRD.md` + +## What to build + +Refactor `parseAdoWriterResult` to a discriminated-union return shape so the orchestrator can distinguish "Writer crashed before printing its result block" from "Writer parsed successfully with zero findings posted." + +Implementation cuts through every layer: + +- **Refactor** `scripts/ado-writer.mjs` (`parseAdoWriterResult`) — return type becomes `{ ok: true, summaryThreadId, findingsPosted } | { ok: false, reason: 'missing-block' | 'malformed' }`. Today's `null` return is subsumed: missing `ADO_WRITER_RESULT_START` / `_END` → `{ ok: false, reason: 'missing-block' }`; present block with garbage inside → `{ ok: false, reason: 'malformed' }`; valid block → `{ ok: true, ... }`. +- **Update existing tests** — the existing `ado-writer.test.mjs` cases that asserted `null` are updated to the new return shape. Add cases for the two `ok: false` reasons explicitly. +- **Orchestrator** — `commands/review-pr.md` parses the Writer's result block via the refactored helper. On `{ ok: false }`, the orchestrator emits a stderr abort message ("ERROR: Writer did not return a valid result block (). The Summary may or may not have been posted; verify on ADO.") and prints a Trailer aborted line. On `{ ok: true }`, behaviour unchanged. +- **ADO Writer prompt** — the existing round-trip validation step (added in PR #29's H2 fix) is updated to assert against the new `{ ok: true }` shape rather than the old "non-null" shape. +- **CHANGELOG** — `[Unreleased]` Changed entry for the helper API; Fixed entry covering the silent-success bug on Writer crash. + +End-to-end demoable: inject a fault that crashes the Writer mid-Summary-post (e.g. corrupt `LATEST_ITERATION_ID` env var). The Claude interface ends with `❌ Review aborted: writer-missing-block — Writer did not return a valid result block. The Summary may or may not have been posted; verify on ADO.` instead of the misleading success Trailer that would have appeared with the old `null` behaviour. + +## Acceptance criteria + +- [ ] `parseAdoWriterResult` returns the discriminated-union shape; existing test file's `null` assertions are migrated to `{ ok: false }` assertions. +- [ ] At least two new test cases cover `reason: 'missing-block'` and `reason: 'malformed'`. +- [ ] Orchestrator branches on `result.ok` and emits an abort message + Trailer aborted line on `{ ok: false }`. +- [ ] ADO Writer prompt's round-trip validation step works against the new shape. +- [ ] No code path in the orchestrator relies on the old `null` return. +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, `pnpm --filter pr-review test`, `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +`docs/issues/pr-review-ado-fetcher-reliability/01-end-to-end-notice-pipeline.md` + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q8 (mechanical batch — discriminated-union refactor distinguishes Writer-crash from zero-success; `null` return was conflating the two cases). Verified breaking-change-free: zero consumers outside `apps/claude-code/pr-review/`. No outstanding questions. diff --git a/docs/issues/pr-review-platform-failure-handling/03-coordinator-diff-range-gamma-downgrade.md b/docs/issues/pr-review-platform-failure-handling/03-coordinator-diff-range-gamma-downgrade.md new file mode 100644 index 0000000..cd712ac --- /dev/null +++ b/docs/issues/pr-review-platform-failure-handling/03-coordinator-diff-range-gamma-downgrade.md @@ -0,0 +1,46 @@ +# B3. Coordinator consumes `DIFF_RANGE` → γ-downgrade in `classify-thread` + +**Status:** ready-for-agent +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-platform-failure-handling/PRD.md` + +## What to build + +Extend `classify-thread` with the γ-downgrade rule and have the Re-review Coordinator pass the `DIFF_RANGE` sentinel (emitted by A4) into every thread classification call. + +Implementation cuts through every layer: + +- **`scripts/re-review/classify-thread.mjs`** — adds a `diffRange: 'full' | 'incremental'` parameter (default `'incremental'`, preserving today's behaviour). When `diffRange === 'full'`, the function remaps `addressed` → `pending` and `obsolete` → `pending`; `disputed` is unaffected (its derivation is reviewer-reply-based, not diff-position-based). The downgrade is a single new branch at the end of the existing classification flow. +- **Existing tests** — `scripts/re-review/classify-thread.test.mjs` gets new cases (the user-confirmed test scope for PRD B is "NEW deep modules only", but this is a behaviour change to a MODIFY module that ships with the slice — the new cases are minimal additions, not full new test files). +- **Re-review Coordinator prompt** — parses `DIFF_RANGE` from `ADO_FETCHER_RESULT` (which A4 already emits). Threads the value into every `classify-thread` invocation in Step 5 of the Coordinator. The Notice surfacing the downgrade is already emitted by the Fetcher in A4; the Coordinator does not emit a duplicate. +- **CHANGELOG** — `[Unreleased]` Changed entry for the classify-thread parameter; Fixed entry covering the previously-silent classification against a full-diff fallback. + +End-to-end demoable: trigger A4's diff-range fallback (force-push away the prior iteration's commit on a re-review). The Summary opens with `⚠ diff-range: Incremental diff unavailable...` (emitted by A4), and the thread classifications visibly downgrade — what would have been `addressed` or `obsolete` is now `pending`. The reviewer sees one Notice + one consistently-conservative classification, instead of false-confidence verdicts. + +## Acceptance criteria + +- [ ] `classify-thread` accepts a `diffRange` parameter; default is `'incremental'`. +- [ ] When `diffRange === 'full'`, outputs `addressed` and `obsolete` are remapped to `pending`; `disputed` is unaffected. +- [ ] At least two new test cases in `classify-thread.test.mjs` cover the downgrade branches. +- [ ] Re-review Coordinator parses `DIFF_RANGE` from `ADO_FETCHER_RESULT` and passes it to every classify-thread call. +- [ ] On a synthetic full-diff fallback, no thread is classified as `addressed` or `obsolete` purely from diff position. +- [ ] No duplicate diff-range Notice is emitted by the Coordinator (the Fetcher's Notice from A4 is the only one). +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, `pnpm --filter pr-review test`, `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +`docs/issues/pr-review-ado-fetcher-reliability/04-diff-range-sentinel.md` + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q6 Option γ — when `DIFF_RANGE=full`, `addressed` and `obsolete` outputs from `classify-thread` are remapped to `pending`; `disputed` is unaffected (its derivation is reviewer-reply-based, not diff-position-based). Option α (silent continuation) and Option β (skip classification entirely) were both rejected — γ preserves classifications the Coordinator can still make confidently while defaulting diff-position-derived verdicts to the safer state. No outstanding questions. diff --git a/docs/issues/pr-review-platform-failure-handling/04-coordinator-match-finding-throw.md b/docs/issues/pr-review-platform-failure-handling/04-coordinator-match-finding-throw.md new file mode 100644 index 0000000..edea2be --- /dev/null +++ b/docs/issues/pr-review-platform-failure-handling/04-coordinator-match-finding-throw.md @@ -0,0 +1,47 @@ +# B4. Coordinator `match-finding` throws + DEGRADED Notice on catch + +**Status:** ready-for-agent +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-platform-failure-handling/PRD.md` + +## What to build + +Change `match-finding` to throw on parse errors (distinguishable from `null` = legitimate no-match), and have the Re-review Coordinator catch the throw and surface a DEGRADED Notice instead of silently treating it as no-match (which today causes the same finding to be re-posted as a duplicate inline thread). + +Implementation cuts through every layer: + +- **`scripts/re-review/match-finding.mjs`** — today returns `null` on no match. New contract: `null` continues to mean "legitimate no-match"; a thrown `Error` distinguishes a parse failure in the input. Internal `JSON.parse` calls and helper-call boundaries that previously swallowed errors now propagate them. +- **Existing tests** — `scripts/re-review/match-finding.test.mjs` gets new cases asserting that malformed input throws (rather than returns `null`). Existing "no match → null" cases unchanged. +- **Re-review Coordinator prompt** — Step 6a (per-finding match) wraps the `match-finding` call in try/catch. On throw, append a DEGRADED Notice to the Coordinator's `NOTICES` array (`kind: thread-match`, message: "Could not classify finding at : — falling back to no-match.") and let the finding fall through naturally to the unclassified path (which today causes duplicate posting — but now with the Notice surfacing the cause to the reviewer). +- **Coordinator result block** — `RE_REVIEW_COORDINATOR_RESULT_START/END` gains a `NOTICES: [...]` field (similar to the Fetcher's and the Writer's). The orchestrator merges Coordinator notices with Fetcher and Writer notices into the combined Summary block. +- **CHANGELOG** — `[Unreleased]` Changed entry for match-finding's new throw semantics; Fixed entry covering the silent-duplicate-posting bug. + +End-to-end demoable: inject a synthetic match-finding parse failure (e.g. corrupt the prior threads JSON for one specific thread). The reviewer sees the finding re-posted as a new inline thread (today's behaviour) AND a `⚠ thread-match: Could not classify finding at /src/foo.ts:42 — falling back to no-match.` Notice in the Summary, instead of just the silent duplicate. + +## Acceptance criteria + +- [ ] `match-finding` throws on input parse errors; returns `null` only for legitimate no-match. +- [ ] At least two new test cases cover the throw paths. +- [ ] Coordinator wraps the per-finding match call in try/catch and emits a DEGRADED Notice (`kind: thread-match`) on throw. +- [ ] Coordinator's result block emits a `NOTICES` array. +- [ ] Orchestrator merges Coordinator notices into the combined Notice block. +- [ ] On a synthetic match-finding parse failure, both the duplicate posting and the Notice appear in the Summary. +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, `pnpm --filter pr-review test`, `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +`docs/issues/pr-review-ado-fetcher-reliability/01-end-to-end-notice-pipeline.md` + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q8(a) — throw-on-parse-error in `match-finding`; try/catch in the Coordinator; on throw the finding falls through to natural duplicate posting but with a DEGRADED Notice (`kind: thread-match`) surfacing the cause. Aborting the whole Coordinator was considered and rejected because parse errors are local to one finding-thread pair. No outstanding questions. diff --git a/docs/issues/pr-review-platform-failure-handling/05-coordinator-patch-to-fixed-mapping.md b/docs/issues/pr-review-platform-failure-handling/05-coordinator-patch-to-fixed-mapping.md new file mode 100644 index 0000000..e917d0a --- /dev/null +++ b/docs/issues/pr-review-platform-failure-handling/05-coordinator-patch-to-fixed-mapping.md @@ -0,0 +1,48 @@ +# B5. Coordinator PATCH-to-fixed routed through `parse-write-response` + +**Status:** ready-for-agent +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-platform-failure-handling/PRD.md` + +## What to build + +Apply the canonical HTTP-tier mapping to the Re-review Coordinator's PATCH-to-fixed call site, replacing the existing 409-only catch-all with the same uniform error handling every other ADO write surface now uses. + +Implementation cuts through every layer: + +- **Re-review Coordinator prompt** — Step 5's PATCH-to-fixed call site (where the Coordinator marks an `addressed` thread as fixed by PATCHing its `status`) is refactored to capture exit code, response body, and stderr, then route them through `parse-write-response.mjs` (from B1). +- **Tier handling for PATCH-to-fixed:** + - `ok: true` → thread successfully marked fixed; continue. + - `ok: false, tier: 'aborted'` (401/403) → emit stderr message and exit the Coordinator non-zero. The orchestrator surfaces the abort in the Trailer. + - `ok: false, tier: 'degraded'` (5xx/network/other-4xx) → push a per-thread Notice (`kind: patch-to-fixed`, message: "Could not mark thread as fixed (HTTP ). Thread remains active and will be re-evaluated on next re-review.") to the Coordinator's `NOTICES` array, continue to the next thread. +- **Special-cases preserved by the canonical mapping** — 404 (thread deleted) and 409 (state already changed) both map to `ok: true` in `classify-http-error`, so the Coordinator continues silently for those, matching today's behaviour and the user's intent. +- **CHANGELOG** — `[Unreleased]` Changed entry for the Coordinator's PATCH-to-fixed call site; Fixed entry covering the silent-failure auth gap (401/403 used to be a "PATCH warning" string on stdout that nothing read). + +End-to-end demoable: run a re-review against a PR whose threads include at least one `addressed` candidate, while the local `az devops login` is revoked. The Claude interface ends with the Trailer aborted line naming the auth failure ("Could not mark thread N as fixed (HTTP 401). Try `az devops login` to re-authenticate."). Restore auth and simulate a 5xx (e.g. throttling), and the Summary's `## Notices` block contains an entry like "⚠ patch-to-fixed: Could not mark thread N as fixed (HTTP 503). Thread remains active and will be re-evaluated on next re-review." — and the rest of the re-review completes normally. + +## Acceptance criteria + +- [ ] Coordinator's PATCH-to-fixed call routes through `parse-write-response.mjs`. +- [ ] 401 or 403 from any PATCH-to-fixed aborts the Coordinator with a clear stderr message; the orchestrator's Trailer line reads `❌ Review aborted: auth — ...`. +- [ ] 5xx / network / other-4xx from any PATCH-to-fixed emits a per-thread DEGRADED Notice; the Coordinator continues to the next thread. +- [ ] 404 and 409 continue silently (canonical OK tier). +- [ ] The old 409-only catch-all is removed from the Coordinator prompt. +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, `pnpm --filter pr-review test`, `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +`docs/issues/pr-review-platform-failure-handling/01-writer-http-tier-mapping.md` + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q7 canonical HTTP-tier mapping applied to the Coordinator's PATCH-to-fixed (the only HTTP write surface in the Coordinator). 404 and 409 stay OK per the canonical mapping (today's 409-only catch-all behaviour is preserved; 404 is now also OK because a deleted thread is a domain success). 401/403 abort the Coordinator with the same stderr+Trailer contract as the Writer. No outstanding questions. diff --git a/docs/issues/pr-review-platform-failure-handling/06-pre-pr-notice-surface.md b/docs/issues/pr-review-platform-failure-handling/06-pre-pr-notice-surface.md new file mode 100644 index 0000000..f18cc1e --- /dev/null +++ b/docs/issues/pr-review-platform-failure-handling/06-pre-pr-notice-surface.md @@ -0,0 +1,48 @@ +# B6. Pre-PR Notice surface: suspicious-shape Notice + Gitflow-aware default-branch fallback + +**Status:** ready-for-agent +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-platform-failure-handling/PRD.md` + +## What to build + +Give Pre-PR mode the same Notice surface as ADO modes. Detect malformed diff inputs. Replace the hardcoded `main` fallback with a Gitflow-aware fallback chain. + +Implementation cuts through two helpers and the orchestrator's Pre-PR steps: + +- **`scripts/pre-pr.mjs` (`buildPrePrContext` + `parseChangedFilesFromDiff`)** — return shape of `buildPrePrContext` extended to `{ rawDiff, changedFiles, filteredFiles, notices: Notice[] }`. `parseChangedFilesFromDiff` detects suspicious shape: non-empty input that contains ≥ 1 `diff --git` header but produces zero parsed paths. When detected, `buildPrePrContext` pushes a DEGRADED Notice (`kind: diff-parse`, message: "Pre-PR diff parsed to zero files but contained diff headers — input may be malformed.") to the returned `notices` array. Existing test file extended with cases covering the suspicious-shape detection. +- **New helper** `scripts/pre-pr/detect-default-branch.mjs` — pure function `({ branchExists, remoteHeadBranch }) → { branch: string | null, source: 'remote-show' | 'develop-fallback' | 'main-fallback' | 'master-fallback' | 'none', notice?: Notice }`. The function tries (in order): the `remoteHeadBranch` argument (parsed by the bash side from the `HEAD branch:` line of `git remote show origin` output) if non-empty; then `origin/develop`; then `origin/main`; then `origin/master`. Returns `{ branch: null, source: 'none' }` when nothing exists. Emits a `warning` Notice (`kind: default-branch`, message: `"Default branch not detected via remote-show; computed diff against origin/ ()."`) when any fallback level fires. With unit tests covering each branch. +- **`commands/review-pr.md` (Pre-PR Step A)** — wires `detect-default-branch.mjs` via the same `await import(...)` pattern as other helpers. The bash side first runs `git remote show origin 2>/dev/null` and parses the `HEAD branch:` line out of its output (passing the resulting branch name, or empty string on failure, as `remoteHeadBranch`); it also passes an injected `branchExists(name)` implementation that runs `git rev-parse --verify --quiet "refs/remotes/origin/$name"`. On `branch: null`, the orchestrator emits a stderr message and prints the Trailer aborted line; on any fallback level, the Notice is pushed into the Pre-PR `notices` array. +- **`commands/review-pr.md` (Pre-PR Step B + E)** — Step B uses `buildPrePrContext().notices` and merges them with the default-branch Notice. Step E prints all Notices before findings (per PRD A's pre-PR contract), then the findings, then the Trailer line (which already includes notice counts). +- **CHANGELOG** — `[Unreleased]` Added entry for `detect-default-branch.mjs`; Changed entry for `buildPrePrContext` return shape; Fixed entries for the suspicious-shape detection and the Gitflow-aware fallback. + +End-to-end demoable: run `/pr-review:review-pr` (no URL) in a Gitflow project where `origin/develop` exists but the local `git remote show origin` is offline (e.g. break the remote). The Claude interface prints `⚠ default-branch: Default branch not detected via remote-show; computed diff against origin/develop (develop-fallback).` then the findings, then the Trailer. Repeat in a trunk-only project (no develop branch) → fallback message names `main`. Repeat in a project with no `develop`, `main`, or `master` → `❌ Review aborted: default-branch — No detectable default branch on origin (tried develop, main, master). Specify a base manually.` + +## Acceptance criteria + +- [ ] `buildPrePrContext` returns a `notices: Notice[]` field. +- [ ] `parseChangedFilesFromDiff` suspicious-shape detection emits a DEGRADED Notice (`kind: diff-parse`) via `buildPrePrContext`. +- [ ] `scripts/pre-pr/detect-default-branch.mjs` exists with full unit-test coverage (≥ 6 cases). +- [ ] The fallback chain order is `remote-show` → `origin/develop` → `origin/main` → `origin/master` → `none`. +- [ ] Any fallback level fires a `warning` Notice (`kind: default-branch`) that names the actually-used branch. +- [ ] `none` aborts the run with a clear stderr message and a Trailer aborted line. +- [ ] Pre-PR mode prints all Notices before findings. +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, `pnpm --filter pr-review test`, `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +`docs/issues/pr-review-ado-fetcher-reliability/01-end-to-end-notice-pipeline.md` + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q8(c) suspicious-shape doctrine (non-empty input with `diff --git` headers but zero parsed paths → DEGRADED Notice); Q8(d) + user follow-up — Gitflow-aware fallback chain (`remote-show` → `origin/develop` → `origin/main` → `origin/master` → ABORT) replacing the hardcoded `main` fallback, with a Notice naming the actually-used branch. `detect-default-branch.mjs` takes an injectable `branchExists` tester for cross-platform unit testability. No outstanding questions. diff --git a/docs/issues/pr-review-platform-failure-handling/PRD.md b/docs/issues/pr-review-platform-failure-handling/PRD.md new file mode 100644 index 0000000..57810d5 --- /dev/null +++ b/docs/issues/pr-review-platform-failure-handling/PRD.md @@ -0,0 +1,202 @@ +# PRD: pr-review — platform-failure handling + +**Status:** needs-triage +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Depends on:** `docs/issues/pr-review-ado-fetcher-reliability/PRD.md` (PRD A — must land first) + +--- + +## Problem Statement + +After PRD A lands, the four-tier Notice doctrine is established and the ADO Fetcher is correct, but the rest of the plugin still swallows platform failures in three places. The Re-review Coordinator silently downgrades to first-review mode when its match-finding helper throws (duplicating every prior thread); its PATCH-to-fixed call has a catch-all that only special-cases HTTP 409 (silently letting 401/403/5xx through as 200-char "warnings" no one reads). The ADO Writer's inline POST path captures stderr to `*.err` files but only logs them on cleanup — the actual failure text never reaches the user, and 401/403 are treated like recoverable per-finding failures. Pre-PR mode's diff parser returns `[]` for both empty diffs and malformed inputs, so a broken pipeline looks like a clean Review. The default-branch fallback chain still hardcodes `main` even though most Unic projects use Gitflow. + +These are not isolated bugs — they are the same doctrine PRD A established, not yet applied across the remaining surfaces. PRD B finishes the job by routing every ADO write call site through PRD A's canonical helpers, extending the Coordinator's classification helpers to honour `DIFF_RANGE`, and giving Pre-PR mode the same Notice surface that the ADO modes get. + +## Solution + +Apply the Notice Tier doctrine + helper-layer doctrine (both from PRD A) to the Re-review Coordinator, ADO Writer, and Pre-PR mode. The shared helpers (`classify-http-error`, `notices`, `parse-write-response`) own all classification; the agent prompts shrink to "call the helper, branch on `result.ok`". + +The Coordinator consumes the `DIFF_RANGE` sentinel PRD A emits: when `full`, the existing `classify-thread` helper downgrades verdicts that depend on diff position (`addressed`, `obsolete`) to the safer `pending`; `disputed` is unaffected. A DEGRADED Notice surfaces the downgrade. The Coordinator's per-finding match call wraps `match-finding` in try/catch and emits a DEGRADED Notice on throw, instead of silently treating the parse failure as "no match" and duplicating the thread. PATCH-to-fixed routes every response through the canonical HTTP-tier mapping — 401/403 abort the whole re-review, 5xx/network/other-4xx emit per-thread DEGRADED Notices. + +The ADO Writer routes every `az devops invoke` POST/PATCH through `parse-write-response` (a new pure helper composing PRD A's `classify-http-error` with response-`id` parsing). The H1 path (inline POST) inherits the canonical mapping retroactively — auth failures no longer log-and-continue, they abort. `*.err` files stream their content to stderr at the moment of failure (so the failure text is adjacent to the Notice that references it), then are unconditionally cleaned up. The `parseAdoWriterResult` helper is refactored to the discriminated-union shape, so the orchestrator can distinguish a missing result block (Writer crashed before printing) from a parsed-with-zero-findings outcome. + +Pre-PR mode gets the same Notice surface that PRD A wired for ADO modes. `parseChangedFilesFromDiff` detects a suspicious shape (non-empty input with `diff --git` headers but zero parsed paths) and emits a DEGRADED Notice via `buildPrePrContext`. The default-branch detection becomes a fallback chain (`git remote show origin` → `origin/develop` → `origin/main` → `origin/master` → ABORTED) implemented in a pure helper `scripts/pre-pr/detect-default-branch.mjs`, with a Notice that names the actually-used branch when any fallback level fires. + +## User Stories + +1. As a PR reviewer in re-review mode, I want the bot to never silently re-post a thread it already opened on a prior iteration, so that my PR's thread list does not accumulate duplicates. +2. As a PR reviewer, I want any HTTP 401 / 403 error from Azure DevOps during write-back to abort the Review with a clear stderr message naming `az devops login` as the remedy, so that the run does not silently complete with most threads missing. +3. As a PR reviewer, I want a per-thread DEGRADED Notice in the Review Summary listing every thread the bot tried to mark as fixed but couldn't (because of a 5xx or network blip), so that I can manually mark them fixed if appropriate. +4. As a PR reviewer in re-review mode with no incremental diff available, I want prior threads that would have been classified `addressed` or `obsolete` to instead be classified `pending`, so that I am never told a comment is resolved when the bot wasn't actually able to verify it. +5. As a developer running Pre-PR mode in a Gitflow-style project, I want the default-branch detection to try `origin/develop` before `origin/main`, so that the local diff is computed against the actual integration branch most of the time. +6. As a developer running Pre-PR mode, I want a Notice telling me which branch the bot diffed against when default-branch detection fell back, so that I can spot the case where it picked the wrong branch. +7. As a Plugin maintainer, I want the ADO Writer's existing H1 inline-POST path (which today logs auth failures and continues) to inherit the canonical HTTP-tier mapping introduced in PRD A, so that 401/403 abort the writer consistently with every other ADO write. +8. As a Plugin maintainer, I want `*.err` file contents to be visible at the moment of failure, not buried in a cleanup step, so that diagnosing a partial-success run does not require reaching for temp files that may have been deleted. +9. As a Plugin maintainer, I want the `parseAdoWriterResult` helper to distinguish "result block missing" (Writer crashed mid-run) from "result block parsed with zero findings" (legitimate zero outcome), so that the orchestrator can fail loud on the first case instead of silently reporting success. +10. As a PR reviewer, I want a Notice telling me when Pre-PR mode's diff parser detected `diff --git` headers but produced zero file paths, so that I can tell the "no files changed" message apart from "the pipeline broke". +11. As a Plugin maintainer, I want the Coordinator's match-finding error path to emit a DEGRADED Notice (`kind: thread-match`) when the helper throws on a parse error, so that the reviewer sees one warning instead of one silent duplicate posting. +12. As a developer reading the codebase, I want every ADO write call site (inline POST, summary POST, delta reply, completion marker, PATCH-to-fixed) to route through one shared helper, so that adding a new write call type in the future inherits the same HTTP-tier mapping for free. +13. As a Plugin maintainer, I want the existing classify-thread and match-finding tests to be extended with the new branches (diffRange parameter; throw on parse error), so that the new behaviour is verified at the helper boundary even though the agent prompts are not unit-tested. +14. As a developer running re-review mode, I want the partial-run check from H4 (already landed in PR #29) to keep its exit-code contract (`0` = found, `1` = not found, `2` = crash); PRD B does not modify that path. + +## Implementation Decisions + +### Foundation (from PRD A) + +All shared helpers (`scripts/ado/classify-http-error.mjs`, `scripts/ado/notices.mjs`) and the Notice flow + Trailer printing in the orchestrator are assumed in place. PRD B only adds consumers and one new shared helper (`parse-write-response.mjs`). The Notice tier doctrine, the canonical HTTP-tier mapping, the four-state classification, and the no-fifth-ASK-tier rule are all documented in PRD A's ADRs (0014, 0015) and the in-place ADR 0004 amendment. + +### New helpers + +- **`scripts/ado/parse-write-response.mjs`** — pure function composing PRD A's `classify-http-error` with response-`id` parsing. Returns `{ ok: true, id } | { ok: false, tier, kind, message }`. Consumed by every ADO write call site (inline POST, threadContext fallback, summary POST, delta reply, completion marker, PATCH-to-fixed). One shape, one classifier. +- **`scripts/pre-pr/detect-default-branch.mjs`** — pure function over an injectable `branchExists(name) → bool` tester and a `remoteHeadBranch` argument (a string parsed by the bash side from the `HEAD branch:` line of `git remote show origin` output, or empty string on failure). Walks the fallback chain `remoteHeadBranch` → `origin/develop` → `origin/main` → `origin/master` → `{ branch: null }`. Returns `{ branch, source: 'remote-show' | 'develop-fallback' | 'main-fallback' | 'master-fallback' | 'none', notice?: Notice }`. The bash side parses `HEAD branch:` from `git remote show origin 2>/dev/null` and wires the `branchExists` tester to `git rev-parse --verify --quiet`. ABORTED when all four fallback levels fail. + +### Modified helpers + +- **`scripts/re-review/classify-thread.mjs`** — adds a `diffRange: 'full' | 'incremental'` parameter. When `diffRange === 'full'`, outputs that would be `addressed` or `obsolete` are remapped to `pending`; `disputed` is unaffected. Default `diffRange === 'incremental'` preserves today's behaviour. Single new branch, ~3 lines. +- **`scripts/re-review/match-finding.mjs`** — today returns `null` on no match. New contract: `null` continues to mean "legitimate no-match"; a thrown `Error` distinguishes a parse failure in the input. The Coordinator's per-finding call wraps in try/catch. +- **`scripts/ado-writer.mjs` (`parseAdoWriterResult`)** — discriminated-union refactor: `{ ok: true, summaryThreadId, findingsPosted } | { ok: false, reason: 'missing-block' | 'malformed' }`. Subsumes today's `null` return. +- **`scripts/pre-pr.mjs` (`buildPrePrContext`, `parseChangedFilesFromDiff`)** — `buildPrePrContext` return shape extends to `{ rawDiff, changedFiles, filteredFiles, notices: Notice[] }`. `parseChangedFilesFromDiff` detects suspicious shape (non-empty input with ≥ 1 `diff --git` header but zero parsed paths) and emits a DEGRADED Notice (`kind: diff-parse`). + +### Agent and orchestrator changes + +- **`.agents/re-review-coordinator.md`**: + - Consume `DIFF_RANGE` from `ADO_FETCHER_RESULT`; pass it to `classify-thread`. + - Wrap per-finding `match-finding` call in try/catch; on throw, push a DEGRADED Notice and continue to the next finding (do NOT add the unclassified prior thread to `freshFindings` — let it fall through naturally to a duplicate posting, but with a Notice surfacing the cause). + - Route PATCH-to-fixed responses through `parse-write-response`. Tier `aborted` → exit non-zero with the abort kind. Tier `degraded` → push a Notice (`kind: patch-to-fixed`) and continue to the next thread. + - Emit `NOTICES` array in `RE_REVIEW_COORDINATOR_RESULT_START/END` for the orchestrator to merge. +- **`.agents/ado-writer.md`**: + - Route every `az devops invoke` POST/PATCH (inline POST, threadContext-fallback, summary POST, delta reply, completion marker) through `parse-write-response`. + - H1 retroactive fix: the inline POST path inherits the canonical mapping — 401/403 abort the writer immediately, 5xx/network/other-4xx push a `warning` Notice and continue. + - Stream the `*.err` file content to stderr at the moment of failure, then unconditional `rm -f` in cleanup. No conditional retention. + - Emit `NOTICES` array in `ADO_WRITER_RESULT_START/END`. +- **`commands/review-pr.md` (Pre-PR mode)**: + - Wire `detect-default-branch.mjs` (via the existing helper-import pattern). On `branch: null`, abort with stderr message and Trailer aborted line. + - Use `buildPrePrContext().notices` to prepend the pre-findings Notices block in the Claude interface. + - Trailer line includes Pre-PR notice counts (already mandatory per PRD A). + - 200-line cap preserved. + +### Test-scope choice + +The user chose "NEW deep modules only" in the test-scope question during the grilling session. Reconciling that choice with the slice acceptance criteria: + +- **New deep helpers** (`parse-write-response`, `detect-default-branch`) receive full unit-test coverage — every documented return-shape branch is asserted. +- **MODIFY helpers** (`classify-thread`, `match-finding`, `parseAdoWriterResult`, `pre-pr.mjs`) receive **minimal branch-verification** test cases — 2–3 cases per helper, each pinning a single new branch introduced by this PRD (the `diffRange='full'` downgrade, the throw-on-parse-error path, the `{ ok: false, reason }` variants, the suspicious-shape Notice). No full new test suites; the existing fixture conventions in those files are reused. +- **Agent prompt content** (`.agents/*.md`, `commands/review-pr.md`) gets no new tests of any kind. End-to-end behaviour change is verified by the integration smoke test against a real ADO PR, per ADR 0013's stated testing posture. + +The intent of the user's "NEW deep modules only" choice was to avoid writing wholly new test files and full coverage for MODIFY helpers; the minimal branch-verification cases above are necessary to confirm the new behaviour landed and do not constitute new test suites. + +## Testing Decisions + +### What makes a good test + +Same as PRD A: tests assert the external behaviour of each helper given controlled inputs. Plain JS object or short JSON fixtures, sentence-shaped test names, `node:test` + `node:assert/strict`, no external deps. + +### Modules under test + +**New deep helpers (full unit-test coverage):** + +- `scripts/ado/parse-write-response.mjs` — happy path (`{ id: 12345 }` response), 401 → `{ ok: false, tier: 'aborted', kind: 'auth' }`, 5xx → `{ ok: false, tier: 'degraded' }`, 404 → `{ ok: true }` (domain-OK), 409 → `{ ok: true }`, malformed JSON body, network exit-code path, missing `id` field on otherwise-200 response. +- `scripts/pre-pr/detect-default-branch.mjs` — non-empty `remoteHeadBranch` (e.g. `'develop'`) → no fallback Notice, empty `remoteHeadBranch` and `branchExists('develop')=true` → `develop-fallback` with Notice, empty `remoteHeadBranch` and only `main` exists → `main-fallback` with Notice, only `master` exists → `master-fallback` with Notice, nothing exists → `{ branch: null, source: 'none' }` (no notice — Trailer carries the abort), `branchExists` thrown exception → propagated. + +### MODIFY helpers — minimal branch-verification cases + +Each of these receives 2–3 new test cases in its existing test file, pinning the new branch introduced by this PRD. No full new coverage. + +- `classify-thread.mjs` — `diffRange='full'` downgrade of `addressed`/`obsolete` → `pending`; `disputed` unaffected (3 cases). +- `match-finding.mjs` — legitimate no-match still returns `null`; malformed input throws (3 cases). +- `parseAdoWriterResult` — `{ ok: true, ... }` for valid block; `{ ok: false, reason: 'missing-block' }`; `{ ok: false, reason: 'malformed' }` (3 cases). +- `pre-pr.mjs` — suspicious-shape detection emits a `kind: diff-parse` Notice through `buildPrePrContext` (2 cases). + +### Modules NOT under test in PRD B + +- All agent prompt content (`.agents/*.md`, `commands/review-pr.md`) — verified end-to-end by the integration smoke test against a real ADO PR, per ADR 0013. + +### Prior art + +Same as PRD A: `packages/release-tools/scripts/verify-changelog.test.mjs`, `bump-version.test.mjs`, `apps/claude-code/pr-review/tests/parse-diff-hunks.test.mjs`. No external deps, no spawnSync, fixtures as inline JS objects. + +## Out of Scope + +- Anything PRD A delivers (helper layer, canonical HTTP mapping, ADRs, Fetcher fixes, orchestrator Notice merging + Trailer). +- **Full new test suites** for MODIFY-only helpers — only the minimal 2–3 branch-verification cases listed in Testing Decisions are in scope. +- Unit tests for agent prompt content. +- Retries on transient HTTP errors. +- The integration smoke test (manual, post-merge). +- A canonical thread shape spanning ADO and GitHub — deferred per ADR 0013. +- Changes to the four pre-existing re-review modules' interfaces (`detect-prior-review`, `parse-signature`) — only `classify-thread` and `match-finding` are modified, and only additively (new parameter / new throw path). +- Pre-PR mode informational notices for inherently-empty states beyond the Doc Context family — PRD A already capped that. + +## Further Notes + +**Dependency on PRD A:** PRD B cannot land before PRD A. The helper imports (`classify-http-error`, `notices`, `formatTrailer`), the orchestrator's Notice-merge step, the ADO Writer's `## Notices` block rendering, and the Trailer line are all PRD A deliverables that PRD B's new consumers and modified call sites rely on. The two PRDs ship together as a coherent "platform-failure handling" feature; PRD A is the foundation, PRD B is the rollout. + +**Inbox file removal:** the originating `docs/inbox/pr-review-ado-error-hardening-pass.md` is deleted once PRD A and PRD B are published (per the inbox graduation flow documented in `docs/inbox/README.md`). + +**Source:** same grilling session as PRD A. See PRD A's "Further Notes" for the doctrine, ADR cross-references, and `CONTEXT.md` term additions. + +--- + +## Agent Brief + +> _This was generated by AI during triage._ + +**Category:** enhancement +**Summary:** Apply PRD A's four-tier Notice doctrine + helper-layer architecture to the remaining surfaces: Re-review Coordinator, ADO Writer (every write call site, including H1 retroactively), and Pre-PR mode. Adds two new deep helpers (`parse-write-response`, `detect-default-branch`), extends two re-review classifier helpers (`classify-thread` gets a `diffRange` parameter for γ-downgrade; `match-finding` throws instead of returning null on parse error), and refactors `parseAdoWriterResult` to the discriminated-union shape. Default-branch detection becomes a Gitflow-aware fallback chain (`develop` → `main` → `master`) with a Notice naming the actually-used branch. + +**Current behavior (after PRD A lands):** + +- Coordinator's per-finding `match-finding` call falls back to "no match" on Node parse error, silently duplicating prior threads. +- Coordinator's PATCH-to-fixed catch-all only special-cases HTTP 409; auth, 5xx, and network failures become 200-char `process.stdout.write` warnings that nothing reads. Threads stay open, the user is not told. +- ADO Writer's inline POST path (H1, from PR #29) treats 401/403 as recoverable per-finding failures — every subsequent inline POST in the same run also fails, but the user only sees "N findings posted" with N=0 or partial. +- ADO Writer's `*.err` files are unconditionally cleaned up at the end, destroying the only diagnostic for partial-success runs. +- `parseAdoWriterResult` returns `null` for both "Writer never printed a result block" and "Writer parsed but block was malformed", conflating crash with empty-success. +- Pre-PR `parseChangedFilesFromDiff` returns `[]` for both empty input and `diff --git`-bearing input that fails to parse — broken pipelines look like clean reviews. +- Pre-PR default-branch detection hardcodes `main` as the fallback, computing the diff against the wrong base on every Gitflow project. +- Coordinator and Writer ignore the new `DIFF_RANGE` sentinel PRD A emits. + +**Desired behavior:** + +- All five ADO write call sites in the Writer and Coordinator route through `parse-write-response` (composing PRD A's `classify-http-error` with response-`id` parsing). One canonical HTTP-tier mapping across the plugin. +- 401/403 anywhere in a Writer or Coordinator run aborts that run with a single stderr message + Trailer aborted line. +- Every per-thread / per-finding write failure that the canonical mapping classifies as DEGRADED pushes a `warning` Notice (`kind` = `inline-post` / `summary-post` / `patch-to-fixed`) that the orchestrator merges and the Writer renders in the Summary. +- Coordinator consumes `DIFF_RANGE: full | incremental`. When `full`, `classify-thread` downgrades `addressed` / `obsolete` outputs to `pending`; `disputed` is unaffected; a DEGRADED Notice (`kind: diff-range` is emitted by the Fetcher, so the Coordinator only consumes — the Notice is already in the merged array). +- Coordinator `match-finding` calls wrap in try/catch; on throw, push DEGRADED Notice (`kind: thread-match`) and let the finding fall through naturally — the reviewer sees one duplicate-and-Notice instead of one silent duplicate. +- Writer streams `*.err` content to stderr at the moment of failure; unconditional cleanup follows. +- `parseAdoWriterResult` returns the discriminated union; orchestrator fails-loud on `{ ok: false, reason: 'missing-block' }`. +- Pre-PR `parseChangedFilesFromDiff` detects suspicious-shape and emits a DEGRADED Notice (`kind: diff-parse`); `buildPrePrContext` returns the Notice array. +- Pre-PR `detect-default-branch.mjs` walks the parsed `HEAD branch:` line from `git remote show origin` → `origin/develop` → `origin/main` → `origin/master`; emits a Notice naming the actually-used branch; aborts when none exists. + +**Key interfaces:** + +- `parseWriteResponse({ httpExit, responseText, errStream }) → { ok: true, id } | { ok: false, tier, kind, message }`. +- `detectDefaultBranch({ branchExists }) → { branch, source, notice? }`. +- `classifyThread({ ..., diffRange }) → { classification }` — new optional parameter with default `'incremental'`. +- `matchFinding(...) → { classification, threadId } | null` — now throws on parse error. +- `parseAdoWriterResult(...) → { ok: true, summaryThreadId, findingsPosted } | { ok: false, reason }`. +- `buildPrePrContext(rawDiff) → { rawDiff, changedFiles, filteredFiles, notices: Notice[] }`. + +**Acceptance criteria:** + +- [ ] PRD A is merged before PRD B starts. +- [ ] Every `az devops invoke` POST/PATCH in `.agents/ado-writer.md` and `.agents/re-review-coordinator.md` is routed through `parse-write-response`. +- [ ] 401 or 403 from any Writer or Coordinator HTTP call aborts the run with a clear stderr message and a Trailer aborted line. +- [ ] 5xx / network / other-4xx from any write call emits a DEGRADED Notice and continues; the Notice appears in the Review Summary. +- [ ] `classify-thread` accepts a `diffRange` parameter; when `'full'`, `addressed` / `obsolete` are remapped to `pending`; `disputed` unaffected. +- [ ] `match-finding` throws on parse error; the Coordinator's call site catches the throw and emits a DEGRADED Notice (`kind: thread-match`). +- [ ] `parseAdoWriterResult` returns a discriminated union; the orchestrator surfaces `{ ok: false, reason: 'missing-block' }` as an ABORTED run. +- [ ] `buildPrePrContext` returns a `notices: Notice[]` field; suspicious-shape diffs emit a DEGRADED Notice (`kind: diff-parse`). +- [ ] `detect-default-branch.mjs` exists, has unit tests covering the four fallback levels + the abort case, and the orchestrator wires it via injectable `branchExists` plus a `remoteHeadBranch` argument parsed from `git remote show origin`'s `HEAD branch:` line. +- [ ] Pre-PR mode aborts with a clear stderr message when none of `develop`, `main`, `master` exist. +- [ ] `*.err` content is streamed to stderr at the moment of failure; cleanup is unconditional. +- [ ] `commands/review-pr.md` remains ≤ 200 lines. +- [ ] `pnpm test` passes; `pnpm format` produces no diff; `pnpm check` reports zero warnings. +- [ ] `docs/inbox/pr-review-ado-error-hardening-pass.md` is removed. + +**Out of scope:** + +- Anything PRD A delivers. +- Retries on transient HTTP errors. +- Integration smoke test (manual, post-merge). +- Lifting helpers to `pr-review-toolkit`. +- Full new test suites for MODIFY helpers — only minimal branch-verification cases (per PRD Testing Decisions) are in scope.