feat: min-severity cascading and review-level filter by cpcloud · Pull Request #635 · roborev-dev/roborev

cpcloud · 2026-04-07T13:56:17Z

Summary

Add min_severity config cascade (CLI flag → repo .roborev.toml → global config.toml) for review, refine, and fix workflows
Persist min_severity on review_jobs (SQLite + PostgreSQL v11), thread through enqueue/claim/sync paths
Inject severity instructions into review and fix prompts via config.SeverityInstruction
Parse SEVERITY_THRESHOLD_MET marker as a pass verdict
CI path: prompt uses review_min_severity, synthesis uses [ci].min_severity; single-result CI runs route through synthesis when CI threshold is set
formatSingleResult uses storage.ParseVerdict to detect pass output (not loose strings.Contains)
Fix GetJobsWithReviewsByIDs hydration bug where min_severity was scanned then overwritten by zero-value field

roborev-ci · 2026-04-07T13:59:38Z

roborev: Combined Review (`7e12564`)

Verdict: changes introduce one medium-severity issue that should be addressed before merge.

Medium

internal/storage/verdict.go:62, internal/review/synthesize.go:95
ParseVerdict now treats any output containing SEVERITY_THRESHOLD_MET as a pass after only checking for severity-labeled findings. Because agent output is external service data, a model response could still describe real issues in plain prose, or merely echo the marker in explanatory text, without using the expected High/Medium/... labels. That can incorrectly flip the verdict to pass and, in the single-result CI path, mark the review as passed.

Suggested remediation: only accept the marker when it is an exact standalone sentinel with no other substantive content, or only trust it from a synthesis step that emits a strict schema. Reject marker-bearing outputs that contain additional non-whitespace lines outside a tightly defined allowlist.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Promote refine_min_severity and fix_min_severity to global config and add review_min_severity (global + per-repo + per-job CLI flag) that filters reviews at prompt time via SeverityInstruction. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix three review findings on the min_severity cascading design spec: - Add runLocalReview daemon-bypass handling to §4 with fail-fast resolver call and Build/BuildDirty threading, so --local mode honors the flag - Move ParseVerdict marker check below hasSeverityLabel so severity-labeled findings remain authoritative in mixed outputs - Change worker and handleFixJob to fail loudly on invalid configured severity instead of silently disabling the filter, matching the fail-fast behavior in fix.go and refine.go Test plan extended with --local coverage, marker-vs-severity precedence, and invalid-config failure cases. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…_severity spec Iteration 3 review fixes: - §4 + Files-touched row for review.go: require sending the canonical lowercase value from NormalizeMinSeverity (not raw user input). The worker uses job.MinSeverity verbatim and SeverityInstruction's severityAbove map only accepts canonical keys, so storing "HIGH" or " high " would silently disable filtering. Added a "Why normalize at the boundary" subsection explaining the contract. - §3.5 handleFixJob: resolve against an effectivePath that prefers parentJob.WorktreePath when present and valid (via gitpkg.ValidateWorktreeForRepo), falling back to parentJob.RepoPath. Mirrors the worker.go:367 pattern so worktree-backed fixes pick up per-branch .roborev.toml overrides. - §3.5 handleFixJob: added an explicit if !parentJob.IsTaskJob() guard around the entire severity-resolution block. Task/insights parents have free-form output without severity labels, so injecting a severity filter would tell the agent to ignore the very content the user wants acted on. Mirrors cmd/roborev/fix.go:824. Test plan extended with mixed-case CLI normalization cases, worktree path resolution coverage, task-parent skip coverage, and stale-worktree fallback coverage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…verity spec Iteration 4 review fixes: - §3.5: Removed minSeverity threading from buildRebasePrompt. Rebase prompts only carry the stale patch with no findings to map against, so applying a severity filter risks silently dropping previously- validated hunks of an existing fix. Added an explicit "rebase is threshold-free" subsection. The findings-driven helpers (buildFixPrompt, buildFixPromptWithInstructions) still take the new parameter. - §4: Added a "Two trust boundaries" subsection requiring handleEnqueue to validate req.MinSeverity via NormalizeMinSeverity at the daemon boundary (mirroring the existing ResolveReviewReasoning pattern at line 866). The daemon rejects invalid input with 400 Bad Request and stores the normalized return value. This guards non-CLI callers (sync from older machines, MCP wrappers, automation scripts) that bypass CLI-side normalization. Worker stays simple — reads job.MinSeverity verbatim, trusting the write-time normalization invariant. - §4 wire path: bumped from three to four edits, with explicit validation step listed. - Test plan: added a daemon-side test case for the legacy IsTaskJob() fallback shape (job_type empty, no commit/range/dirty) so sync-pulled jobs from older machines retain parity protection. Added a buildRebasePrompt severity-free assertion. Added three new server_test.go cases covering handleEnqueue normalization (valid, mixed-case → canonical, invalid → 400). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

13-task TDD plan covering config, storage, verdict, prompt, daemon, and CLI changes for review_min_severity + global promotion of refine/fix severity settings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Strip all git add/commit/staging instructions from the 13-task plan. Plans are about what to build, not how to commit it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… checklist The SQL site checklist table still referenced nullString/nullStr for sync write paths (UpsertPulledJob, PG UpsertJob, EnqueueJob), which would bypass the storage-invariant normalization described in §4. Also fix hydration type from sql.NullString to plain string (queries use COALESCE), and clarify that handleEnqueue stores normalized value. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Three gaps identified by review: 1. BatchUpsertJobs is the production PG sync push path, not UpsertJob. Added to spec SQL checklist, files-touched table, and plan Task 6. 2. roborev ci calls batch.go runSingle -> builder.Build without minSeverity. Added spec section 5.5, plan Task 12.5, and BatchConfig.MinSeverity threading. 3. handleFixJob tests lacked worktree-override and legacy-task-parent coverage. Added two test cases to plan Task 11. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…tests 1. BatchUpsertJobs: min_severity value goes between WorktreePath and SourceMachineID to match column order, not after SourceMachineID. 2. Worktree test: use real linked git worktree via git worktree add so ValidateWorktreeForRepo passes. Add stale-worktree fallback case. 3. Legacy task test: clear job_type post-insert via UPDATE to exercise IsTaskJob legacy fallback heuristic, since EnqueueJob auto-fills it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…rity Promote ResolveFixMinSeverity and ResolveRefineMinSeverity from 2-param (explicit, repoPath) to 3-param (+ globalCfg *Config) to support the cascade: explicit > repo config > global config > empty default. Add ReviewMinSeverity to both Config (global) and RepoConfig (per-repo), plus a new ResolveReviewMinSeverity resolver following the same pattern. Update all callers in fix.go and refine.go to pass globalCfg. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add min_severity column to review_jobs with full plumbing through storage (SQLite migration, PostgreSQL v11 schema, CRUD, sync), prompt injection (Build/BuildDirty signatures), daemon (enqueue validation, worker resolution, fix job baking), CLI (--min-severity flag), and CI batch review path. Agents receive severity filtering instructions in prompts and output SEVERITY_THRESHOLD_MET marker when all findings fall below threshold. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

testifylint requires assert.Empty over assert.Equal("", ...). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…eviewsByIDs Was scanning directly into job struct, then applyReviewJobScan overwrote with zero-value fields.MinSeverity. Batch API responses silently lost min_severity metadata. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CI was using [ci].min_severity for both prompt injection (suppressing agent findings) and synthesis filtering. Now uses ResolveReviewMinSeverity for prompt-level filtering and ResolveCIMinSeverity only for synthesis. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Single-result CI runs skip synthesis, so CI min_severity was never applied. Now prompt threshold uses StricterSeverity(review, ci) to ensure filtering happens even without synthesis. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

StricterSeverity approach conflated review and CI thresholds for multi-result runs. Instead: prompt uses review_min_severity only, and single-result CI path falls through to synthesis when CI min_severity is set, so filtering still happens. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

MinSeverity "low" means no filtering, so skip unnecessary synthesis for single-result CI runs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…rmatting formatSingleResult now recognizes the threshold marker as a pass, preventing raw marker text from appearing in CI output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…matting strings.Contains(marker) was too loose — showed Review Passed even when output also contained severity-labeled findings. Now delegates to storage.ParseVerdict which checks severity labels first. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

roborev-ci · 2026-04-07T14:35:29Z

roborev: Combined Review (`3ab44ee`)

Summary: No medium-or-higher issues identified across the submitted reviews.

All reviewed outputs were clean at Medium, High, and Critical severity.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

The previous substring check let prose findings without severity labels flip the verdict to pass whenever an agent echoed the marker in narration. Add config.IsMarkerOnlyOutput which accepts the marker only when it is essentially the entire output (allowing whitespace, trailing period, bullet, bold, or code-fence decoration), and route verdict.go and refine.go through it. Addresses review feedback on PR roborev-dev#635.

roborev-ci · 2026-04-07T14:42:55Z

roborev: Combined Review (`6141603`)

Verdict: No medium-or-higher findings; the reviewed diff appears clean based on the provided review outputs.

All provided reviews reported no Medium, High, or Critical issues.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Extended IsMarkerOnlyOutput to unwrap single * and _ italic markers in addition to the existing ** and __ bold forms, so agent output like *SEVERITY_THRESHOLD_MET* is correctly recognized as a below- threshold pass. Bold forms are stripped before italic to avoid degrading **X** into *X*. Updated the doc comment to spell out the accepted decoration forms and added test coverage for italic star, italic underscore, and bold-italic (***X***) in both config_test.go and verdict_test.go.

roborev-ci · 2026-04-07T14:56:31Z

roborev: Combined Review (`6d9f02b`)

Verdict: No Medium, High, or Critical findings identified.

The reviewed outputs did not surface any issues at Medium severity or above. Security review explicitly reported no issues found, and there were no actionable findings to consolidate from the other review outputs.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

modernize linter flagged the HasPrefix + TrimPrefix pair on the fenced code block strip path. Replace with CutPrefix to satisfy the rule.

roborev-ci · 2026-04-07T15:01:27Z

roborev: Combined Review (`c3ff148`)

Verdict: No Medium, High, or Critical findings.

All reviewed outputs found the changes clean at the requested severity threshold.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

modernize linter flagged the strings.Index + slice pattern. Replace with strings.Cut to satisfy the rule.

roborev-ci · 2026-04-07T15:09:27Z

roborev: Combined Review (`26c5072`)

No medium-or-higher issues found; the reviewed changes look clean based on the provided agent outputs.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

cpcloud · 2026-04-07T15:24:12Z

@wesm majority of the LoC here is plan markdown, happy to remove that if you want.

cpcloud and others added 19 commits April 7, 2026 10:01

docs: implementation plan for min-severity cascading

bcabfff

13-task TDD plan covering config, storage, verdict, prompt, daemon, and CLI changes for review_min_severity + global promotion of refine/fix severity settings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs: remove git commit steps from min-severity implementation plan

5ee28ae

Strip all git add/commit/staging instructions from the 13-task plan. Plans are about what to build, not how to commit it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(lint): use assert.Empty for empty string checks

4e19f75

testifylint requires assert.Empty over assert.Equal("", ...). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(review): treat low min-severity same as empty in single-result path

23a3ad5

MinSeverity "low" means no filtering, so skip unnecessary synthesis for single-result CI runs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(review): treat SEVERITY_THRESHOLD_MET as pass in single-result fo…

6a726cb

…rmatting formatSingleResult now recognizes the threshold marker as a pass, preventing raw marker text from appearing in CI output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cpcloud force-pushed the worktree-hashed-wondering-nest branch from 7e12564 to 3ab44ee Compare April 7, 2026 14:31

fix(lint): use strings.CutPrefix in IsMarkerOnlyOutput

c3ff148

modernize linter flagged the HasPrefix + TrimPrefix pair on the fenced code block strip path. Replace with CutPrefix to satisfy the rule.

fix(lint): use strings.Cut for newline split in IsMarkerOnlyOutput

26c5072

modernize linter flagged the strings.Index + slice pattern. Replace with strings.Cut to satisfy the rule.

Remove specs

f338590

wesm merged commit ff9af2e into roborev-dev:main Apr 7, 2026
7 of 8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: min-severity cascading and review-level filter#635

feat: min-severity cascading and review-level filter#635
wesm merged 24 commits intoroborev-dev:mainfrom
cpcloud:worktree-hashed-wondering-nest

cpcloud commented Apr 7, 2026

Uh oh!

roborev-ci bot commented Apr 7, 2026

Uh oh!

roborev-ci bot commented Apr 7, 2026

Uh oh!

roborev-ci bot commented Apr 7, 2026

Uh oh!

roborev-ci bot commented Apr 7, 2026

Uh oh!

roborev-ci bot commented Apr 7, 2026

Uh oh!

roborev-ci bot commented Apr 7, 2026

Uh oh!

cpcloud commented Apr 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cpcloud commented Apr 7, 2026

Summary

Uh oh!

roborev-ci bot commented Apr 7, 2026

roborev: Combined Review (7e12564)

Medium

Uh oh!

roborev-ci bot commented Apr 7, 2026

roborev: Combined Review (3ab44ee)

Uh oh!

roborev-ci bot commented Apr 7, 2026

roborev: Combined Review (6141603)

Uh oh!

roborev-ci bot commented Apr 7, 2026

roborev: Combined Review (6d9f02b)

Uh oh!

roborev-ci bot commented Apr 7, 2026

roborev: Combined Review (c3ff148)

Uh oh!

roborev-ci bot commented Apr 7, 2026

roborev: Combined Review (26c5072)

Uh oh!

cpcloud commented Apr 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

roborev: Combined Review (`7e12564`)

roborev: Combined Review (`3ab44ee`)

roborev: Combined Review (`6141603`)

roborev: Combined Review (`6d9f02b`)

roborev: Combined Review (`c3ff148`)

roborev: Combined Review (`26c5072`)