Skip to content

feat: min-severity cascading and review-level filter#635

Merged
wesm merged 24 commits intoroborev-dev:mainfrom
cpcloud:worktree-hashed-wondering-nest
Apr 7, 2026
Merged

feat: min-severity cascading and review-level filter#635
wesm merged 24 commits intoroborev-dev:mainfrom
cpcloud:worktree-hashed-wondering-nest

Conversation

@cpcloud
Copy link
Copy Markdown
Contributor

@cpcloud cpcloud commented Apr 7, 2026

Summary

  • Add min_severity config cascade (CLI flag → repo .roborev.toml → global config.toml) for review, refine, and fix workflows
  • Persist min_severity on review_jobs (SQLite + PostgreSQL v11), thread through enqueue/claim/sync paths
  • Inject severity instructions into review and fix prompts via config.SeverityInstruction
  • Parse SEVERITY_THRESHOLD_MET marker as a pass verdict
  • CI path: prompt uses review_min_severity, synthesis uses [ci].min_severity; single-result CI runs route through synthesis when CI threshold is set
  • formatSingleResult uses storage.ParseVerdict to detect pass output (not loose strings.Contains)
  • Fix GetJobsWithReviewsByIDs hydration bug where min_severity was scanned then overwritten by zero-value field

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 7, 2026

roborev: Combined Review (7e12564)

Verdict: changes introduce one medium-severity issue that should be addressed before merge.

Medium

  • internal/storage/verdict.go:62, internal/review/synthesize.go:95
    ParseVerdict now treats any output containing SEVERITY_THRESHOLD_MET as a pass after only checking for severity-labeled findings. Because agent output is external service data, a model response could still describe real issues in plain prose, or merely echo the marker in explanatory text, without using the expected High/Medium/... labels. That can incorrectly flip the verdict to pass and, in the single-result CI path, mark the review as passed.

    Suggested remediation: only accept the marker when it is an exact standalone sentinel with no other substantive content, or only trust it from a synthesis step that emits a strict schema. Reject marker-bearing outputs that contain additional non-whitespace lines outside a tightly defined allowlist.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

cpcloud and others added 19 commits April 7, 2026 10:01
Promote refine_min_severity and fix_min_severity to global config and
add review_min_severity (global + per-repo + per-job CLI flag) that
filters reviews at prompt time via SeverityInstruction.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix three review findings on the min_severity cascading design spec:
- Add runLocalReview daemon-bypass handling to §4 with fail-fast resolver
  call and Build/BuildDirty threading, so --local mode honors the flag
- Move ParseVerdict marker check below hasSeverityLabel so severity-labeled
  findings remain authoritative in mixed outputs
- Change worker and handleFixJob to fail loudly on invalid configured
  severity instead of silently disabling the filter, matching the fail-fast
  behavior in fix.go and refine.go

Test plan extended with --local coverage, marker-vs-severity precedence,
and invalid-config failure cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…_severity spec

Iteration 3 review fixes:
- §4 + Files-touched row for review.go: require sending the canonical
  lowercase value from NormalizeMinSeverity (not raw user input). The
  worker uses job.MinSeverity verbatim and SeverityInstruction's
  severityAbove map only accepts canonical keys, so storing "HIGH" or
  " high " would silently disable filtering. Added a "Why normalize at
  the boundary" subsection explaining the contract.
- §3.5 handleFixJob: resolve against an effectivePath that prefers
  parentJob.WorktreePath when present and valid (via
  gitpkg.ValidateWorktreeForRepo), falling back to parentJob.RepoPath.
  Mirrors the worker.go:367 pattern so worktree-backed fixes pick up
  per-branch .roborev.toml overrides.
- §3.5 handleFixJob: added an explicit if !parentJob.IsTaskJob() guard
  around the entire severity-resolution block. Task/insights parents
  have free-form output without severity labels, so injecting a
  severity filter would tell the agent to ignore the very content the
  user wants acted on. Mirrors cmd/roborev/fix.go:824.

Test plan extended with mixed-case CLI normalization cases, worktree
path resolution coverage, task-parent skip coverage, and stale-worktree
fallback coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…verity spec

Iteration 4 review fixes:
- §3.5: Removed minSeverity threading from buildRebasePrompt. Rebase
  prompts only carry the stale patch with no findings to map against,
  so applying a severity filter risks silently dropping previously-
  validated hunks of an existing fix. Added an explicit "rebase is
  threshold-free" subsection. The findings-driven helpers
  (buildFixPrompt, buildFixPromptWithInstructions) still take the new
  parameter.
- §4: Added a "Two trust boundaries" subsection requiring handleEnqueue
  to validate req.MinSeverity via NormalizeMinSeverity at the daemon
  boundary (mirroring the existing ResolveReviewReasoning pattern at
  line 866). The daemon rejects invalid input with 400 Bad Request and
  stores the normalized return value. This guards non-CLI callers
  (sync from older machines, MCP wrappers, automation scripts) that
  bypass CLI-side normalization. Worker stays simple — reads
  job.MinSeverity verbatim, trusting the write-time normalization
  invariant.
- §4 wire path: bumped from three to four edits, with explicit
  validation step listed.
- Test plan: added a daemon-side test case for the legacy IsTaskJob()
  fallback shape (job_type empty, no commit/range/dirty) so sync-pulled
  jobs from older machines retain parity protection. Added a
  buildRebasePrompt severity-free assertion. Added three new
  server_test.go cases covering handleEnqueue normalization (valid,
  mixed-case → canonical, invalid → 400).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
13-task TDD plan covering config, storage, verdict, prompt,
daemon, and CLI changes for review_min_severity + global promotion
of refine/fix severity settings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strip all git add/commit/staging instructions from the 13-task plan.
Plans are about what to build, not how to commit it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… checklist

The SQL site checklist table still referenced nullString/nullStr for
sync write paths (UpsertPulledJob, PG UpsertJob, EnqueueJob), which
would bypass the storage-invariant normalization described in §4.
Also fix hydration type from sql.NullString to plain string (queries
use COALESCE), and clarify that handleEnqueue stores normalized value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three gaps identified by review:

1. BatchUpsertJobs is the production PG sync push path, not UpsertJob.
   Added to spec SQL checklist, files-touched table, and plan Task 6.

2. roborev ci calls batch.go runSingle -> builder.Build without
   minSeverity. Added spec section 5.5, plan Task 12.5, and
   BatchConfig.MinSeverity threading.

3. handleFixJob tests lacked worktree-override and legacy-task-parent
   coverage. Added two test cases to plan Task 11.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tests

1. BatchUpsertJobs: min_severity value goes between WorktreePath and
   SourceMachineID to match column order, not after SourceMachineID.

2. Worktree test: use real linked git worktree via git worktree add
   so ValidateWorktreeForRepo passes. Add stale-worktree fallback case.

3. Legacy task test: clear job_type post-insert via UPDATE to exercise
   IsTaskJob legacy fallback heuristic, since EnqueueJob auto-fills it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rity

Promote ResolveFixMinSeverity and ResolveRefineMinSeverity from 2-param
(explicit, repoPath) to 3-param (+ globalCfg *Config) to support the
cascade: explicit > repo config > global config > empty default.

Add ReviewMinSeverity to both Config (global) and RepoConfig (per-repo),
plus a new ResolveReviewMinSeverity resolver following the same pattern.

Update all callers in fix.go and refine.go to pass globalCfg.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add min_severity column to review_jobs with full plumbing through
storage (SQLite migration, PostgreSQL v11 schema, CRUD, sync),
prompt injection (Build/BuildDirty signatures), daemon (enqueue
validation, worker resolution, fix job baking), CLI (--min-severity
flag), and CI batch review path. Agents receive severity filtering
instructions in prompts and output SEVERITY_THRESHOLD_MET marker
when all findings fall below threshold.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
testifylint requires assert.Empty over assert.Equal("", ...).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…eviewsByIDs

Was scanning directly into job struct, then applyReviewJobScan
overwrote with zero-value fields.MinSeverity. Batch API responses
silently lost min_severity metadata.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CI was using [ci].min_severity for both prompt injection (suppressing
agent findings) and synthesis filtering. Now uses ResolveReviewMinSeverity
for prompt-level filtering and ResolveCIMinSeverity only for synthesis.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Single-result CI runs skip synthesis, so CI min_severity was never
applied. Now prompt threshold uses StricterSeverity(review, ci) to
ensure filtering happens even without synthesis.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
StricterSeverity approach conflated review and CI thresholds for
multi-result runs. Instead: prompt uses review_min_severity only,
and single-result CI path falls through to synthesis when CI
min_severity is set, so filtering still happens.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MinSeverity "low" means no filtering, so skip unnecessary synthesis
for single-result CI runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rmatting

formatSingleResult now recognizes the threshold marker as a pass,
preventing raw marker text from appearing in CI output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…matting

strings.Contains(marker) was too loose — showed Review Passed even
when output also contained severity-labeled findings. Now delegates
to storage.ParseVerdict which checks severity labels first.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cpcloud cpcloud force-pushed the worktree-hashed-wondering-nest branch from 7e12564 to 3ab44ee Compare April 7, 2026 14:31
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 7, 2026

roborev: Combined Review (3ab44ee)

Summary: No medium-or-higher issues identified across the submitted reviews.

All reviewed outputs were clean at Medium, High, and Critical severity.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

The previous substring check let prose findings without severity
labels flip the verdict to pass whenever an agent echoed the marker
in narration. Add config.IsMarkerOnlyOutput which accepts the marker
only when it is essentially the entire output (allowing whitespace,
trailing period, bullet, bold, or code-fence decoration), and route
verdict.go and refine.go through it.

Addresses review feedback on PR roborev-dev#635.
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 7, 2026

roborev: Combined Review (6141603)

Verdict: No medium-or-higher findings; the reviewed diff appears clean based on the provided review outputs.

All provided reviews reported no Medium, High, or Critical issues.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Extended IsMarkerOnlyOutput to unwrap single * and _ italic markers
in addition to the existing ** and __ bold forms, so agent output
like *SEVERITY_THRESHOLD_MET* is correctly recognized as a below-
threshold pass. Bold forms are stripped before italic to avoid
degrading **X** into *X*. Updated the doc comment to spell out the
accepted decoration forms and added test coverage for italic star,
italic underscore, and bold-italic (***X***) in both config_test.go
and verdict_test.go.
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 7, 2026

roborev: Combined Review (6d9f02b)

Verdict: No Medium, High, or Critical findings identified.

The reviewed outputs did not surface any issues at Medium severity or above. Security review explicitly reported no issues found, and there were no actionable findings to consolidate from the other review outputs.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

modernize linter flagged the HasPrefix + TrimPrefix pair on the fenced
code block strip path. Replace with CutPrefix to satisfy the rule.
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 7, 2026

roborev: Combined Review (c3ff148)

Verdict: No Medium, High, or Critical findings.

All reviewed outputs found the changes clean at the requested severity threshold.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

modernize linter flagged the strings.Index + slice pattern. Replace
with strings.Cut to satisfy the rule.
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 7, 2026

roborev: Combined Review (26c5072)

No medium-or-higher issues found; the reviewed changes look clean based on the provided agent outputs.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@cpcloud
Copy link
Copy Markdown
Contributor Author

cpcloud commented Apr 7, 2026

@wesm majority of the LoC here is plan markdown, happy to remove that if you want.

@wesm wesm merged commit ff9af2e into roborev-dev:main Apr 7, 2026
7 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants