feat: min-severity cascading and review-level filter#635
feat: min-severity cascading and review-level filter#635wesm merged 24 commits intoroborev-dev:mainfrom
Conversation
roborev: Combined Review (
|
Promote refine_min_severity and fix_min_severity to global config and add review_min_severity (global + per-repo + per-job CLI flag) that filters reviews at prompt time via SeverityInstruction. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix three review findings on the min_severity cascading design spec: - Add runLocalReview daemon-bypass handling to §4 with fail-fast resolver call and Build/BuildDirty threading, so --local mode honors the flag - Move ParseVerdict marker check below hasSeverityLabel so severity-labeled findings remain authoritative in mixed outputs - Change worker and handleFixJob to fail loudly on invalid configured severity instead of silently disabling the filter, matching the fail-fast behavior in fix.go and refine.go Test plan extended with --local coverage, marker-vs-severity precedence, and invalid-config failure cases. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…_severity spec Iteration 3 review fixes: - §4 + Files-touched row for review.go: require sending the canonical lowercase value from NormalizeMinSeverity (not raw user input). The worker uses job.MinSeverity verbatim and SeverityInstruction's severityAbove map only accepts canonical keys, so storing "HIGH" or " high " would silently disable filtering. Added a "Why normalize at the boundary" subsection explaining the contract. - §3.5 handleFixJob: resolve against an effectivePath that prefers parentJob.WorktreePath when present and valid (via gitpkg.ValidateWorktreeForRepo), falling back to parentJob.RepoPath. Mirrors the worker.go:367 pattern so worktree-backed fixes pick up per-branch .roborev.toml overrides. - §3.5 handleFixJob: added an explicit if !parentJob.IsTaskJob() guard around the entire severity-resolution block. Task/insights parents have free-form output without severity labels, so injecting a severity filter would tell the agent to ignore the very content the user wants acted on. Mirrors cmd/roborev/fix.go:824. Test plan extended with mixed-case CLI normalization cases, worktree path resolution coverage, task-parent skip coverage, and stale-worktree fallback coverage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…verity spec Iteration 4 review fixes: - §3.5: Removed minSeverity threading from buildRebasePrompt. Rebase prompts only carry the stale patch with no findings to map against, so applying a severity filter risks silently dropping previously- validated hunks of an existing fix. Added an explicit "rebase is threshold-free" subsection. The findings-driven helpers (buildFixPrompt, buildFixPromptWithInstructions) still take the new parameter. - §4: Added a "Two trust boundaries" subsection requiring handleEnqueue to validate req.MinSeverity via NormalizeMinSeverity at the daemon boundary (mirroring the existing ResolveReviewReasoning pattern at line 866). The daemon rejects invalid input with 400 Bad Request and stores the normalized return value. This guards non-CLI callers (sync from older machines, MCP wrappers, automation scripts) that bypass CLI-side normalization. Worker stays simple — reads job.MinSeverity verbatim, trusting the write-time normalization invariant. - §4 wire path: bumped from three to four edits, with explicit validation step listed. - Test plan: added a daemon-side test case for the legacy IsTaskJob() fallback shape (job_type empty, no commit/range/dirty) so sync-pulled jobs from older machines retain parity protection. Added a buildRebasePrompt severity-free assertion. Added three new server_test.go cases covering handleEnqueue normalization (valid, mixed-case → canonical, invalid → 400). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
13-task TDD plan covering config, storage, verdict, prompt, daemon, and CLI changes for review_min_severity + global promotion of refine/fix severity settings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strip all git add/commit/staging instructions from the 13-task plan. Plans are about what to build, not how to commit it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… checklist The SQL site checklist table still referenced nullString/nullStr for sync write paths (UpsertPulledJob, PG UpsertJob, EnqueueJob), which would bypass the storage-invariant normalization described in §4. Also fix hydration type from sql.NullString to plain string (queries use COALESCE), and clarify that handleEnqueue stores normalized value. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three gaps identified by review: 1. BatchUpsertJobs is the production PG sync push path, not UpsertJob. Added to spec SQL checklist, files-touched table, and plan Task 6. 2. roborev ci calls batch.go runSingle -> builder.Build without minSeverity. Added spec section 5.5, plan Task 12.5, and BatchConfig.MinSeverity threading. 3. handleFixJob tests lacked worktree-override and legacy-task-parent coverage. Added two test cases to plan Task 11. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tests 1. BatchUpsertJobs: min_severity value goes between WorktreePath and SourceMachineID to match column order, not after SourceMachineID. 2. Worktree test: use real linked git worktree via git worktree add so ValidateWorktreeForRepo passes. Add stale-worktree fallback case. 3. Legacy task test: clear job_type post-insert via UPDATE to exercise IsTaskJob legacy fallback heuristic, since EnqueueJob auto-fills it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rity Promote ResolveFixMinSeverity and ResolveRefineMinSeverity from 2-param (explicit, repoPath) to 3-param (+ globalCfg *Config) to support the cascade: explicit > repo config > global config > empty default. Add ReviewMinSeverity to both Config (global) and RepoConfig (per-repo), plus a new ResolveReviewMinSeverity resolver following the same pattern. Update all callers in fix.go and refine.go to pass globalCfg. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add min_severity column to review_jobs with full plumbing through storage (SQLite migration, PostgreSQL v11 schema, CRUD, sync), prompt injection (Build/BuildDirty signatures), daemon (enqueue validation, worker resolution, fix job baking), CLI (--min-severity flag), and CI batch review path. Agents receive severity filtering instructions in prompts and output SEVERITY_THRESHOLD_MET marker when all findings fall below threshold. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
testifylint requires assert.Empty over assert.Equal("", ...).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…eviewsByIDs Was scanning directly into job struct, then applyReviewJobScan overwrote with zero-value fields.MinSeverity. Batch API responses silently lost min_severity metadata. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CI was using [ci].min_severity for both prompt injection (suppressing agent findings) and synthesis filtering. Now uses ResolveReviewMinSeverity for prompt-level filtering and ResolveCIMinSeverity only for synthesis. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Single-result CI runs skip synthesis, so CI min_severity was never applied. Now prompt threshold uses StricterSeverity(review, ci) to ensure filtering happens even without synthesis. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
StricterSeverity approach conflated review and CI thresholds for multi-result runs. Instead: prompt uses review_min_severity only, and single-result CI path falls through to synthesis when CI min_severity is set, so filtering still happens. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MinSeverity "low" means no filtering, so skip unnecessary synthesis for single-result CI runs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rmatting formatSingleResult now recognizes the threshold marker as a pass, preventing raw marker text from appearing in CI output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…matting strings.Contains(marker) was too loose — showed Review Passed even when output also contained severity-labeled findings. Now delegates to storage.ParseVerdict which checks severity labels first. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
7e12564 to
3ab44ee
Compare
roborev: Combined Review (
|
The previous substring check let prose findings without severity labels flip the verdict to pass whenever an agent echoed the marker in narration. Add config.IsMarkerOnlyOutput which accepts the marker only when it is essentially the entire output (allowing whitespace, trailing period, bullet, bold, or code-fence decoration), and route verdict.go and refine.go through it. Addresses review feedback on PR roborev-dev#635.
roborev: Combined Review (
|
Extended IsMarkerOnlyOutput to unwrap single * and _ italic markers in addition to the existing ** and __ bold forms, so agent output like *SEVERITY_THRESHOLD_MET* is correctly recognized as a below- threshold pass. Bold forms are stripped before italic to avoid degrading **X** into *X*. Updated the doc comment to spell out the accepted decoration forms and added test coverage for italic star, italic underscore, and bold-italic (***X***) in both config_test.go and verdict_test.go.
roborev: Combined Review (
|
modernize linter flagged the HasPrefix + TrimPrefix pair on the fenced code block strip path. Replace with CutPrefix to satisfy the rule.
roborev: Combined Review (
|
modernize linter flagged the strings.Index + slice pattern. Replace with strings.Cut to satisfy the rule.
roborev: Combined Review (
|
|
@wesm majority of the LoC here is plan markdown, happy to remove that if you want. |
Summary
min_severityconfig cascade (CLI flag → repo.roborev.toml→ globalconfig.toml) for review, refine, and fix workflowsmin_severityonreview_jobs(SQLite + PostgreSQL v11), thread through enqueue/claim/sync pathsconfig.SeverityInstructionSEVERITY_THRESHOLD_METmarker as a pass verdictreview_min_severity, synthesis uses[ci].min_severity; single-result CI runs route through synthesis when CI threshold is setformatSingleResultusesstorage.ParseVerdictto detect pass output (not loosestrings.Contains)GetJobsWithReviewsByIDshydration bug wheremin_severitywas scanned then overwritten by zero-value field