feat(ai): synthetic-defect harness for critic calibration (AI-044)#339

Merged
mrviduus merged 1 commit into main from ai-044-critic-defect-harness on Jun 16, 2026
@mrviduus
Owner

AI-044 — Synthetic-defect injection harness (Phase 7)

Validates the calibration of the AI-041 critic, the signal AutoPublishCrew and SeoCrew gate publishing on. The harness injects KNOWN defects into clean drafts, runs them through the real nano critic, and measures the catch-rate plus a false-positive rate on clean controls.

How

Mirrors ToolCallEvalRunner. Injection and scoring are deterministic and pure (no LLM), so they are CI-testable with a fake critic and no API key. The live nano run is admin-triggered (POST /admin/ai-quality/evals/criticdefects/run, ~23 calls, synchronous) and persists a criticdefects eval_run: no judge, Score = catch-rate, and BreakdownJson holds the per-axis results plus the FP rate. Reuses EvalRun, so no schema change.
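To illustrate why a pluggable critic makes the scoring half testable without a key, here is a minimal TypeScript sketch. It is not the PR's .NET code: `Fixture`, `Critic`, `RunSummary`, and `runCriticDefects` are hypothetical names, the critic is synchronous for brevity, and "rejected" stands in for the per-axis "caught" rule.

```typescript
// Hypothetical types; the real runner lives in the .NET code and is not shown in this PR text.
type DefectType =
  | "factual_hallucination"
  | "banned_phrase"
  | "length"
  | "tone_break"
  | "clean";

interface Fixture {
  id: string;
  type: DefectType;
  draft: string;
}

// Any function from draft to verdict can act as the critic, so CI can pass a fake.
type Critic = (draft: string) => { rejected: boolean };

interface AxisCount {
  caught: number;
  total: number;
}

interface RunSummary {
  score: number; // catch-rate over defect fixtures
  falsePositiveRate: number; // share of clean controls that were rejected
  breakdown: Record<string, AxisCount>; // per defect type
}

function runCriticDefects(fixtures: Fixture[], critic: Critic): RunSummary {
  const breakdown: Record<string, AxisCount> = {};
  let caught = 0;
  let defects = 0;
  let falsePositives = 0;
  let controls = 0;

  for (const f of fixtures) {
    const { rejected } = critic(f.draft);
    if (f.type === "clean") {
      controls++;
      if (rejected) falsePositives++;
      continue;
    }
    defects++;
    let bucket = breakdown[f.type];
    if (bucket === undefined) {
      bucket = { caught: 0, total: 0 };
      breakdown[f.type] = bucket;
    }
    bucket.total++;
    if (rejected) {
      caught++;
      bucket.caught++;
    }
  }

  return {
    // Treating an empty set as rate 0 is an assumption of this sketch.
    score: defects === 0 ? 0 : caught / defects,
    falsePositiveRate: controls === 0 ? 0 : falsePositives / controls,
    breakdown,
  };
}

// Two fake critics of the kind a keyless CI test could use.
const flagEverything: Critic = () => ({ rejected: true });
const flagNothing: Critic = () => ({ rejected: false });
```

Because the runner only sees the `Critic` function type, swapping the fake for a live model call would not change the scoring path.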

Defect taxonomy (23 fixtures on a real edition-description brief)

| Type | n | Axis | "caught" = |
| --- | --- | --- | --- |
| factual_hallucination | 6 | factual_accuracy | axis ≤2 / factual issue / ParseFailed |
| banned_phrase | 4 | banned_phrases | axis ≤2 / banned issue / ParseFailed |
| length over/under | 4 | length | length ≤2 / ParseFailed |
| tone_break | 4 | tone | tone ≤2 / ParseFailed |
| clean (control) | 5 | | flagged → false positive |

ParseFailed (fail-closed verdict) counts as a correct reject for any defect.
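The "caught" column can be read as one predicate per axis. The sketch below is a TypeScript illustration under assumptions: the `Verdict` shape, the `isCaught` name, and matching issues by substring are hypothetical. Only the ≤2 threshold, the factual/banned issue alternatives, and the ParseFailed rule come from the table.

```typescript
// Hypothetical verdict shape; the real critic output type is not shown in this PR text.
type Axis = "factual_accuracy" | "banned_phrases" | "length" | "tone";

interface Verdict {
  parseFailed: boolean; // fail-closed: critic output could not be parsed
  scores: Partial<Record<Axis, number>>; // per-axis scores (scale not stated in the PR)
  issues: string[]; // issue labels; substring matching is an assumption
}

// Only the factual and banned axes list an "issue" alternative in the table.
const ISSUE_KEYWORD: Partial<Record<Axis, string>> = {
  factual_accuracy: "factual",
  banned_phrases: "banned",
};

function isCaught(axis: Axis, v: Verdict): boolean {
  // ParseFailed counts as a correct reject for any defect.
  if (v.parseFailed) return true;

  // A score of 2 or lower on the defect's own axis counts as caught.
  const score = v.scores[axis];
  if (score !== undefined && score <= 2) return true;

  // Otherwise fall back to a matching issue label, where the table allows one.
  const keyword = ISSUE_KEYWORD[axis];
  return keyword !== undefined && v.issues.some((i) => i.includes(keyword));
}
```

Checking only the defect's own axis keeps a critic from getting credit for rejecting a draft for the wrong reason, apart from the deliberate ParseFailed exception.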

Honest gate (hardened per adversarial QA)

  • Passed = catchRate ≥ 0.80 AND falsePositiveRate ≤ 0.20. The original catchRate-only gate let a flag-everything critic (FP=1.0) report success — the runner's own test now proves it correctly fails. A useless critic can't masquerade as calibrated.
  • Clean controls were rewritten to be meta-phrase-free, grounded, and in-bounds, so a legitimately strict critic isn't penalized with spurious false positives.
  • Length defects breach by a wide margin (~½·Min / ~1.5·Max) so the breach is actually catchable by an LLM eyeballing prose — not a <1% margin that misses for reasons unrelated to critic quality.
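The two-sided gate above can be sketched in a few lines of TypeScript. The thresholds are the ones stated in this PR. The `Counts` shape, the `gate` name, and treating an empty fixture set as a failure are assumptions of the sketch.

```typescript
// Hypothetical input shape: raw counts from one harness run.
interface Counts {
  caughtDefects: number; // defect fixtures the critic caught
  totalDefects: number; // all defect fixtures
  flaggedControls: number; // clean controls the critic flagged
  totalControls: number; // all clean controls
}

const MIN_CATCH_RATE = 0.8;
const MAX_FALSE_POSITIVE_RATE = 0.2;

function gate(c: Counts): { catchRate: number; falsePositiveRate: number; passed: boolean } {
  // Assumption: a run with no defects or no controls cannot pass.
  if (c.totalDefects === 0 || c.totalControls === 0) {
    return { catchRate: 0, falsePositiveRate: 0, passed: false };
  }
  const catchRate = c.caughtDefects / c.totalDefects;
  const falsePositiveRate = c.flaggedControls / c.totalControls;
  return {
    catchRate,
    falsePositiveRate,
    // Both conditions must hold: a critic that flags everything has a
    // perfect catch-rate but fails on the false-positive side.
    passed: catchRate >= MIN_CATCH_RATE && falsePositiveRate <= MAX_FALSE_POSITIVE_RATE,
  };
}
```

With the 18 defect fixtures and 5 clean controls described here, a critic may flag at most one control (1/5 = 0.20) and must catch at least 15 defects (15/18 ≈ 0.83) to pass.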

Admin UI

New "Run critic-defect eval" button on the AI-quality Evals tab → catch% / FP% / n + PASS/FAIL badge; result persists into the eval history.

Tests — 15 (full AiEvals suite 41 pass / 5 live-key skip)

Pure injector transforms (breaches are real, deterministic) + runner scoring with fake critics: catches-all → fails (FP guard), catches-none → 0.0, good→pass, FP-just-over→fail, garbage→ParseFailed-caught. StudyBuddy set-equality green; no ITool leaked.
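As an illustration of a pure injector transform whose breach is "real, deterministic", here is a TypeScript sketch of the length defects. It assumes bounds are measured in words, which the PR does not state. `LengthBounds`, `injectUnderLength`, and `injectOverLength` are hypothetical names, and the ½·Min and 1.5·Max targets follow the margins described above.

```typescript
// Hypothetical bounds type; the unit (words) is an assumption.
interface LengthBounds {
  min: number;
  max: number;
}

function splitWords(text: string): string[] {
  return text.trim().split(/\s+/).filter((w) => w.length > 0);
}

function wordCount(text: string): number {
  return splitWords(text).length;
}

// Truncate the clean draft to about half the minimum length.
function injectUnderLength(clean: string, bounds: LengthBounds): string {
  const target = Math.max(1, Math.floor(bounds.min / 2));
  return splitWords(clean).slice(0, target).join(" ");
}

// Repeat the clean draft's words cyclically until about 1.5x the maximum.
function injectOverLength(clean: string, bounds: LengthBounds): string {
  const words = splitWords(clean);
  if (words.length === 0) return clean; // nothing to repeat
  const target = Math.ceil(bounds.max * 1.5);
  const out: string[] = [];
  for (let i = 0; out.length < target; i++) {
    out.push(words[i % words.length] ?? "");
  }
  return out.join(" ");
}
```

A unit test can then assert that `wordCount` of each output falls outside the bounds and that repeated calls return identical strings, with no model call involved.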

Verify

  • dotnet test tests/TextStack.AiEvals → 41 pass / 5 skip (deterministic half runs with no key)
  • dotnet test tests/TextStack.UnitTests → 402 pass
  • dotnet format --verify-no-changes → clean
  • pnpm -C apps/admin exec tsc --noEmit + build → clean

Note: FP-rate enforced now; golden set grows later (per RAG/StudyBuddy golden TODOs). Admin button is build-verified; live click is owner-triggered (needs prod key + admin session).

🤖 Generated with Claude Code

Injects KNOWN defects (hallucinated facts, banned phrases, wrong length,
tone breaks) into clean drafts, runs them through the real AI-041 critic
(nano), and measures catch-rate + clean-control false-positive rate —
validating the calibration AutoPublishCrew/SeoCrew gate publish on.

- Deterministic injector + scoring → CI-testable with a FAKE critic, no
  key; live nano run admin-triggered via
  POST /admin/ai-quality/evals/criticdefects/run, persists a
  criticdefects eval_run. Mirrors ToolCallEvalRunner (no judge,
  Score=catch-rate, BreakdownJson per-axis).
- Honest gate: Passed = catch-rate >= 0.80 AND false-positive <= 0.20 —
  a flag-everything critic (FP=1.0) correctly FAILS, not passes.
- 23 fixtures (factual x6, banned x4, length x4, tone x4, clean x5) on a
  real edition-description brief; clean controls neutral + grounded,
  length defects breach by a wide margin so an LLM can actually catch them.
- Admin Evals tab: Run critic-defect button → catch%/FP%/n + PASS/FAIL.

15 tests (injector + runner w/ fake critic, fail-closed + gate cases).
FP-rate enforced; golden grows later.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mrviduus mrviduus merged commit e2367e4 into main Jun 16, 2026
5 checks passed
@mrviduus mrviduus deleted the ai-044-critic-defect-harness branch June 16, 2026 01:15