DO NOT MERGE — Verification autopilot and routing policy engine (experimental) by johnlindquist · Pull Request #15 · vercel/vercel-plugin

johnlindquist · 2026-03-28T06:24:02Z

DO NOT MERGE — Experimental exploration branch

Summary

Closed-loop learning system for skill routing: observe → verify → promote → recall.

Core subsystems

Verification ledger — append-only observation log tracking what Claude actually verifies across 4 boundary types (UI renders, client requests, server handlers, environment)
Verification plan + directives — computes "next action" suggestions from ledger state and hands them to subagents as environment-variable directives
Closed-loop feedback — PostToolUse observer matches executed commands against directives, persists observations, and exposes adherence via verify-plan --json
Verification signal broadening — observes Read, Grep, Glob, and Fetch results (not just Bash), with local provenance gating to block external fetch contamination
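The ledger and "next action" ideas above can be sketched in TypeScript. This is a minimal illustration under stated assumptions. The record fields, the boundary identifiers (`uiRender`, `clientRequest`, `serverHandler`, `environment`), the `VerificationLedger` class and the fixed probing order are all hypothetical, not the PR's actual schema.

```typescript
// Hypothetical schema: the PR names four boundary types and the observed
// tools, but not the real record layout.
type Boundary = "uiRender" | "clientRequest" | "serverHandler" | "environment";
type ObservedTool = "Bash" | "Read" | "Grep" | "Glob" | "Fetch";

interface Observation {
  readonly id: string;
  readonly boundary: Boundary;
  readonly tool: ObservedTool;
  readonly command: string;
  readonly timestamp: number;
}

// Fixed probing order, so the suggested next action is deterministic.
const BOUNDARY_ORDER: readonly Boundary[] = [
  "uiRender",
  "clientRequest",
  "serverHandler",
  "environment",
];

class VerificationLedger {
  private readonly entries: Observation[] = [];

  // Append-only: records are copied and frozen, and nothing is ever removed.
  append(obs: Observation): void {
    this.entries.push(Object.freeze({ ...obs }));
  }

  get size(): number {
    return this.entries.length;
  }

  // Boundaries that have no observation yet, in BOUNDARY_ORDER.
  missingBoundaries(): Boundary[] {
    const covered = new Set(this.entries.map((e) => e.boundary));
    return BOUNDARY_ORDER.filter((b) => !covered.has(b));
  }

  // Suggested next boundary to verify, or null once all four are covered.
  nextAction(): Boundary | null {
    return this.missingBoundaries()[0] ?? null;
  }
}
```

The ledger only grows, so the suggested next step depends only on what has been observed. That property is what makes it replayable.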

Routing policy engine

Policy ledger + compiler — learns from verification outcomes to boost/suppress skill routing decisions over time
Policy recall + attribution — route-scoped policy recall for prompt and PreToolUse hooks, with precise causal credit attribution
Decision capsules + causality — first-class causal evidence on every routing decision (why matched, boosted, recalled, dropped)
Closure diagnostics — append-only capsules recording why observations did/didn't close policy gates
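Here is a hypothetical sketch of how a compiled policy could boost or suppress routing decisions. The names `PolicyStats`, `policyBoost` and `rankSkills` are illustrative assumptions, as are the thresholds. None of them are names or values from this PR.

```typescript
// Hypothetical per-skill counters; the real policy ledger schema is not shown.
interface PolicyStats {
  exposures: number; // times the skill was injected for a scenario
  wins: number;      // times a verification outcome credited that injection
}

// Illustrative thresholds, not values taken from the PR.
const MIN_EXPOSURES = 3;
const BOOST = 8;
const SUPPRESS = -2;

function policyBoost(s: PolicyStats): number {
  if (s.exposures < MIN_EXPOSURES) return 0; // too little evidence to move ranking
  const rate = s.wins / s.exposures;
  if (rate >= 0.8) return BOOST;
  if (rate <= 0.2) return SUPPRESS;
  return 0;
}

interface Candidate {
  skill: string;
  base: number; // score from static pattern matching
}

// Rank by base score plus learned boost; ties break by name for determinism.
function rankSkills(cands: Candidate[], stats: Record<string, PolicyStats>): string[] {
  return cands
    .map((c) => {
      const s = stats[c.skill];
      return { skill: c.skill, score: c.base + (s ? policyBoost(s) : 0) };
    })
    .sort((a, b) => b.score - a.score || a.skill.localeCompare(b.skill))
    .map((c) => c.skill);
}
```

The minimum-exposure floor keeps a single lucky or unlucky outcome from moving the ranking.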

Learning & memory

Verified rule learning — learn CLI replays verified routing decisions and promotes them into project-scoped JSON rulebooks
Companion learning — persists companion skill pairs that consistently close verification gaps together
Playbook recall — multi-step verified workflows promoted and recalled as reusable procedures
Learned routing rulebooks — runtime ranking applies stable learned guidance from canonical project-scoped artifacts

Observability

Session diagnostics CLI — session-explain, routing-explain, decision-cat commands for inspecting routing decisions
Skill exclusion policy — unified exclusion rules with manifest parity tests

Stats

~43,250 lines added across 146 files
~50 new test suites (80+ total test files)
30 commits on explore/ideas

Test plan

bun test — full suite passes
bun run build — hooks + manifest + from-skills compile cleanly
bun run doctor — self-diagnosis passes
Review verification plan output: bun run src/commands/verify-plan.ts --json
Review learned rulebook output: bun run src/cli/learn.ts --help

Preserve a deterministic verification trail so troubleshooting guidance can advance from observed evidence instead of repeating generic next steps. Surface one ranked verification action and scoped context to the CLI, hooks, and subagents so agents can continue the same investigation with clear boundary coverage and less duplicated probing. Ploop-Iter: 1

Carry verification intent across subagent boundaries so agents can receive a deterministic next step and downstream hooks can confirm whether the requested verification actually happened. Keep verification planning resilient when cached state is missing or stale by recomputing from ledger data, and log fallback failures so runtime proof paths do not fail silently during debugging. Ploop-Iter: 2

Persist verification observations and expose adherence snapshots so the plugin can replan from real execution evidence instead of only static intent. This gives downstream agents a stable machine-readable view of whether the last verification action followed guidance, which makes the autopilot loop more reliable and easier to regress-test. Ploop-Iter: 3
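Here is one possible shape for the adherence check. It assumes a directive is a single shell command and that extra trailing arguments still count as following it. `checkAdherence` and the snapshot fields are hypothetical names, and the real `verify-plan --json` output is not shown in this PR.

```typescript
type Adherence = "followed" | "diverged" | "no-directive";

interface AdherenceSnapshot {
  directive: string | null;
  executed: string;
  adherence: Adherence;
}

// Collapse whitespace runs so formatting differences do not count as divergence.
function normalizeCommand(cmd: string): string {
  return cmd.trim().replace(/\s+/g, " ");
}

// Assumed rule: the executed command follows the directive if it is the same
// command, optionally with extra trailing arguments.
function checkAdherence(directive: string | null, executed: string): AdherenceSnapshot {
  if (!directive) return { directive: null, executed, adherence: "no-directive" };
  const want = normalizeCommand(directive);
  const got = normalizeCommand(executed);
  const followed = got === want || got.startsWith(want + " ");
  return { directive, executed, adherence: followed ? "followed" : "diverged" };
}
```

Passing the result to `JSON.stringify` yields a small machine-readable record of whether the last action followed guidance.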

Persist routing outcomes and exposures so skill selection can learn from verified boundary observations instead of relying only on static pattern weights. Align explain output and regression coverage with the runtime injector so the adaptive policy remains inspectable and deterministic as the routing ledger evolves. Ploop-Iter: 1

Prevent routing-policy learning from over-crediting skills across unrelated verification stories or routes. This keeps adaptive ranking tied to the active verification thread so future injections learn from relevant evidence instead of noisy session-wide matches. Harden session-scoped exposure storage so unusual session identifiers cannot leak into tmp filenames, and keep generated artifacts free of stray orphan test fixtures that would pollute validation and manifest state. Ploop-Iter: 2
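One way to keep unusual session identifiers out of tmp filenames is an allowlist of safe characters. This is a sketch under that assumption. `sessionFileName` and the `routing-exposures` prefix are hypothetical.

```typescript
// Characters allowed verbatim in a tmp filename; everything else is replaced.
const UNSAFE_CHARS = /[^A-Za-z0-9_-]/g;

// Hypothetical helper: derive a tmp filename from an untrusted session id so
// path separators, "..", and other odd characters cannot escape the tmp dir.
function sessionFileName(sessionId: string, prefix = "routing-exposures"): string {
  const safe = sessionId.replace(UNSAFE_CHARS, "_").slice(0, 64);
  return `${prefix}-${safe || "unknown"}.json`;
}
```

The mapping is lossy, so two distinct ids could collide. Appending a short hash of the raw id would avoid that.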

Automated checkpoint commit. Ploop-Iter: 3

Make routing learning trustworthy enough to drive policy updates from replayed evidence instead of loosely correlated observations. This keeps route and story attribution honest, preserves enough trace detail to reconstruct routing decisions deterministically, and enables bounded policy tuning from session outcomes without teaching on ambiguous signals. Ploop-Iter: 4

Move verification handoff into a shared directive contract so top-level hooks and subagents resolve the same story, route, and next action across tool boundaries. Export deterministic verification env clearing and route-aware fallback behavior to prevent stale state from leaking between calls and to let PostToolUse close policy exposures even when command inference is incomplete. Add focused tests around banner export and directive-win closure so the verification loop stays stable as routing policy logic evolves. Ploop-Iter: 1
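Here is a sketch of an env-based directive contract with deterministic clearing. The variable names are assumptions for illustration, and so are the `exportDirective` and `readDirective` helpers. The PR does not list its actual keys.

```typescript
// Hypothetical variable names; the PR's real env keys are not shown.
const DIRECTIVE_ENV = {
  story: "VERIFICATION_STORY_ID",
  route: "VERIFICATION_ROUTE",
  boundary: "VERIFICATION_BOUNDARY",
  action: "VERIFICATION_ACTION",
} as const;

interface Directive {
  story: string;
  route: string | null;
  boundary: string;
  action: string;
}

type Env = Record<string, string | undefined>;

// Always emit every key: passing null clears values left over from an earlier call.
function exportDirective(d: Directive | null): Env {
  const env: Env = {};
  for (const field of Object.keys(DIRECTIVE_ENV) as (keyof Directive)[]) {
    env[DIRECTIVE_ENV[field]] = d?.[field] ?? "";
  }
  return env;
}

// Empty strings mean "cleared", so a partially cleared env yields no directive.
function readDirective(env: Env): Directive | null {
  const story = env[DIRECTIVE_ENV.story];
  const boundary = env[DIRECTIVE_ENV.boundary];
  const action = env[DIRECTIVE_ENV.action];
  if (!story || !boundary || !action) return null;
  return { story, route: env[DIRECTIVE_ENV.route] || null, boundary, action };
}
```

Every call emits every key, using empty strings for a cleared directive. This stops a value set by an earlier SubagentStart from leaking into a later call.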

Keep runtime routing evidence trustworthy by preventing test-only skills from leaking into the generated manifest and by locking the verification directive handoff to a stable env contract.

These checks reduce false routing attribution, catch manifest/live-scan drift before it reaches hook behavior, and make end-to-end verification resolution deterministic across SubagentStart and PostToolUse.

Ploop-Iter: 2

Align control-plane diagnostics with manifest generation so operators and agents see the same runtime truth instead of divergent live-scan results. This preserves trust in routing and doctor output when fixture skills exist and gives one place to inspect session state during verification work. Harden the new snapshot path so broken generated state degrades into actionable diagnostics instead of a command crash, which keeps debugging workflows usable when manifests drift or become malformed. Ploop-Iter: 3

Route-aware recall lets the injector reuse historically verified winners when pattern matching misses them, improving skill selection without weakening strict story and route attribution. Persisting exact-route, wildcard, and legacy policy buckets preserves backward compatibility while creating enough evidence to boost and recall skills conservatively across similar flows. Ploop-Iter: 1
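Here is a minimal sketch of exact-route, wildcard and legacy buckets for one skill's policy table. The key layout is hypothetical, as are `recordOutcome`, `recall` and the `minWins` threshold. The sketch writes each outcome to every bucket and reads back the most specific qualifying bucket first.

```typescript
interface Bucket {
  exposures: number;
  wins: number;
}

type PolicyTable = Map<string, Bucket>;

// Hypothetical key layout, most specific first: exact route, wildcard route,
// then the legacy route-less key kept for backward compatibility.
function bucketKeys(scenario: string, route: string | null): string[] {
  const keys: string[] = [];
  if (route) keys.push(`${scenario}|${route}`);
  keys.push(`${scenario}|*`);
  keys.push(scenario);
  return keys;
}

// Persist each outcome into every applicable bucket.
function recordOutcome(table: PolicyTable, scenario: string, route: string | null, win: boolean): void {
  for (const key of bucketKeys(scenario, route)) {
    const b = table.get(key) ?? { exposures: 0, wins: 0 };
    b.exposures += 1;
    if (win) b.wins += 1;
    table.set(key, b);
  }
}

// Return the most specific bucket key with enough wins, or null.
function recall(table: PolicyTable, scenario: string, route: string | null, minWins = 2): string | null {
  for (const key of bucketKeys(scenario, route)) {
    const b = table.get(key);
    if (b && b.wins >= minWins) return key;
  }
  return null;
}
```

A caller would still rank a recalled skill below stronger live pattern matches.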

Keep historically verified recall from overriding stronger live matches while preserving traceability and repeatable rebuild behavior. Restore the excluded test-skill provenance so manifest-backed diagnostics and session explain output continue to reflect the repo's exclusion policy instead of silently drifting after regeneration. Ploop-Iter: 2

Expose deterministic routing-diagnosis data at the decision edge so operators and downstream agents can understand why route-scoped recall did or did not fire without changing routing behavior. Extend session-explain with the same additive diagnosis surface to make recent routing outcomes inspectable in CI and local debugging, which reduces guesswork when policy history, precedence, or missing signal affect recall. Ploop-Iter: 3

Keep verification evidence and next-action planning isolated per story so multi-route sessions do not contaminate each other. Align downstream consumers with the active-story projection and make session auto-detection prefer the freshest ledger activity so routing policy recall, directives, and CLI output stay coherent. Ploop-Iter: 1
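Two small helpers illustrate the per-story isolation and freshest-session detection described above. The field names and the tie-breaking rule are assumptions, not taken from the PR.

```typescript
interface SessionCandidate {
  sessionId: string;
  ledgerMtimeMs: number; // last activity recorded in that session's ledger
}

// Prefer the most recently active ledger; equal timestamps break by id so the
// choice is deterministic.
function pickFreshestSession(cands: readonly SessionCandidate[]): string | null {
  let best: SessionCandidate | null = null;
  for (const c of cands) {
    if (
      best === null ||
      c.ledgerMtimeMs > best.ledgerMtimeMs ||
      (c.ledgerMtimeMs === best.ledgerMtimeMs && c.sessionId < best.sessionId)
    ) {
      best = c;
    }
  }
  return best === null ? null : best.sessionId;
}

// Keep only evidence that belongs to the active story, so observations from
// another route's story cannot influence its plan.
function projectStory<T extends { storyId: string }>(items: readonly T[], activeStory: string): T[] {
  return items.filter((o) => o.storyId === activeStory);
}
```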

Prevent co-injected helper skills from distorting long-term routing policy so policy learning stays tied to the skill that actually drove an injection. This keeps verification outcomes and replay data fully observable while making future ranking decisions more causally accurate and stable.

Also surface manifest exclusion parity and directive-env fallback details in session diagnostics so operators can understand why a skill is absent and why a verification closure did or did not resolve.

Ploop-Iter: 2

Automated checkpoint commit. Ploop-Iter: 3

Prompt-time routing only becomes useful training data when it can be resolved against a concrete verification boundary. Binding prompt decisions to the plan's predicted boundary avoids stale exposures and prevents policy learning from synthetic none|none scenarios. Ploop-Iter: 4

Recover historically successful skills during prompt submission when direct prompt matching misses, so active verification flows can still surface the right help instead of failing closed on zero-match paths. Expose the recalled path in doctor, trace, and attribution output so operators can understand why a skill was injected and trust the policy loop when it promotes a proven route-scoped winner. Ploop-Iter: 5

Enable routing improvements to be promoted from verified evidence instead of relying on manual policy tuning. Deterministic replay gating keeps learned rules safe to adopt, and project-scoped session discovery prevents unrelated tmp artifacts from contaminating learning results across worktrees. Ploop-Iter: 1

Keep routing-policy as an observational ledger so learned promotion runs do not fabricate wins, exposures, or demotions into the evidence store. This preserves ground truth for later analysis while still producing a deterministic promotion artifact that downstream tooling can inspect and apply safely. Tighten replay gating around observed verification outcomes so pending placeholder traces from PreToolUse do not count as verified success. That keeps learn-mode promotion decisions aligned with the real verification lifecycle and prevents false positive promotions or regressions. Ploop-Iter: 2
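Here is a sketch of replay gating that ignores pending placeholder traces. It assumes three trace states and a minimum of three verified traces. `replayGate` and its thresholds are hypothetical. It is a pure function, so a promotion run cannot write fabricated evidence back to the ledger.

```typescript
// "pending" marks a placeholder trace written before the verification
// outcome is known (for example at PreToolUse time). Names are hypothetical.
type TraceOutcome = "pending" | "verified-pass" | "verified-fail";

interface ReplayTrace {
  skill: string;
  outcome: TraceOutcome;
}

type Promotion = "promote" | "hold" | "demote";

// Pure function: reads traces and returns a decision; it never writes wins,
// exposures, or demotions back into the evidence store.
function replayGate(traces: readonly ReplayTrace[], skill: string, minVerified = 3): Promotion {
  const verified = traces.filter((t) => t.skill === skill && t.outcome !== "pending");
  if (verified.length < minVerified) return "hold";
  const passes = verified.filter((t) => t.outcome === "verified-pass").length;
  if (passes === verified.length) return "promote";
  if (passes === 0) return "demote";
  return "hold"; // mixed evidence is not strong enough either way
}
```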

Promote replay-verified routing decisions into a canonical project-scoped rulebook so runtime ranking can apply stable learned guidance and decision capsules can expose provenance without re-deriving routing state. Normalize demotion boosts to stored magnitudes so compiler-produced rulebooks preserve runtime precedence semantics and cannot invert a demotion into an accidental promotion. Ploop-Iter: 3
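To illustrate the demotion fix, store only magnitudes and let the rule's action decide the sign at runtime. `LearnedRule`, `normalizeRule` and `effectiveDelta` are hypothetical names, not the PR's types.

```typescript
interface LearnedRule {
  skill: string;
  action: "promote" | "demote";
  boost: number; // stored as a non-negative magnitude after normalization
}

// Compiler side: keep only the magnitude, whatever sign the input carried.
function normalizeRule(rule: LearnedRule): LearnedRule {
  return { ...rule, boost: Math.abs(rule.boost) };
}

// Runtime side: the action alone decides the sign, so a demotion can never
// flip into a promotion.
function effectiveDelta(rule: LearnedRule): number {
  const magnitude = Math.abs(rule.boost);
  return rule.action === "demote" ? -magnitude : magnitude;
}
```

Suppose the compiler instead stored `-4` and the runtime negated demotions. The result would be `+4`, which is the inversion the commit describes.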

Verification guidance was too dependent on Bash-only evidence, which left useful observations from reads, grep, glob, and fetch operations invisible to the planner. Normalizing those signals lets the verification loop keep state from more of the agent's actual behavior. The routing policy also needed stronger attribution discipline so soft evidence can inform plan progress without being over-credited as a successful verification outcome. Closing that gap protects route-scoped learning, prevents stale exposures from lingering silently, and gives the policy ledger cleaner data to learn from over time. Ploop-Iter: 1

Prevent external fetch observations from training routing policy and keep verification attribution tied to the story that actually owns the observed route. This makes oracle feedback more trustworthy and keeps gate telemetry machine-readable for regression coverage. Ploop-Iter: 2
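Here is a minimal local-provenance gate. It assumes "local" means a `localhost`, `127.0.0.1`, `[::1]` or `*.localhost` host. The helper names are hypothetical, and the PR does not say which hosts it treats as local.

```typescript
const LOCAL_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

// True only for URLs that point at the local machine.
function isLocalFetch(url: string): boolean {
  let host: string;
  try {
    host = new URL(url).hostname; // IPv6 hostnames keep their brackets
  } catch {
    return false; // unparseable URLs are treated as external
  }
  return LOCAL_HOSTS.has(host) || host.endsWith(".localhost");
}

interface ToolObservation {
  tool: "Bash" | "Read" | "Grep" | "Glob" | "Fetch";
  url?: string;
}

// Non-Fetch observations pass; Fetch observations need local provenance
// before they may train routing policy.
function policyEligible(obs: ToolObservation): boolean {
  if (obs.tool !== "Fetch") return true;
  return obs.url !== undefined && isLocalFetch(obs.url);
}
```

Other loopback addresses such as `127.0.0.2` are deliberately treated as external here. When in doubt, the sketch does not train on the result.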

Capture why PostToolUse verification observations do or do not close routing policy so agents and developers can distinguish gate failures from zero-match exposure misses without replaying ledger state by hand.

Persisting append-only closure capsules preserves negative-path receipts and makes the verification loop auditable, deterministic to test, and safer to evolve as routing resolution logic becomes more nuanced.

Ploop-Iter: 3
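Here is a sketch of closure capsules that tell apart three outcomes: closed, a gate failure, and a zero-match exposure miss. All names are assumptions, and so is the matching rule. That rule requires the same boundary, and also the same route unless the exposure is not route-scoped.

```typescript
type ClosureOutcome = "closed" | "gate-failed" | "no-exposure";

interface PendingExposure {
  skill: string;
  boundary: string;
  route: string | null; // null = not route-scoped
}

interface VerificationObservation {
  boundary: string;
  route: string | null;
  succeeded: boolean;
}

interface ClosureCapsule {
  outcome: ClosureOutcome;
  boundary: string;
  route: string | null;
  matchedSkills: string[];
  closedSkills: string[];
}

// Always return a capsule, including on the negative paths, so every
// observation leaves a receipt explaining why it did or did not close.
function resolveClosure(obs: VerificationObservation, pending: readonly PendingExposure[]): ClosureCapsule {
  const matched = pending
    .filter((e) => e.boundary === obs.boundary && (e.route === null || e.route === obs.route))
    .map((e) => e.skill);
  const base = { boundary: obs.boundary, route: obs.route, matchedSkills: matched };
  if (matched.length === 0) return { ...base, outcome: "no-exposure", closedSkills: [] };
  if (!obs.succeeded) return { ...base, outcome: "gate-failed", closedSkills: [] };
  return { ...base, outcome: "closed", closedSkills: matched };
}

// Append-only capsule log: entries are only ever pushed.
const closureLog: ClosureCapsule[] = [];
function recordClosure(capsule: ClosureCapsule): void {
  closureLog.push(capsule);
}
```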

Improve routing by preserving companion skill pairs that consistently close verification gaps, so repeated scenarios can surface complementary skills automatically instead of relearning the pairing every session. Keep companion evidence separate from single-skill policy credit and expose the learned artifact through the learn CLI and routing diagnostics so promotion decisions stay explainable, replayable, and safe to inspect. Also record recalled companions explicitly in decision traces so routing-explain and other observability surfaces can identify companion injections reliably. Ploop-Iter: 1
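Companion pairs can be counted separately from single-skill credit. This hypothetical `CompanionLedger` uses order-independent pair keys and illustrative thresholds: at least 3 co-injections and an 80% closure rate.

```typescript
interface PairStats {
  together: number;       // injections where both skills were present
  closedTogether: number; // ...and the verification gap closed
}

// Hypothetical store kept apart from single-skill policy credit.
class CompanionLedger {
  private readonly pairs = new Map<string, PairStats>();

  record(injected: string[], gapClosed: boolean): void {
    // Sorting makes the pair key independent of injection order.
    const skills = Array.from(new Set(injected)).sort();
    for (let i = 0; i < skills.length; i++) {
      for (let j = i + 1; j < skills.length; j++) {
        const key = `${skills[i]}+${skills[j]}`;
        const s = this.pairs.get(key) ?? { together: 0, closedTogether: 0 };
        s.together += 1;
        if (gapClosed) s.closedTogether += 1;
        this.pairs.set(key, s);
      }
    }
  }

  // Pairs seen often enough that closed the gap at a high enough rate.
  promoted(minTogether = 3, minRate = 0.8): string[] {
    return Array.from(this.pairs.entries())
      .filter(([, s]) => s.together >= minTogether && s.closedTogether / s.together >= minRate)
      .map(([key]) => key)
      .sort();
  }
}
```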

Expose companion recall in session diagnostics so operators can verify that verified-companion routing remains causal, synthetic, and subordinate to stronger direct or policy-driven matches. Locking these expectations into regression coverage reduces the risk of attribution drift, hidden policy contamination, and debugging blind spots as routing logic evolves. Ploop-Iter: 2

Persist first-class causal evidence in routing traces so diagnosis can explain why skills were matched, boosted, recalled, linked, or dropped without inferring intent from ranked order alone. Teach session-explain to prefer explicit causes and edges so operator output stays correct as routing grows more synthetic and route-scoped, while preserving safe fallback behavior for older traces. Ploop-Iter: 3

Melkeydev · 2026-03-28T17:45:21Z

Before reviewing: we need a better solution than only supporting Claude in this repository. With Cursor and other harnesses, we need a way to support agents more generically, not just Claude.

Single-skill and pairwise memory were not preserving proven multi-step workflows, so repeated verification wins could not compound into reusable procedure. Persisting promoted playbooks and recalling them during injection lets the plugin reuse validated sequences, keeps learn output deterministic across JSON/text/write flows, and adds regression coverage so procedural memory stays accretive instead of speculative. Ploop-Iter: 1

Keep verified playbook recall deterministic and credit-safe so learned procedures improve guidance without distorting long-term routing policy. Threading the playbook banner and reason metadata through the hook output makes the contract inspectable, while forcing inserted steps to inherit the anchor skill for exposure attribution prevents context helpers from stealing policy wins or stale-miss credit from the originating skill. Ploop-Iter: 2

Ensure verified playbook metadata only reflects steps that were actually applied so banner output, exposure attribution, and causality stay consistent with runtime behavior. Add deterministic verification coverage for apply versus no-op paths and align the focused tests with the stricter contract to prevent regressions. Ploop-Iter: 3
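Here is a sketch of the playbook contract from these commits. Inserted steps inherit the anchor skill for attribution, and the banner lists only the steps that were actually applied. `applyPlaybook` and the banner format are hypothetical.

```typescript
interface InjectedStep {
  skill: string;
  attributedTo: string; // skill that receives policy credit for this injection
  source: "anchor" | "playbook";
}

interface PlaybookResult {
  steps: InjectedStep[];
  banner: string | null; // lists only playbook steps that were actually applied
}

// Hypothetical helper: expand a verified playbook around its anchor skill.
function applyPlaybook(anchor: string, playbook: readonly string[], alreadyInjected: ReadonlySet<string>): PlaybookResult {
  const steps: InjectedStep[] = [{ skill: anchor, attributedTo: anchor, source: "anchor" }];
  for (const skill of playbook) {
    // No-op steps: the anchor itself, or skills already injected elsewhere.
    if (skill === anchor || alreadyInjected.has(skill)) continue;
    if (steps.some((s) => s.skill === skill)) continue; // duplicate inside the playbook
    // Inserted steps inherit the anchor so helpers cannot steal its credit.
    steps.push({ skill, attributedTo: anchor, source: "playbook" });
  }
  const applied = steps.filter((s) => s.source === "playbook").map((s) => s.skill);
  const banner = applied.length > 0 ? `verified playbook (${anchor}): ${applied.join(" -> ")}` : null;
  return { steps, banner };
}
```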

johnlindquist added 18 commits March 26, 2026 19:04

ploop: iteration 3 checkpoint

5e546b0

Automated checkpoint commit. Ploop-Iter: 3

ploop: iteration 3 checkpoint

8dbd783

Automated checkpoint commit. Ploop-Iter: 3

johnlindquist added the "experimental research" label (Experimental research and exploration) Mar 28, 2026

johnlindquist added 9 commits March 27, 2026 23:58

johnlindquist added 2 commits March 28, 2026 11:51

johnlindquist changed the title from "Verification autopilot and routing policy engine" to "DO NOT MERGE — Verification autopilot and routing policy engine (experimental)" Mar 30, 2026

johnlindquist wants to merge 30 commits into main from explore/ideas

2 participants
