Deterministic, language-agnostic propagate via inventory + translator + consensus #36
Open
panayotovk wants to merge 2 commits into
added 2 commits
May 17, 2026 21:26
Rewrites the propagate skill to use the same inventory + translator +
consensus architecture as distill: K subagents produce structured
obligation-bridge inventories, language-agnostic scripts canonicalise,
merge by K-vote, and dispatch to a per-language backend (manifest +
name-policy + templates) which is loaded from the skill's backends/
directory.
The translator is byte-deterministic given fixed inputs. Bridge
ambiguity (where K subagents cannot converge on a single witness) is
surfaced as low-confidence stubs with candidate symbols, not silenced.
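The consensus rule can be pictured with a minimal Python sketch. It is illustrative only and is not the code in scripts/merge-obligations.mjs: the inventory shape (obligation ID mapped to a bridge string), the strict-majority threshold, and the output fields are all assumptions made for this example.

```python
from collections import Counter


def merge_bridges(inventories):
    """Merge K subagent inventories by voting on each obligation's bridge.

    `inventories` is a list of K dicts mapping obligation ID -> bridge string
    (a hypothetical shape). A bridge wins only with a strict majority of the
    K votes; otherwise every candidate is kept and confidence is marked low.
    """
    k = len(inventories)
    merged = {}
    all_ids = sorted({oid for inv in inventories for oid in inv})
    for oid in all_ids:
        votes = Counter(inv[oid] for inv in inventories if oid in inv)
        # Rank by (-count, bridge) so ties never depend on input order.
        ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
        top_bridge, top_count = ranked[0]
        if top_count * 2 > k:
            merged[oid] = {"bridge": top_bridge, "confidence": "high", "candidates": []}
        else:
            merged[oid] = {
                "bridge": None,
                "confidence": "low",
                "candidates": [bridge for bridge, _ in ranked],
            }
    return merged
```

Sorting candidates and IDs before emitting them is what keeps the merged output independent of the order in which subagent results arrive.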
Stage C runs the backend's runner command (e.g. pytest, jest) and
emits a categorised propagation-report.md (pass / fail / error /
bridge-unresolved / infrastructure-gap).
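As a rough picture of the categorisation, the sketch below buckets already-normalised runner outcomes into the five report categories and renders a small Markdown summary. Only the category names come from this commit; the outcome strings (passed, failed, skipped, not-run), the skip_reason field, and the mapping rules are assumptions, and the real per-format adapters in scripts/run-suite.mjs may work quite differently.

```python
from collections import Counter

CATEGORIES = ("pass", "fail", "error", "bridge-unresolved", "infrastructure-gap")


def categorise(outcome, skip_reason=""):
    """Map one normalised runner outcome onto a report category (assumed rules)."""
    if outcome == "passed":
        return "pass"
    if outcome == "failed":
        return "fail"
    if outcome == "skipped" and "bridge-unresolved" in skip_reason:
        return "bridge-unresolved"
    if outcome == "not-run":
        # Assumption: the runner itself was unavailable for this test.
        return "infrastructure-gap"
    # Anything unrecognised is reported as an error rather than dropped.
    return "error"


def render_report(results):
    """Render per-category counts in a fixed order so the report is stable."""
    counts = Counter(categorise(r["outcome"], r.get("skip_reason", "")) for r in results)
    lines = ["# Propagation report", ""]
    lines += [f"- {category}: {counts.get(category, 0)}" for category in CATEGORIES]
    return "\n".join(lines) + "\n"
```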
Adds:
- scripts/canonicalize-obligations.mjs (multiset validation against
  allium plan; deterministic disambiguation of duplicate obligation
  IDs from overloaded spec rules)
- scripts/merge-obligations.mjs (K-vote consensus; per-field
  modal voting; bridge-ambiguity surfaced as low confidence)
- scripts/obligations-to-tests.mjs (translator core + named
  bridge_import transforms + 4-construct template renderer)
- scripts/run-suite.mjs (Stage C with pluggable
  per-format adapters)
- skills/propagate/SKILL.md (rewritten as orchestrator;
  code_root and spec_path are mechanically locked from the user
  invocation so two runs on the same project use identical framing)
- skills/propagate/references/obligation-bridge-schema.md
- skills/propagate/references/backend-authoring-guide.md
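The two canonicaliser duties named in the list above (multiset validation and duplicate-ID disambiguation) can be sketched in a few lines of Python. This is an assumption-laden illustration, not the contents of scripts/canonicalize-obligations.mjs: the __1 / __2 suffix form is taken from the PR description, while numbering duplicates in order of appearance and the example ID format are guesses.

```python
from collections import Counter


def validate_multiset(inventory_ids, plan_ids):
    """Compare IDs as multisets, so duplicate counts matter.

    Returns (missing, extra) Counters; both empty means the inventory
    covers the plan exactly.
    """
    inventory, plan = Counter(inventory_ids), Counter(plan_ids)
    return plan - inventory, inventory - plan


def disambiguate_ids(obligation_ids):
    """Suffix repeated IDs with __1, __2, ... in order of appearance.

    IDs that occur once are left untouched. Assumes the input order is
    itself stable across runs.
    """
    totals = Counter(obligation_ids)
    seen = Counter()
    result = []
    for oid in obligation_ids:
        if totals[oid] > 1:
            seen[oid] += 1
            result.append(f"{oid}__{seen[oid]}")
        else:
            result.append(oid)
    return result
```

For example, under these assumptions a plan that lists a hypothetical ID rule.ReceiveGithubPushEvent twice would yield rule.ReceiveGithubPushEvent__1 and rule.ReceiveGithubPushEvent__2.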
Two reference backends prove the dispatcher works across (language ×
test framework × PBT framework) combinations. Each backend is a
self-contained directory under skills/propagate/backends/<id>/ with
no translator-side code changes required to add it.
Each backend consists of:
- manifest.json: declares language, file extension, runner command,
  report format, imports lists per test_kind, and the named
  bridge_import transform that turns <path>::<symbol> into an
  idiomatic import line.
- name-policy.json: declares casing rules and the file/test name
  patterns the canonicaliser applies.
- conventions.md: human-readable guidance for Stage A subagents on
  how to populate the bridge field for this language.
- templates/: six placeholder-driven templates (test-file,
  assertion, pbt-property, state-machine, stub-unresolved, fixture).
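A quick structural check for a backend directory might look like the Python sketch below. The required file names and the six template names come from the list above; the function name is hypothetical, and because template file extensions are not stated here, templates are matched on the part of the file name before the first dot.

```python
from pathlib import Path

REQUIRED_FILES = ("manifest.json", "name-policy.json", "conventions.md")
TEMPLATE_NAMES = (
    "test-file",
    "assertion",
    "pbt-property",
    "state-machine",
    "stub-unresolved",
    "fixture",
)


def missing_backend_parts(backend_dir):
    """Return a sorted list of the parts a backend directory is missing."""
    root = Path(backend_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    templates = root / "templates"
    # Extensions are unspecified, so compare on the name before the first dot.
    present = set()
    if templates.is_dir():
        present = {path.name.split(".")[0] for path in templates.iterdir()}
    missing += [f"templates/{name}" for name in TEMPLATE_NAMES if name not in present]
    return sorted(missing)
```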
pytest+hypothesis:
- python_module bridge import (app/services.py::approve_claim ->
  "from app.services import approve_claim")
- conftest fixture style
- pytest-junitxml runner adapter
jest+fastcheck:
- typescript_relative bridge import (rewrites <path> relative to the
  test file's location, e.g. "../src/services/claim")
- in-file fixture style (factories declared next to tests)
- jest-json runner adapter
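The two bridge import transforms can be sketched as pure string functions. The sketch is written in Python for illustration and does not reproduce the registry in scripts/obligations-to-tests.mjs. The python_module example matches the one given above; the named-import form of the TypeScript line, the test file path used in the usage note, and the behaviour of noop (emitting nothing) are assumptions.

```python
import posixpath


def python_module(bridge, test_file=None):
    """'app/services.py::approve_claim' -> 'from app.services import approve_claim'."""
    path, symbol = bridge.split("::", 1)
    module = posixpath.splitext(path)[0].replace("/", ".")
    return f"from {module} import {symbol}"


def typescript_relative(bridge, test_file):
    """Rewrite <path> relative to the directory that holds the test file."""
    path, symbol = bridge.split("::", 1)
    target = posixpath.splitext(path)[0]  # import specifiers omit the extension
    rel = posixpath.relpath(target, posixpath.dirname(test_file) or ".")
    if not rel.startswith("."):
        rel = "./" + rel  # same-directory imports need an explicit ./ prefix
    return f'import {{ {symbol} }} from "{rel}";'


def noop(bridge, test_file=None):
    """Assumed behaviour: emit no import line at all."""
    return ""


# A registry keyed by the transform names mentioned in the PR description.
TRANSFORMS = {
    "python_module": python_module,
    "typescript_relative": typescript_relative,
    "noop": noop,
}
```

With a hypothetical test file at tests/claim.test.ts, the bridge src/services/claim.ts::approveClaim resolves to the specifier "../src/services/claim", which matches the example path quoted above.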
A third backend is the documented exercise in
references/backend-authoring-guide.md (no translator changes
required).
Follow-up to #35 (deterministic distill). Same inventory + translator + consensus architecture, applied to propagate, with a backend-dispatch layer so the methodology lands language-agnostically. Two reference backends ship in this PR: pytest+hypothesis (Python) and jest+fastcheck (TypeScript). Adding a third backend is a documented exercise against skills/propagate/references/backend-authoring-guide.md with no translator changes required.
Why
propagate today is fully LLM-mediated and inherits all of pre-deterministic distill's variance, with worse consequences: its output (test files) gets committed to source and re-run forever. Every run produces a different test suite, compounding CI churn, review noise, and reviewer fatigue.
Pipeline
The translator is byte-deterministic given fixed inputs (proven by repeated re-translation in development). LLM judgement enters only at Stage A; everything downstream is pure functions over JSON.
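A repeated re-translation check of this kind can be sketched in Python as follows. This is not the project's harness: the helper names are hypothetical, and the translator is modelled as any function from a JSON-compatible inventory to bytes. The canonical serialisation (sorted keys, fixed separators) shows one common way to keep JSON output independent of key order.

```python
import hashlib
import json


def canonical_bytes(obj):
    """Serialise JSON-compatible data with sorted keys and fixed separators."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def is_byte_deterministic(translate, inventory, runs=5):
    """Call `translate` repeatedly on the same input and compare output digests."""
    digests = {hashlib.sha256(translate(inventory)).hexdigest() for _ in range(runs)}
    return len(digests) == 1
```

Any hidden input (timestamps, random seeds, unordered iteration) shows up as more than one digest, which is why the check is useful as a regression guard.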
A/B results (real K=3 LLM runs, both fixtures)
[Table: A/B results for the insurance-claims and build-pipeline fixtures]
Baseline samples produced 15 unique file names across 3 runs against insurance-claims, with only 6 appearing in all three. Even files with matching names differed by 50–100% in size (e.g. test_rules.py is 10.5 KB vs 17.3 KB across samples). On build-pipeline the agreement dropped to 3 of 21 unique names, with 0 byte-identical matches.
Experimental produces the same file set every run (100% set agreement on both fixtures). Byte-identity across two independent K=3 runs reflects pure consensus variance: 73% on TS where all three runs picked the same code_root framing; 59% on Python where one run picked a different framing (now fixed via a SKILL.md tightening that mechanically locks code_root).
How bridge ambiguity is surfaced
Where the K-vote can't converge on a single witnessing symbol, the merged inventory keeps the candidates and the translator emits a backend-idiomatic stub:
- pytest.skip("bridge-unresolved") with the candidates in the docstring.
- test.skip("name [bridge-unresolved]", () => { ... }) with the candidates in a comment block.
Reviewers see ambiguity, not silence. On the two real A/B runs, 3 of 96 obligations on insurance-claims and 4 of 63 on build-pipeline landed in bridge-unresolved, all genuinely ambiguous cases (an invariant enforced across multiple rules, a Routes surface aggregating many handlers, etc.).
Wrong bridges that K-vote can't fix (all subagents agree on the wrong primary) are caught downstream by the type checker / test runner. For example, on build-pipeline, all three TS subagents agreed ReceiveGithubPushEvent lives in src/routes.ts when it's actually in src/webhooks.ts; tsc flags it as a missing export. The safety net works as designed.
Adding a new backend
A backend is skills/propagate/backends/<id>/ with:
- manifest.json: language, file extension, runner command, report format, per-test_kind imports, named bridge_import.transform.
- name-policy.json: casing rules and file/test name patterns.
- conventions.md: Stage-A subagent guidance for the symbol convention.
- templates/: six placeholder-driven templates.
The translator has a small registry of named bridge_import transforms (python_module, typescript_relative, noop); adding proptest+cargo-test, rapid+go-test, etc. is a manifest + name-policy + 6 templates plus one transform entry. No edits to the translator core, the canonicaliser, or the merger are required.
See skills/propagate/references/backend-authoring-guide.md for the full contract.
Reproducing the A/B numbers
(Eval harness scripts live in the sandbox repo, not in this plugin. They're available if helpful; see referenced commit messages.)
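Since the harness is not included, here is a minimal Python sketch of how set-agreement and byte-identity figures of the kind quoted in the A/B results could be computed. The function names and the exact definitions are assumptions (in particular, byte-identity is taken over files present in both runs); the real harness may define them differently.

```python
def set_agreement(runs):
    """Return (names present in every run, names present in any run).

    `runs` is a list of dicts mapping generated file name -> file bytes.
    """
    name_sets = [set(run) for run in runs]
    return len(set.intersection(*name_sets)), len(set.union(*name_sets))


def byte_identity(run_a, run_b):
    """Fraction of files shared by both runs whose contents are byte-identical."""
    shared = sorted(set(run_a) & set(run_b))
    if not shared:
        return 0.0
    same = sum(run_a[name] == run_b[name] for name in shared)
    return same / len(shared)
```

Under this definition, a result such as "6 of 15 unique file names" corresponds to set_agreement returning (6, 15) for the three baseline runs.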
Known caveats
- [...] npx jest installed); reports are generated but don't include runtime outcomes. tsc --noEmit was used as a parse-time check on the hand-validated subset and passes after a one-line ": unknown" annotation fix in the PBT template.
- allium plan emits duplicate obligation_id values when the spec has overloaded rules (e.g. build-pipeline's two rule ReceiveGithubPushEvent declarations). The canonicaliser handles this via deterministic disambiguation (__1 / __2 suffixes); a cleaner fix lives in allium plan itself but isn't blocking.
Files changed (25 files, +2812 / -180)
Two commits; review them independently if it helps:
- "propagate: byte-deterministic pipeline (schema, scripts, orchestrator)": 4 scripts, the rewritten SKILL.md, and two reference docs.
- "propagate: pytest+hypothesis and jest+fastcheck reference backends": two backend directories with their templates.
🤖 Generated with Claude Code