feat(harness): AI-native lifecycle architecture — governed agent collaboration#57
Merged
…hrough the writer
…ndering overshoot)
…ced pull, idempotent lane recovery, empty-key dispatch, lane diagnostics, serialized Tick
…aces, receipt-recorded proposal re-mint, keyless receipt id, warn diagnostic
…(exactly-once ingest), surface warn reasons when a higher verdict wins
…outbox rows (no invalidation lease churn)
…le sink cursor (close S2/S7 crash-window)
…ASM rule module (Go encoder; wat2wasm absent)
…ounded, read-only)
…tion + shadow gate + deny-only edge)
…ckend (mnemon-control demo)
…n.go + .wasm were omitted from the demo commit)
…port section, build promoted rule from verified bytes, strip edge authored intent, recover wasm seat after deadline kill
…st span the whole section, not trust the count)
…t parser (no panic on hostile bytes)
…c at the Ingest wire boundary
A *.proposed event is trusted by the reconciler and minted only by the bridge after the rule pre-gate + write-scope check (R11). dispatchOne skips reserved types, so a client-supplied *.proposed bypassed the rule pre-gate, the bridge write-scope, AND the S10 readback digest, and was applied directly by the kernel (authz is actor x kind only). That let an edge write any resource of an authorized kind outside its dispatched scope: a within-kind cross-resource / cross-principal escalation (S9/D7).
Ingest is the sole wire door (in-process + HTTP both route through it); reject reserved internal event types there before they enter the log.
…e S6 budget launder)
kernel.Apply applied an op's writes sequentially with last-write-wins and had no distinct-ref guard, so two OpUpdates to the same ref in one 'all-or-nothing' op both CAS-succeeded and the decision reported two NewVersions for one resource. job.Reserve takes a caller-supplied dataWrite; aliasing it back to the budget ref (resetting spent_usd, based_on the post-reserve version) laundered the spend ceiling: real spend accumulated while stored spent_usd stayed 0, and the over-budget tripwire never fired.
Multi-RESOURCE all-or-nothing (#5) means distinct resources; an op whose writes alias the same ref is now rejected terminally, up-front.
… + lane-recovery out-of-scope)
(1) dispatchOne's 'case VerdictPropose: if dec.Proposal == nil { break }' exited with no
diagnostic, asymmetric with the deny and enqueue_job/request_evidence nil-payload branches.
A rule emitting {Verdict:Propose, Proposal:nil} produced zero durable evidence — now diagnosed.
(2) runJobLane's crash-recovery remintFromReceipt swallowed a bridge.Stamp error and acked the
row, so an out-of-scope governed write recorded in a receipt was permanently lost with no
diagnostic — unlike the live lane path, which emits a stage:bridge diagnostic. Recovery now
mirrors it: an out-of-scope re-mint is diagnosed (stage:bridge), never silently dropped.
…row id (close S4 cross-job collision)
The outbox-id namespaces are disjoint (job_k_+key vs job_s_+seq), but the receipt was keyed by the RAW idempotency key, reopening the collision one layer down: a keyless job keys its receipt by row id 'job_s_<seq>', and a keyed job whose payload-derived literal key == 'job_s_<seq>' forged that same receipt identity, so one of two distinct effects ran and the other was silently skipped as a duplicate (its governed proposal lost).
Key the receipt/effect identity by the outbox ROW ID, which is disjoint by construction and still stable across a keyed retry (outbox UNIQUE(idempotency_key) yields one row per key). Keyed receipts move ev-job -> job_k_ev-job / gather-1 -> job_k_gather-1; keyless stay job_s_<seq>. Demo + dependent fixtures updated.
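To illustrate why keying by row id closes the collision, here is a small sketch of the two id namespaces. The function name `outboxRowID` is hypothetical; only the `job_k_` and `job_s_` prefixes come from the commit message.

```go
package main

import (
	"fmt"
	"strconv"
)

// outboxRowID derives the row id: keyed jobs get job_k_<key>,
// keyless jobs get job_s_<seq>. The prefixes keep the namespaces disjoint.
func outboxRowID(idempotencyKey string, seq int64) string {
	if idempotencyKey != "" {
		return "job_k_" + idempotencyKey
	}
	return "job_s_" + strconv.FormatInt(seq, 10)
}

func main() {
	keyless := outboxRowID("", 7)        // keyless job at sequence 7
	forged := outboxRowID("job_s_7", 8)  // keyed job whose literal key imitates a keyless id
	fmt.Println(keyless, forged, keyless == forged)
}
```

With the raw key as receipt identity both jobs would have shared `job_s_7`; with the row id the keyed job becomes `job_k_job_s_7` and the two receipts stay distinct.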
… (close S5 precision + unit)
(MED#5) fence_until was stored as a float64; resource fields round-trip through json.Unmarshal into map[string]any, which decodes every JSON number to float64. At UnixNano magnitude (~1.78e18, ULP 256ns) the integer fence was silently mis-rounded, corrupting the active/expired boundary S5 rests on: an ACTIVE lease became foreign-stealable. Store fence_until as a decimal STRING (exact round-trip at any magnitude); asInt64 parses it.
(MED#6) The mnemon-control demo wired a time.Now().UnixNano() lane clock with a raw ttl=60, while the outbox sibling uses time.Now().Unix()+ttl seconds: a unit mismatch that collapsed the lease fence to a 60-NANOSECOND window (~zero exclusion). The lane clock is now seconds, matching the sibling (and within float64's exact-integer range).
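The precision loss is easy to reproduce with the standard library alone. The sketch below round-trips a nanosecond-scale integer through `map[string]any` once as a JSON number and once as a decimal string; the helper names are hypothetical, and the sample value is illustrative rather than taken from the source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// roundTripAsNumber stores the fence as a JSON number; decoding into
// map[string]any yields a float64, which cannot hold every int64 exactly.
func roundTripAsNumber(v int64) (int64, error) {
	raw, err := json.Marshal(map[string]any{"fence_until": v})
	if err != nil {
		return 0, err
	}
	var fields map[string]any
	if err := json.Unmarshal(raw, &fields); err != nil {
		return 0, err
	}
	f, ok := fields["fence_until"].(float64)
	if !ok {
		return 0, fmt.Errorf("fence_until is %T, want float64", fields["fence_until"])
	}
	return int64(f), nil
}

// roundTripAsString stores the fence as a decimal string and parses it back.
func roundTripAsString(v int64) (int64, error) {
	raw, err := json.Marshal(map[string]any{"fence_until": strconv.FormatInt(v, 10)})
	if err != nil {
		return 0, err
	}
	var fields map[string]any
	if err := json.Unmarshal(raw, &fields); err != nil {
		return 0, err
	}
	s, ok := fields["fence_until"].(string)
	if !ok {
		return 0, fmt.Errorf("fence_until is %T, want string", fields["fence_until"])
	}
	return strconv.ParseInt(s, 10, 64)
}

func main() {
	const fence int64 = 1780000000000000001 // illustrative UnixNano-scale value
	n, _ := roundTripAsNumber(fence)
	s, _ := roundTripAsString(fence)
	fmt.Println("number round-trip exact:", n == fence)
	fmt.Println("string round-trip exact:", s == fence)
}
```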
…are conflict/version content (S8)
Two false-clean vectors let Registry.Promote admit a candidate that diverges from the live policy:
(HIGH) diffDecisions keyed decisions by OpID (= the client-controllable, non-unique Event.ID) with last-write-wins, so two decisions sharing an id collapsed to one and a candidate denying a write the live policy accepted reported Clean. Key by the durable IngestSeq (the event rowid, unique per decision).
(MED) sameOutcome compared Conflicts/NewVersions by length only, so a candidate that re-derived a divergent conflict (different raced ref/version) or a different resulting version was reported equal. Compare the masked CONTENT element-wise (maskDynamic already sorts both slices).
…e-fn-of-input)
The wasm seat instantiated the module ONCE in New and reused that instance across every Tick, so mutable guest globals + linear memory persisted between Evaluate calls. A gate-compliant module (imports only env.read_state_view) carrying a mutable global could return a different verdict for identical input: a non-deterministic rule that breaks replay/shadow soundness and opens a covert per-call channel. The import-section check cannot see globals and Manifest.Deterministic is read nowhere.
Compile once in New; instantiate a FRESH anonymous instance per Evaluate (zeroing all guest state) and close it after. This also makes the seat structurally un-brickable by a deadline kill (the reinstantiate/retry dance is gone). Adds testdata/stateful.wasm (mutable-global flip) to prove it.
…d EventType (select-only)
ResolveRules appended the raw registry rule, but RuleSet.Evaluate fires a rule on ANY type it Handles, so binding a rule only to memory.observed still let it fire on goal.observed if the rule also handled it, violating the select-only (define != select) model. Wrap each selected rule in a boundRule whose Handles returns true only for the bound EventType; identity/emits/evaluate delegate unchanged.
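A sketch of the wrapper pattern, assuming a small `Rule` interface. The interface shape, `multiRule`, and the field name `boundType` are hypothetical; only the name `boundRule` and the "Handles narrows, everything else delegates" behavior come from the commit message.

```go
package main

import "fmt"

// Rule is an assumed, simplified rule interface for illustration.
type Rule interface {
	ID() string
	Handles(eventType string) bool
	Emits() string
}

// multiRule handles two event types, like the rule in the bug report.
type multiRule struct{}

func (multiRule) ID() string    { return "r1" }
func (multiRule) Emits() string { return "memory.proposed" }
func (multiRule) Handles(t string) bool {
	return t == "memory.observed" || t == "goal.observed"
}

// boundRule embeds the selected rule so ID and Emits delegate unchanged,
// and overrides Handles to admit only the bound event type.
type boundRule struct {
	Rule
	boundType string
}

func (b boundRule) Handles(t string) bool { return t == b.boundType }

func main() {
	r := boundRule{Rule: multiRule{}, boundType: "memory.observed"}
	fmt.Println(r.Handles("memory.observed"), r.Handles("goal.observed"), r.ID())
}
```

Embedding keeps the wrapper small: only the one method that expresses selection is overridden, so the rule's definition stays untouched.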
…idge misattribution
The reducer dropped which rule produced the winning proposal, and the server guessed the producer by scanning for the first rule with matching Handles(ev.Type) && Emits()==proposal.Type. A rule could return another rule's emit type (or two rules could share handles/emits with different actors) and have the bridge stamp the WRONG actor as the trusted write identity.
Now: the reducer (the common path for native AND wasm rules) rejects a proposal whose Type != the producing rule's declared Emits (empty defaults to Emits) with a diagnostic, and carries the producing rule's Actor on the decision (contract.RuleDecision.ProposalActor, json:"-" so an untrusted wasm rule cannot forge it). The server stamps the bridge binding from dec.ProposalActor; the proposerBinding Handles/Emits scan is removed.
…OBSERVED events (S8)
The prior Shadow drove the candidate only over the existing *.proposed events of the log, but a rule handles OBSERVED events and EMITS proposals/denies/jobs, and rule bindings reject .proposed types, so the candidate's rules never fired: any candidate that changes observed->proposal/deny/job behavior passed report.Clean and was promotable (false-clean for every real rule change). The narrower IngestSeq/content-compare diff fixes operated inside that wrong model.
Shadow now seeds a throwaway kernel with the canonical state (the logged proposals), then for each OBSERVED event evaluates BOTH policies against the same scoped view and diffs the rule decisions (verdict + proposal type+payload + job + trusted origin actor). The seed is never mutated (read-only). Replaces the kernel-decision diff machinery (diffDecisions/sameOutcome) which only existed for the old model; Replay (event-sourcing reproduce-from-log) is unchanged. Shadow gains a subs parameter; replay imports projection (compile-order-legal, no cycle).
…rule diagnostics (S8)
(HIGH) Shadow applied ALL logged proposals then evaluated every observed event against the FINAL state, but the server evaluates rules at DISPATCH time, before that tick's reconcile. A version-sensitive candidate diverging at m1@1 but agreeing at m1@2 read false-clean. Walk the log in IngestSeq order on a throwaway kernel: apply each logged *.proposed to evolve state, and evaluate each OBSERVED event against the state built from the proposals that PRECEDE it (dispatch-time state).
(MED) Shadow discarded the []Diagnostic from RuleSet.Evaluate. A candidate that errors or returns a borrowed-emit proposal reduces to Verdict allow but emits a durable diagnostic, comparing equal to live's clean allow → false-clean → diagnostics only after promotion. The canonical comparison now covers the decision AND the diagnostics.
The D-loop's event-model-definition kind. A loopdef candidate carries a SERIALIZED capability spec draft (a JSON string field), validated by a new closed-set validator and governed at the high-risk tier.
- capability: `validate:capability-spec-draft` (closed validator catalog + compileDecode case) parses the draft, validates it COMPILES (FromSpec is pure, so validate + discard = validate-only, no registration), and runs the external untrusted-text scan + identifier lock (I15: a proposed event model is untrusted). An explicit single-layer recursion guard refuses a draft that is itself a loopdef or nests a spec-draft validator (FromSpec accepts the catalogued id, so the guard cannot live there; the M2 review correction). The draft is a JSON STRING, since compileDecode reads string fields (M1).
- loopdef.json: `spec` field with the validator, `default_enabled` (so the propose channel is open out of the box, like coordination; solves the governance-only bootstrap) and `risk:high`. FromSpec hard-rejects any loopdef kind with risk≠high (S4/G2, non-overridable).
So an OPERATOR (control-agent) admits a valid draft and the spec-draft validator denies an invalid/recursive one; an AGENT's loopdef candidate is denied (high-risk → Inbox).
Pinned by capability unit tests (validator + S4) and app governance tests (operator admit/deny, agent denied). Catalog count, assembler payload, and the status FIELD line (loopdef=0) updated.
Gates green: go build ./... ; go vet ./harness/... ; go test -race ./harness/... (21 ok / 0 fail) ; make harness-validate ; bash harness/scripts/e2e.sh (all legs + coordination).
… package (P3e-3)
The D-loop Δ2/G5 step: an admitted loopdef draft is written to a managed external
package under .mnemon/loops/<name>/, ready to be governed at the next reload.
- materializeLoopdefs/materializeDraft (app layer): reads the loopdef resource and
writes each admitted draft's capability.json (with default_enabled added — M3,
so reload governs it without an extra --loop) plus a .managed provenance marker
{materialized_by, version, digest}. Pure app-layer JSON — no capability internals.
- Wired into serveReproject (the DRIVER bridge): a loopdef accept's outbox
invalidation reaches the reproject callback, which materializes. The runtime
never touches the filesystem (runtime ↛ hostsurface preserved).
- WRITES ONLY — never activates: a materialized kind is governed only after an
explicit `mnemond reload` re-assembles the catalog (G1/G3, P3e-4).
- G5 isolation: a target dir without a .managed marker is a human-placed package
and is NEVER clobbered.
Pinned: a Go test admits a draft, materializes, and confirms the package is
default_enabled, carries the marker, and RESOLVES (ready for reload); a second
test confirms a human-placed package is left untouched.
Gates green: go build ./... ; go vet ./harness/... ;
go test -race ./harness/... (21 ok / 0 fail) ; make harness-validate ;
bash harness/scripts/e2e.sh (all legs).
The D-loop activation step (G1) and its audit (G4).
- `mnemond reload`: a single verb that stops the recorded daemon, waits, then `up`s with the same flags, i.e. a RESTART that re-assembles the catalog, picking up any loopdef packages materialized since it started. It is the ONLY activation path: materialization writes to disk, reload activates (G1/G3). Never a Tick watch, never two shelled commands (the S5 review note); pre-flights the boot so a misconfigured project can't leave the daemon down. (No `local reload`: a foreground server has no second process to signal; S6.)
- G4 activation ledger: on boot, emitLoopdefActivations records a durable `loopdef.activated.observed` event (via trusted intake under the well-known loopdef@local principal) for each materialized package present. It runs one-time at boot, never on a Tick (G1), and is idempotent per (name, version, digest). The event carries no rule and writes no resource; it is the audit marker from which the active loopdef version across each reload is reconstructable.
Pinned: a Go test materializes a draft, emits activations, and asserts exactly one event that does NOT duplicate on a second emit. mnemond reload smoke-tested (up → reload → status running → down).
Gates green: go build ./... ; go vet ./harness/... ; go test -race ./harness/... (21 ok / 0 fail) ; make harness-validate ; bash harness/scripts/e2e.sh (all legs).
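Idempotency per (name, version, digest) can be sketched with a set keyed by that triple. The `activation` and `ledger` types are hypothetical and in-memory; the source describes a durable event log, and the field types here (all strings) are assumptions.

```go
package main

import "fmt"

// activation is the assumed idempotency key: one event per distinct triple.
type activation struct {
	Name, Version, Digest string
}

// ledger is an in-memory stand-in for the durable activation event log.
type ledger struct {
	seen   map[activation]bool
	events []activation
}

// emit records the activation once and reports whether a new event was written.
func (l *ledger) emit(a activation) bool {
	if l.seen == nil {
		l.seen = make(map[activation]bool)
	}
	if l.seen[a] {
		return false // same package state already recorded
	}
	l.seen[a] = true
	l.events = append(l.events, a)
	return true
}

func main() {
	var l ledger
	a := activation{Name: "widget2", Version: "1", Digest: "abc"}
	fmt.Println(l.emit(a), l.emit(a), len(l.events))
}
```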
…e-5)
The capstone of P3: the event model itself evolves under governance. An OPERATOR proposes a loopdef defining a NEW kind, it is admitted (high-risk, operator only), materialized, and (after an explicit reload) governed.
- Extend the default-enabled boot grant to control-agents too: an operator governs the default kinds (proposes loopdefs, approves high-risk) just as a host-agent governs its own.
- TestDLoopFullCycle (Go integration): operator proposes loopdef → admit → materialize → re-resolve+re-assemble (= reload) → the new widget2 kind is governed → the old loopdef resource survives the reload (one persistent store, I6). One robust, deterministic proof of the whole cycle.
- run_dloop (e2e, product path): two setups (host-agent with memory to drive the background driver + control-agent operator), the operator proposes a loopdef (the spec draft carried as a JSON string), the driver materializes it, the new kind is NOT governable yet (G3: materialize != activate), then `mnemond reload` re-assembles the catalog and the new kind admits a candidate.
This closes P3 (AgentTeam semantics + D-loop): the AgentTeam nouns exist, are default-on, risk-gated, surfaced in the tower seed, and "what nouns exist" is itself a governed, evolvable resource. PoC can reshape what the team SAYS, never how the system DECIDES (dynamic vocabulary × fixed grammar × governed evolution).
Gates green: go build ./... ; go vet ./harness/... ; go test -race ./harness/... (21 ok / 0 fail) ; make harness-validate ; bash harness/scripts/e2e.sh (all legs + coordination + dloop).
…f / §577)
The §577 generic append-merge that makes the AgentTeam coordination kinds syncable, closing P3's "three kinds, full chain (… sync …)" acceptance.
- capability.itemDedupImport: a third closed-set merge strategy that merges a remote commit's items into the resource's item list BY ID, preserving EVERY item field. Unlike entry-dedup (memory's content) and declaration-dedup (skill's declarations), it makes no assumption about the item's domain fields, so an arbitrary declared kind syncs without losing them (assignment's scope/ttl/assignee). Item ids are replica-specific (actor+ingest_seq), so cross-replica items never collide; the merged header is re-derived from the capability's own render. decodeRemoteCommit is the kind-agnostic commit decode.
- "item-dedup" added to the closed merge set + the importStrategy dispatch.
- project_intent/assignment/progress_digest declare sync: item-dedup. loopdef stays NON-syncable (P3 single-machine D-loop; sync is P4).
Pinned: a Go test imports an assignment commit and confirms scope/ttl/assignee survive (entry-dedup would lose them); run_sync_pair now syncs THREE kinds across the TLS hub (memory entry-dedup + journal entry-dedup + assignment item-dedup): three descriptor-selected strategies, no kind literal in code. Updated the importable-kinds pin (now memory/skill + the three coordination kinds; loopdef asserted non-syncable).
Gates green: go build ./... ; go vet ./harness/... ; go test -race ./harness/... (21 ok / 0 fail) ; make harness-validate ; bash harness/scripts/e2e.sh (all legs + sync-pair[memory+journal+assignment]).
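A sketch of an id-keyed append-merge in the spirit of item-dedup. The `Item` type, the `"id"` field name, and `mergeItemsByID` are hypothetical; header re-rendering and commit decoding from the real strategy are not modeled here.

```go
package main

import "fmt"

// Item is an assumed generic item: an id plus arbitrary domain fields.
type Item map[string]any

// mergeItemsByID appends remote items whose id is not already present,
// keeping every field of each item untouched. Remote items without a
// string id are skipped in this sketch (an assumption).
func mergeItemsByID(local, remote []Item) []Item {
	seen := make(map[string]bool, len(local)+len(remote))
	merged := make([]Item, 0, len(local)+len(remote))
	for _, it := range local {
		if id, ok := it["id"].(string); ok {
			seen[id] = true
		}
		merged = append(merged, it)
	}
	for _, it := range remote {
		id, ok := it["id"].(string)
		if !ok || seen[id] {
			continue
		}
		seen[id] = true
		merged = append(merged, it)
	}
	return merged
}

func main() {
	local := []Item{{"id": "a@1", "scope": "docs"}}
	remote := []Item{
		{"id": "a@1", "scope": "docs"},
		{"id": "b@4", "scope": "api", "ttl": 60, "assignee": "worker-2"},
	}
	for _, it := range mergeItemsByID(local, remote) {
		fmt.Println(it["id"], it["scope"], it["assignee"])
	}
}
```

Because the merge never inspects domain fields, a kind with fields like scope, ttl, or assignee passes through without a per-kind code path.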
Add the P4 context-budget tier as a closed set {hot,warm,digest-only} on
Subscription, resolved at one site (ResolveBudgetTier):
- empty resolves to hot (full) — budget is NOT a security axis (the grant
scope is), so the empty default PRESERVES existing full-delivery behavior,
mirroring ClampRefs's empty-requested=full-scope. (Folds adversarial
MUST-FIX 3: empty=digest-only would silently downgrade the coordination
kinds that already sync full today.)
- a non-catalogued value is rejected (fail-loud, never silently widened).
- the tier SELECTS a local rendering shape from a fixed catalog; it never
DEFINES behavior (A3/define!=select). It is applied where the local replica
derives its mirror — the hub is never tier-aware (no remote reducer, B1/B2).
Pure additive type + field + resolver + table tests; zero call sites changed.
Gates: go build ./... + go vet ./harness/... + go test -race ./harness/...
+ make harness-validate all green. e2e runs at the first wiring commit (P4b),
where behavior actually changes — P4a adds an unwired type.
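The closed-set resolution above might look like the following sketch. `ResolveBudgetTier` and the three tier values are named in the commit message; the exact signature, the constant names, and the error text are assumptions.

```go
package main

import "fmt"

// BudgetTier is the closed set {hot, warm, digest-only}; empty means "not declared".
type BudgetTier string

const (
	TierHot        BudgetTier = "hot"
	TierWarm       BudgetTier = "warm"
	TierDigestOnly BudgetTier = "digest-only"
)

// ResolveBudgetTier maps empty to hot (full delivery) and rejects any
// value outside the catalog instead of silently widening or narrowing it.
func ResolveBudgetTier(t BudgetTier) (BudgetTier, error) {
	switch t {
	case "":
		return TierHot, nil
	case TierHot, TierWarm, TierDigestOnly:
		return t, nil
	default:
		return "", fmt.Errorf("unknown budget tier %q", t)
	}
}

func main() {
	for _, in := range []BudgetTier{"", "warm", "cold"} {
		tier, err := ResolveBudgetTier(in)
		fmt.Printf("%q -> %q, err=%v\n", in, tier, err)
	}
}
```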
The context-budget mechanism: a pure LOCAL presentation transform on the derived mirror, never a hub-side reducer (folds adversarial MUST-FIX 2: digest-only is computed locally over the full pulled set, the hub stays tier-unaware).
- capability.ShapeByBudget(cap, fields, tier): keeps the most-recent K items (hot=all, warm=8, digest-only=1) and RE-RENDERS the capability header over the kept tail, so a content-rendered surface (the memory mirror reads the rendered `content`, not raw items) actually shrinks. Reducer-free by construction: a tier bounds the item COUNT, never a model summary (a true semantic digest is a sync-abi-v2 concern, deferred / B1). "Most-recent" = the list tail = local append/import order → an offline replay reshapes identically (B6). Non-item / unknown-tier / within-budget = exact passthrough (fail-open to full, never silent data loss).
- app.budgetShapeProjection(proj, catalog, tier): walks a projection's per-resource Content through ShapeByBudget; Resources/Digest keep attesting the FULL authoritative scope (budget bounds context, not authority; the grant scope is the security boundary, B2); input never mutated.
Pure mechanism + walker, exercised by unit tests on real capabilities and a real Projection; not yet on the serve path (P4c threads the catalog + the binding's tier into the mirror flow, where the e2e demonstration lands).
Gates: build + vet + go test -race ./harness/... + make harness-validate green.
"Endpoint declares its subscription": the endpoint declares its subscription budget. Add ChannelBinding.Budget (contract.BudgetTier), parsed from the binding file's `budget` field and carried into the per-principal Subscription by SubsFromBindings. The closed-set guard runs at the binding boundary: ChannelBinding.Validate rejects an unknown tier (fail-loud), an omitted tier stays empty (= hot / full delivery, no silent downgrade, MUST-FIX 3). Additive field, still inert (nothing reads Budget until P4c-2 wires it into the mirror flow); existing empty-budget bindings resolve to hot, so behavior is unchanged. Gates: build + vet + go test -race ./harness/... + make harness-validate green. e2e runs at P4c-2 (the live mirror wiring).
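The count-bounding half of the budget shaping (hot=all, warm=8, digest-only=1, unknown tier passes through) can be sketched as a tail slice. `keepTail` is a hypothetical helper over plain strings; the real ShapeByBudget also re-renders the capability header, which this sketch does not model.

```go
package main

import "fmt"

// keepTail returns the most-recent items allowed by the tier. The list tail
// is "most recent" because items are kept in append/import order. Hot, empty,
// and unknown tiers are an exact passthrough (fail-open to full).
func keepTail(items []string, tier string) []string {
	var k int
	switch tier {
	case "warm":
		k = 8
	case "digest-only":
		k = 1
	default:
		return items
	}
	if len(items) <= k {
		return items // already within budget
	}
	kept := make([]string, k)
	copy(kept, items[len(items)-k:]) // copy so the input is never mutated or aliased
	return kept
}

func main() {
	entries := []string{"e1", "e2", "e3"}
	fmt.Println(keepTail(entries, "digest-only")) // [e3]
	fmt.Println(keepTail(entries, "hot"))         // [e1 e2 e3]
}
```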
Wire the budget mechanism live onto the serve path, the keystone connecting binding.Budget -> serveReproject -> budgetShapeProjection -> derived mirror:
- serveReproject takes the boot-resolved catalog; mirrorPrincipal now returns the whole binding so the mirror site reads its declared Budget.
- after PullProjection, the projection passes through budgetShapeProjection at the endpoint's tier before WriteMemoryMirror. A LOCAL presentation transform on what this host sees, never a hub-side reduction (I11: local decides); the projection Digest still attests the full authoritative scope.
hot/empty budget is exact passthrough, so existing bindings are unchanged and all e2e legs stay green. New integration test: a digest-only host-agent's MEMORY.md keeps only its most-recent entry; older entries drop.
Gates: build + vet + go test -race ./harness/... + make harness-validate + bash harness/scripts/e2e.sh (all legs) green.
Strengthen the digest-only mirror test with the A4 hard-stop half: after the budgeted mirror shrinks to the newest entry, the authoritative (un-budgeted) projection still carries the full 3-entry set. Budget never reduced what was admitted/stored — it shapes only the local derived view. A4: remote/budget never bypasses or shrinks local authority; the grant scope is the security boundary, the tier is presentation only. Test-only (no production change). Gates: vet + go test -race ./harness/... green.
Add run_subscription to the e2e suite, the shell-level P4 acceptance
("packet size is bounded by the budget"): a host endpoint declares budget=digest-only in its
binding; after three memory writes its derived MEMORY.md carries only the
newest entry (older ones dropped by the local budget transform, no hub-side
reduction), while the authoritative pull still reports the resource present
(budget bounds presentation, not authority — A4).
Gates: full e2e all legs green (now memory+skill+...+coordination+dloop+
subscription); build + vet + go test -race ./harness/... + harness-validate
unaffected (shell-only change).
Add Runtime.DecisionLedger() — the operator-wide, cross-actor decision-log read that backs the Control Tower's LEDGER (accepted) and INBOX (rejected) pages. It wraps Store.DecisionsAfter, opens NO write path, and is the one operator-scoped read the channel's per-actor PullProjection cannot serve, so the app-layer Tower facade reads it here rather than over the channel (the ui package will import only the facade — ui never touches the store). Folds adversarial MUST-FIX 3: FIELD/LEDGER are cross-actor operator reads that the per-actor channel API cannot serve, so the facade must hold the *Runtime. Pure additive read-only method + test (accepted decision surfaces with attribution; reads are idempotent). Gates: build + vet + go test -race green.
Add the app-layer Tower facade (TowerView) and BuildTowerView, assembling two of the four pages read-only from the *Runtime:
- GOAL: project_intent statements (the goal) + progress_digest summaries. "readiness" is the ACTUAL progress entries, not a fabricated percentage: a % would need a KR data model that doesn't exist, and inventing one would be a new kernel concept (T1 veto, folded from the adversarial review).
- LEDGER: accepted decisions with attribution (proposer + changed refs), via the read-only DecisionLedger.
BuildTowerView performs only resource reads + the read-only ledger, never a write or a Tick (G10/T5). The ui package will render this and never touch the store (ui↛store). FIELD + INBOX land in P6a-3.
Pure additive read-only facade + tests (an admitted project_intent shows on both pages; an empty runtime yields empty pages, no fabricated data).
Gates: build + vet + go test -race ./harness/... green.
Complete the four-page TowerView read-side assembly:
- FIELD: agents enumerated from the BindingSet (the only existing "who's on the field" source), live assignments (scope/assignee/lease TTL) from the assignment resource, and the open-escalation count. Deliberately NO "active/idle" liveness or event-rate: the data model has no heartbeat concept, and inventing one would be a new kernel concept (T1 veto).
- INBOX: open escalations from the durable .diagnostic events (a denied or high-risk candidate surfaces here, never silently dropped); CausedBy links each to its triggering candidate, the re-observation target for P6b.
BuildTowerView now also takes the BindingSet; still read-only (resource reads + event scan + DecisionLedger), no write or Tick (G10/T5).
Test: a valid assignment lands on FIELD; a denied one (missing required scope) surfaces as an INBOX escalation; FIELD's count equals INBOX's.
Gates: build + vet + go test -race ./harness/... green.
Add ReobserveCandidate — the Tower's ONLY write. It resolves an INBOX escalation by RE-OBSERVING the underlying candidate as the operator (a control-agent principal), the action that clears a high-risk operator-gate denial (RiskOperatorGate exempts the control-agent). It is NOT "approve a proposal" — no such kernel verb exists, and the wire rejects *.proposed/*.diagnostic (folds adversarial MUST-FIX 1+2). Mechanism: recover the ORIGINAL observed candidate from the durable event log by the escalation's CausedBy id, re-emit it through the SAME Ingest path under the operator (so the operator's binding governs it, G9). Fail loud if CausedBy does not name an OBSERVED candidate — the Tower never attempts a backdoor ingest of a trusted internal event. Idempotent per escalation. Test (reusing the high-risk approval kind): a host candidate denied by the operator gate surfaces on INBOX; the operator re-observes -> admitted with the ORIGINAL content; a missing candidate and a .diagnostic target are both refused. Gates: build + vet + go test -race ./harness/... green.
Add harness/internal/ui, the Control Tower's presentation layer. TowerModel is a PURE state machine over an app.TowerView snapshot: the active page (the four §3.3 pages), the INBOX cursor, and the read-side-dismissed escalations. All transitions are pure (copy-on-write ack set, no I/O, no *Runtime).
- legality (T4) is enforced in the Model, not hidden in the View: an action illegal in the current state (Reobserve off INBOX, a cursor move on an empty list) is a pure no-op, never out-of-bounds, never a forbidden write.
- the only write surfaces as a ReobserveIntent the command layer executes against a fresh view (the concurrency re-check); the Model never writes.
- Render draws the active page under a tab bar naming all four pages; RenderAll is the headless --dump snapshot (every title + body).
ui↛store verified mechanically (go list): the ui package directly imports ONLY the app facade, never store/kernel/runtime (T2).
Tests: bounded page nav, render names all four pages, INBOX cursor + reobserve-intent + read-side dismiss, and the legality no-ops.
Gates: build + vet + go test -race ./harness/... green.
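A compact sketch of a pure model with legality enforced as no-ops. The `Model` fields and method names are hypothetical simplifications (the read-side dismiss set and rendering are omitted); only the four page names, the INBOX cursor, and the idea of a `ReobserveIntent` come from the commit message.

```go
package main

import "fmt"

// pages is the closed, ordered set of Tower pages.
var pages = []string{"GOAL", "FIELD", "INBOX", "LEDGER"}

// ReobserveIntent is what the model hands to the command layer; the model itself never writes.
type ReobserveIntent struct{ EscalationID string }

// Model is a value type: every transition returns a new Model and performs no I/O.
type Model struct {
	page   int
	cursor int
	inbox  []string // escalation ids from the snapshot; never mutated here
}

func (m Model) onInbox() bool { return pages[m.page] == "INBOX" }

// NextPage is bounded: on the last page it is a no-op.
func (m Model) NextPage() Model {
	if m.page < len(pages)-1 {
		m.page++
	}
	return m
}

// CursorDown moves only on INBOX and only within the list.
func (m Model) CursorDown() Model {
	if m.onInbox() && m.cursor < len(m.inbox)-1 {
		m.cursor++
	}
	return m
}

// Reobserve yields an intent only when the action is legal in the current state.
func (m Model) Reobserve() (ReobserveIntent, bool) {
	if !m.onInbox() || len(m.inbox) == 0 {
		return ReobserveIntent{}, false
	}
	return ReobserveIntent{EscalationID: m.inbox[m.cursor]}, true
}

func main() {
	m := Model{inbox: []string{"esc-1", "esc-2"}}
	_, ok := m.Reobserve()
	fmt.Println("reobserve on GOAL legal:", ok)
	m = m.NextPage().NextPage().CursorDown()
	intent, ok := m.Reobserve()
	fmt.Println(intent.EscalationID, ok)
}
```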
A go test (so it runs in `go test ./harness/...`, the gates) pinning the Control Tower's sanctioned vocabulary:
- the page names are EXACTLY the four §3.3 pages (GOAL/FIELD/INBOX/LEDGER), a closed set in order.
- the Tower's STRUCTURAL vocabulary (an empty-data RenderAll: only the Tower's own labels, no injected resource content) contains no foreign-discipline jargon (OA/OKR/kanban/sprint/scrum/dashboard/backlog) and never the non-existent "human@owner" identity (the operator is a control-agent; MUST-FIX 2).
Test-only. Gates: go test -race ./harness/internal/ui green.
Add `mnemon-harness tower` (D5: TUI-only, command name `tower`) — the human-visible four-page boundary over the agent field. It assembles the read-only TowerView (app facade) and renders all four §3.3 pages (GOAL/FIELD/INBOX/LEDGER) via the pure ui.TowerModel. READ-ONLY: it never writes or Ticks. It opens the local runtime directly (the facade needs cross-actor reads the per-actor channel cannot serve), so it requires the daemon STOPPED — single-writer, S11. `--dump` is the headless, scriptable snapshot. The interactive loop and the live-while-serving Tower (a channel read-verb or in-daemon rendering) are deferred to P5/operator — presentation only; all state lives in the pure TowerModel. e2e run_tower: the daemon admits a project_intent (GOAL) + an assignment (FIELD), the daemon stops, and `tower --dump` renders the four pages with the governed data. Gates: build + vet + go test -race ./harness/... + make harness-validate + bash harness/scripts/e2e.sh (all legs, now +tower) green.
The P8 quickstart, the one autonomously-buildable+verifiable beta-expression deliverable (directive §52c whitelists the minimal public docs):
- Path A (operator): setup -> local run -> control observe (ticked=true) -> tower --dump shows the governed decision on LEDGER, attributed to its proposer.
- Path B (capability author): drop .mnemon/loops/<name>/capability.json + enable via config.loops + the binding scope -> observe the new kind -> it governs through the SAME path as the built-ins (a capability SELECTS from a closed catalog; no per-kind code).
Both command sequences were walked end to end and verified (ticked=true). Honors the claims discipline: no production-readiness claim, no systemd/k8s framing. The remaining P8 acceptance is operator-gated: the release/harness-beta-public branch + distribution (push), the real-newcomer timing leg, and claims verified against P5/P7 data. Docs-only; no code change.
A Local Mnemon runtime plus multiple Codex appserver workers writing governed observations through the channel, with a browser UI over the GOAL/FIELD/INBOX/ LEDGER activity and dynamic poc_claim/poc_decision protocol evolution via loopdef. Round-driven orchestration — the base the governed self-continuation loop builds alongside.
Add a content-blind nudge engine demonstrating "use a cluster like a single agent": from one seeded intent the cluster self-continues through governed events. Workers report; two POC agents route via governed assignment writes; the engine wakes whichever agent's projection digest changed. The "who acts next" decision lives only in a POC brain's governed assignment, never in the engine, and the whole chain is replayable from the decision ledger. The engine reuses already-exported framework surface (PullProjection, DecisionLedger, Ingest+Tick) through cmd-layer handle wrappers — no harness/internal changes. The --simulate brains are deterministic Go closures; a real-Codex brain is a drop-in with the same interface. Tests prove the one-hop chain, that routing lives in the brain (removing the POC brain breaks the chain), and the shipped 5-agent / 2-POC multi-hop demo plus its ledger-authoritative snapshot.
Add realCodexBrain, a drop-in agentBrain backed by a real Codex app-server turn (reusing codexRealAppServer). When nudged it runs a turn only when there is genuinely new work (a cheap relevance pre-check), then parses the model's output into a governed observation: a worker emits a progress_digest from its MNEMON_REPORT line; a POC emits an assignment from its MNEMON_ASSIGN/MNEMON_SCOPE lines, so the LLM — not the Go — decides who acts next. Wire it through --real-roles (per-role real/scripted substitution) and a headless --once mode. A real planner run self-continues the full ledger chain with the first hop authored by a live Codex turn; the engine and every other role are unchanged — the brain is the only swap. Output parsing and role substitution are unit-tested without spending a turn.
The worker developerInstructions hardcoded "Read-only sandbox: do not modify files" regardless of --codex-sandbox, so a workspaceWrite run still told the model not to write — silently blocking all file work. A live full-real-LLM run surfaced this precisely: the reviewer agent diagnosed the contradicting instruction by name. Derive the guidance line from the sandbox policy so the instruction never contradicts the sandbox.
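Deriving the guidance from the policy might look like the sketch below. The function name, the policy value `readOnly`, and the wording of the non-read-only lines are all assumptions; the source only quotes the read-only sentence and names `workspaceWrite`.

```go
package main

import "fmt"

// sandboxGuidance returns the instruction line matching the sandbox policy,
// so the prompt can never contradict what the sandbox actually allows.
func sandboxGuidance(policy string) string {
	switch policy {
	case "workspaceWrite":
		// Hypothetical wording for the writable case.
		return "Workspace-write sandbox: you may modify files inside the workspace."
	case "readOnly": // hypothetical policy name
		return "Read-only sandbox: do not modify files."
	default:
		// Assumed neutral fallback for a policy this sketch does not know.
		return "Sandbox policy " + policy + ": follow the restrictions it enforces."
	}
}

func main() {
	fmt.Println(sandboxGuidance("workspaceWrite"))
	fmt.Println(sandboxGuidance("readOnly"))
}
```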
…ngine
Move the governed self-continuation engine out of cmd/package main into internal/autopilot, an OPTIONAL auto-drive layer over the channel. Base mnemon-harness integrates the channel and a human drives each agent by hand; engage the autopilot and that pacing is automated (it nudges a participant when its governed projection scope changes, looping to quiescence). It is content-blind: routing lives in the Agents, never here.
Renamed for honesty: governedLoop->Loop, agentBrain->Agent, scriptedBrain->Scripted, turnPacket->TurnPacket, nudgeEvent->Nudge; the 3-method cmd seam is now autopilot.Runtime (the in-process handle satisfies it). The package imports only channel/contract/projection and the channel core imports it ZERO times, so the ring is deletable: optionality is compiler-enforced. cmd is now a consumer.
Extract the Codex app-server JSON-RPC driver + output parsers into internal/codexapp — a reusable, stdlib-only "run a real Codex turn from Go" adapter with zero knowledge of governance, the autopilot, or any demo. The real-Codex Agent drives turns through it. Delete the old `codex-team` orchestrator-rounds demo (its command, codexTeamState, web UI, task specs, protocol-evolution, and the round-loop): the governed self-continuation demo (codex-team-loop) supersedes it. The few shared bits the new demo still needs (a slim in-process runtime handle, bindings, listener and string helpers) move to codex_team_host.go. cmd/mnemon-harness drops 5233 -> 2423 LOC; codexapp imports no internal package; full harness suite green; the scripted demo still self-continues end to end.
LocalConfig is only the field type for File's JSON "local" section and has no external callers; it also collided by name with the unrelated 8-field app.LocalConfig. Unexport it (localConfig) to signal "JSON-binding detail, not public API" and drop the false collision. Its exported fields still marshal and stay reachable through File.Local.
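The claim that an unexported type with exported fields still marshals can be checked with a small sketch. The `Dir` field and the `dir` JSON key are hypothetical; only the names `File`, `localConfig`, and the `local` section come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// localConfig is unexported, but its fields are exported, so encoding/json
// can still read and write them. The Dir field is a made-up example.
type localConfig struct {
	Dir string `json:"dir"`
}

// File exposes the section through an exported field of the unexported type.
type File struct {
	Local localConfig `json:"local"`
}

// roundTrip decodes raw into a File and encodes it again.
func roundTrip(raw string) (string, error) {
	var f File
	if err := json.Unmarshal([]byte(raw), &f); err != nil {
		return "", err
	}
	out, err := json.Marshal(f)
	return string(out), err
}

func main() {
	out, err := roundTrip(`{"local":{"dir":"/tmp/x"}}`)
	fmt.Println(out, err)
}
```

Code outside the package can still read `f.Local.Dir` through a `File` value; what it loses is the ability to name the type itself.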
Add internal/coreguard: tests that the core (contract/channel/kernel/store/
projection/rule/reconcile/runtime) imports no outer ring (capability, hostsurface,
app, autopilot, codexapp, cmd, ...) and hardcodes no business kind string literal
(memory/skill/codex/loopdef/assignment/...). The kernel governance kinds
(lease/budget/receipt/coordination) are allowed; coordination is the one
grandfathered borderline case. A meta-test proves the matchers actually fire.
Also relocate the one real leak: LoopdefActivator ("loopdef@local") was a
loopdef-specific principal sitting in the generic contract core; move it to
app/loopdef_materialize.go where the loopdef machinery lives.
Makes "the core stays generic" a build gate, not a convention.
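A rough sketch of how an import guard of this kind could work, using the standard go/parser package in imports-only mode. The function name `forbiddenImports`, the segment-matching rule, and the `example.com/...` module path are assumptions; the ring names are taken from the commit message. The last case in `main` plays the role of the meta-test: it feeds the matcher a known violation to show that it fires.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// outerRings lists path segments the core must not import (from the commit
// message; the real list is longer).
var outerRings = map[string]bool{
	"capability": true, "hostsurface": true, "app": true,
	"autopilot": true, "codexapp": true, "cmd": true,
}

// forbiddenImports parses only the import block of src and returns every
// import path that contains an outer-ring segment.
func forbiddenImports(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "core.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var bad []string
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		for _, seg := range strings.Split(path, "/") {
			if outerRings[seg] {
				bad = append(bad, path)
				break
			}
		}
	}
	return bad, nil
}

func main() {
	clean := "package kernel\nimport \"fmt\"\n"
	leaky := "package kernel\nimport \"example.com/mnemon/harness/internal/autopilot\"\n"
	for _, src := range []string{clean, leaky} {
		bad, err := forbiddenImports(src)
		fmt.Println(len(bad), err)
	}
}
```

A real guard would walk every file of every core package and would also scan string literals for business kinds; this sketch covers only the import half.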
Summary
The consolidated AI-native lifecycle dev branch (293 commits over the v0.1.14 base 52fdd88), published for review and backup. It rebuilds mnemon's harness around a governed agent-collaboration model:

- `harness/internal/{contract,channel,kernel,store,projection,rule,reconcile,runtime}`: observe → propose → decide → project; kernel as the sole writer; scoped projections with content digests; durable, replayable decision ledger.
- `project_intent`/`assignment`/`progress_digest` kinds, a three-tier risk reducer, the Control Tower (GOAL/FIELD/INBOX/LEDGER), and `loopdef` protocol evolution at lifecycle boundaries.
- `harness/internal/autopilot`: a content-blind, deletable ring that drives governed self-continuation ("use a cluster like a single agent"). Routing lives in the agents, never the engine; engage/disengage like an aircraft autopilot.
- `harness/internal/codexapp`: a stdlib-only adapter to drive real Codex turns. The `codex-team-loop` demo runs scripted or real-LLM agents over the same engine.
- `harness/internal/coreguard`: a build gate that keeps the collaboration-channel core generic: no outer-ring imports, no business-kind string literals.

This is the sole active development line (the prior remote branch was deleted during a history consolidation); the PR re-establishes a remote home for it.
Test Plan
- `go build ./...`
- `go test ./harness/...`: all 23 packages green
- `codex-team-loop` self-continues end to end (scripted, and real-LLM via Codex)
- `bash harness/scripts/e2e.sh` (codex + claude hosts) in CI