feat(harness): AI-native lifecycle architecture — governed agent collaboration #57

Merged: Grivn merged 293 commits into master from feat/ai-native-lifecycle-architecture on Jun 14, 2026.

Grivn commented Jun 14, 2026

Summary

The consolidated AI-native lifecycle dev branch (293 commits over the v0.1.14 base 52fdd88), published for review and backup. It rebuilds mnemon's harness around a governed agent-collaboration model:

  • Governed control plane (harness/internal/{contract,channel,kernel,store,projection,rule,reconcile,runtime}): observe → propose → decide → project; kernel as the sole writer; scoped projections with content digests; durable, replayable decision ledger.
  • AgentTeam coordination: project_intent / assignment / progress_digest kinds, a three-tier risk reducer, the Control Tower (GOAL/FIELD/INBOX/LEDGER), and loopdef protocol evolution at lifecycle boundaries.
  • Optional autopilot (harness/internal/autopilot): a content-blind, deletable ring that drives governed self-continuation ("use a cluster like a single agent"). Routing lives in the agents, never the engine; engage/disengage like an aircraft autopilot.
  • codexapp (harness/internal/codexapp): a stdlib-only adapter to drive real Codex turns. The codex-team-loop demo runs scripted or real-LLM agents over the same engine.
  • coreguard (harness/internal/coreguard): a build gate that keeps the collaboration-channel core generic — no outer-ring imports, no business-kind string literals.

This is the sole active development line (the prior remote branch was deleted during a history consolidation); the PR re-establishes a remote home for it.

Test Plan

  • go build ./...
  • go test ./harness/... — all 23 packages green
  • codex-team-loop self-continues end to end (scripted, and real-LLM via Codex)
  • bash harness/scripts/e2e.sh (codex + claude hosts) in CI

Grivn added 30 commits June 15, 2026 02:29
…ced pull, idempotent lane recovery, empty-key dispatch, lane diagnostics, serialized Tick
…aces, receipt-recorded proposal re-mint, keyless receipt id, warn diagnostic
…(exactly-once ingest), surface warn reasons when a higher verdict wins
…ASM rule module (Go encoder; wat2wasm absent)
…n.go + .wasm were omitted from the demo commit)
…port section, build promoted rule from verified bytes, strip edge authored intent, recover wasm seat after deadline kill
…st span the whole section, not trust the count)
…c at the Ingest wire boundary

A *.proposed event is trusted by the reconciler and minted only by the bridge after the
rule pre-gate + write-scope check (R11); dispatchOne skips reserved types, so a client-
supplied *.proposed bypassed the rule pre-gate, the bridge write-scope, AND the S10 readback
digest and was applied directly by the kernel (authz is actor x kind only). That let an edge
write any resource of an authorized kind outside its dispatched scope — a within-kind cross-
resource / cross-principal escalation (S9/D7). Ingest is the sole wire door (in-process + HTTP
both route through it); reject reserved internal event types there before they enter the log.
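
A minimal sketch of the wire-boundary check described above, assuming the reserved types are recognizable by suffix. The names `checkWireType`, `reservedSuffixes`, and `errReservedType` are hypothetical and are not taken from the harness; the `.diagnostic` suffix is included because a later commit in this log says the wire rejects it as well.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errReservedType marks an event type that only trusted internal code may mint.
var errReservedType = errors.New("reserved internal event type")

// reservedSuffixes is an assumed list; the real set lives in the harness.
var reservedSuffixes = []string{".proposed", ".diagnostic"}

// checkWireType rejects a client-supplied event whose type is reserved,
// so it never reaches the log or the reconciler.
func checkWireType(eventType string) error {
	for _, suffix := range reservedSuffixes {
		if strings.HasSuffix(eventType, suffix) {
			return fmt.Errorf("%w: %q", errReservedType, eventType)
		}
	}
	return nil
}

func main() {
	for _, t := range []string{"memory.observed", "memory.proposed"} {
		fmt.Printf("%-16s rejected=%v\n", t, checkWireType(t) != nil)
	}
}
```

Placing the check at the single ingest door means the in-process and HTTP paths share one rule instead of each re-implementing it.
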
…e S6 budget launder)

kernel.Apply applied an op's writes sequentially with last-write-wins and had no distinct-ref
guard, so two OpUpdates to the same ref in one 'all-or-nothing' op both CAS-succeeded and the
decision reported two NewVersions for one resource. job.Reserve takes a caller-supplied
dataWrite; aliasing it back to the budget ref (resetting spent_usd, based_on the post-reserve
version) laundered the spend ceiling — real spend accumulated while stored spent_usd stayed 0,
and the over-budget tripwire never fired. Multi-RESOURCE all-or-nothing (#5) means distinct
resources; reject an op whose writes alias the same ref terminally up-front.
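
The distinct-ref guard can be sketched as a single pass over an op's writes before any of them is applied. This is an illustration only: the `write` struct and `checkDistinctRefs` are hypothetical stand-ins, not the kernel's types.

```go
package main

import "fmt"

// write is a hypothetical stand-in for one write inside an all-or-nothing op.
type write struct {
	Ref     string // resource reference the write targets
	BasedOn int    // version the write claims to be based on
}

// checkDistinctRefs fails up-front if two writes in the same op target the
// same ref, so a later write can never overwrite an earlier one in that op.
func checkDistinctRefs(writes []write) error {
	seen := make(map[string]struct{}, len(writes))
	for _, w := range writes {
		if _, dup := seen[w.Ref]; dup {
			return fmt.Errorf("op rejected: writes alias ref %q", w.Ref)
		}
		seen[w.Ref] = struct{}{}
	}
	return nil
}

func main() {
	aliased := []write{
		{Ref: "budget/main", BasedOn: 3},
		{Ref: "budget/main", BasedOn: 4}, // aliases the budget ref
	}
	fmt.Println(checkDistinctRefs(aliased))
}
```
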
… + lane-recovery out-of-scope)

(1) dispatchOne's 'case VerdictPropose: if dec.Proposal == nil { break }' exited with no
    diagnostic, asymmetric with the deny and enqueue_job/request_evidence nil-payload branches.
    A rule emitting {Verdict:Propose, Proposal:nil} produced zero durable evidence — now diagnosed.
(2) runJobLane's crash-recovery remintFromReceipt swallowed a bridge.Stamp error and acked the
    row, so an out-of-scope governed write recorded in a receipt was permanently lost with no
    diagnostic — unlike the live lane path, which emits a stage:bridge diagnostic. Recovery now
    mirrors it: an out-of-scope re-mint is diagnosed (stage:bridge), never silently dropped.
…row id (close S4 cross-job collision)

The outbox-id namespaces are disjoint (job_k_+key vs job_s_+seq), but the receipt was keyed by
the RAW idempotency key — reopening the collision one layer down: a keyless job keys its receipt
by row id 'job_s_<seq>', and a keyed job whose payload-derived literal key == 'job_s_<seq>' forged
that same receipt identity, so one of two distinct effects ran and the other was silently skipped
as a duplicate (its governed proposal lost). Key the receipt/effect identity by the outbox ROW ID,
which is disjoint by construction and still stable across a keyed retry (outbox UNIQUE(idempotency_key)
yields one row per key). Keyed receipts move ev-job -> job_k_ev-job / gather-1 -> job_k_gather-1;
keyless stay job_s_<seq>. Demo + dependent fixtures updated.
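
To illustrate why the row id is disjoint by construction, here is a small sketch of the two namespaces named above. The function `outboxRowID` is hypothetical; only the `job_k_` and `job_s_` prefixes come from the commit message.

```go
package main

import (
	"fmt"
	"strconv"
)

// outboxRowID derives the row id used as the receipt identity:
// job_k_<key> for a keyed job, job_s_<seq> for a keyless one.
func outboxRowID(idempotencyKey string, seq int64) string {
	if idempotencyKey != "" {
		return "job_k_" + idempotencyKey
	}
	return "job_s_" + strconv.FormatInt(seq, 10)
}

func main() {
	keyless := outboxRowID("", 7)
	// A keyed job whose literal key imitates a keyless row id.
	forged := outboxRowID("job_s_7", 8)
	fmt.Println(keyless, forged, keyless == forged)
}
```

Because every keyed id starts with `job_k_`, a payload-derived key that spells out `job_s_7` can no longer land on the keyless receipt.
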
… (close S5 precision + unit)

(MED#5) fence_until was stored as a float64; resource fields round-trip through json.Unmarshal
into map[string]any, which decodes every JSON number to float64. At UnixNano magnitude (~1.78e18,
ULP 256ns) the integer fence was silently mis-rounded, corrupting the active/expired boundary S5
rests on — an ACTIVE lease became foreign-stealable. Store fence_until as a decimal STRING (exact
round-trip at any magnitude); asInt64 parses it.
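
The precision loss can be reproduced with the standard library alone. The sketch below uses hypothetical helper names (`roundTrip`, `viaNumber`, `viaString`) and a made-up nanosecond value; it relies only on the documented behavior that `encoding/json` decodes JSON numbers into `float64` when the target is `map[string]any`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// roundTrip encodes one field and decodes it back into map[string]any,
// mirroring how the prose says resource fields are read.
func roundTrip(value any) (any, error) {
	raw, err := json.Marshal(map[string]any{"fence_until": value})
	if err != nil {
		return nil, err
	}
	var fields map[string]any
	if err := json.Unmarshal(raw, &fields); err != nil {
		return nil, err
	}
	return fields["fence_until"], nil
}

// viaNumber stores the fence as a JSON number (decoded as float64).
func viaNumber(fence int64) (int64, error) {
	v, err := roundTrip(fence)
	if err != nil {
		return 0, err
	}
	f, ok := v.(float64)
	if !ok {
		return 0, fmt.Errorf("unexpected type %T", v)
	}
	return int64(f), nil
}

// viaString stores the fence as a decimal string and parses it back.
func viaString(fence int64) (int64, error) {
	v, err := roundTrip(strconv.FormatInt(fence, 10))
	if err != nil {
		return 0, err
	}
	s, ok := v.(string)
	if !ok {
		return 0, fmt.Errorf("unexpected type %T", v)
	}
	return strconv.ParseInt(s, 10, 64)
}

func main() {
	const fence = int64(1781000000000000001) // illustrative UnixNano-scale value
	n, _ := viaNumber(fence)
	s, _ := viaString(fence)
	fmt.Println(n == fence, s == fence) // false true
}
```
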

(MED#6) the mnemon-control demo wired a time.Now().UnixNano() lane clock with a raw ttl=60, while
the outbox sibling uses time.Now().Unix()+ttl seconds — a unit mismatch that collapsed the lease
fence to a 60-NANOSECOND window (~zero exclusion). The lane clock is seconds, matching the sibling
(and within float64's exact-integer range).
…are conflict/version content (S8)

Two false-clean vectors let Registry.Promote admit a candidate that diverges from the live policy:
(HIGH) diffDecisions keyed decisions by OpID (= the client-controllable, non-unique Event.ID) with
  last-write-wins, so two decisions sharing an id collapsed to one and a candidate denying a write the
  live policy accepted reported Clean. Key by the durable IngestSeq (the event rowid, unique per decision).
(MED) sameOutcome compared Conflicts/NewVersions by length only, so a candidate that re-derived a
  divergent conflict (different raced ref/version) or a different resulting version was reported equal.
  Compare the masked CONTENT element-wise (maskDynamic already sorts both slices).
…e-fn-of-input)

The wasm seat instantiated the module ONCE in New and reused that instance across every Tick, so
mutable guest globals + linear memory persisted between Evaluate calls — a gate-compliant module
(imports only env.read_state_view) carrying a mutable global could return a different verdict for
identical input, a non-deterministic rule that breaks replay/shadow soundness and opens a covert
per-call channel. The import-section check cannot see globals and Manifest.Deterministic is read
nowhere. Compile once in New; instantiate a FRESH anonymous instance per Evaluate (zeroing all guest
state) and close it after — also making the seat structurally un-brickable by a deadline kill (the
reinstantiate/retry dance is gone). Adds testdata/stateful.wasm (mutable-global flip) to prove it.
…d EventType (select-only)

ResolveRules appended the raw registry rule, but RuleSet.Evaluate fires a rule on ANY type it
Handles — so binding a rule only to memory.observed still let it fire on goal.observed if the rule
also handled it, violating the select-only (define != select) model. Wrap each selected rule in a
boundRule whose Handles returns true only for the bound EventType; identity/emits/evaluate delegate
unchanged.
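
A sketch of the wrapper idea, using struct embedding so everything except `Handles` delegates to the wrapped rule. The `Rule` interface here is a simplified, assumed shape (the real one also has identity and evaluate methods); `multiRule` and `bind` are hypothetical.

```go
package main

import "fmt"

// Rule is a simplified stand-in for the registry's rule interface.
type Rule interface {
	Handles(eventType string) bool
	Emits() string
}

// multiRule handles two observed types, like the rule in the bug report.
type multiRule struct{}

func (multiRule) Handles(t string) bool {
	return t == "memory.observed" || t == "goal.observed"
}
func (multiRule) Emits() string { return "memory.proposed" }

// boundRule narrows Handles to the one selected event type; Emits is
// promoted from the embedded Rule and so delegates unchanged.
type boundRule struct {
	Rule
	eventType string
}

func (b boundRule) Handles(t string) bool { return t == b.eventType }

// bind wraps a registry rule for a single selected event type.
func bind(r Rule, eventType string) Rule {
	return boundRule{Rule: r, eventType: eventType}
}

func main() {
	r := bind(multiRule{}, "memory.observed")
	fmt.Println(r.Handles("memory.observed"), r.Handles("goal.observed"), r.Emits())
}
```
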
…idge misattribution

The reducer dropped which rule produced the winning proposal, and the server guessed the producer
by scanning for the first rule with matching Handles(ev.Type) && Emits()==proposal.Type. A rule
could return another rule's emit type (or two rules could share handles/emits with different actors)
and have the bridge stamp the WRONG actor as the trusted write identity. Now: the reducer (the common
path for native AND wasm rules) rejects a proposal whose Type != the producing rule's declared Emits
(empty defaults to Emits) with a diagnostic, and carries the producing rule's Actor on the decision
(contract.RuleDecision.ProposalActor, json:"-" so an untrusted wasm rule cannot forge it). The server
stamps the bridge binding from dec.ProposalActor; the proposerBinding Handles/Emits scan is removed.
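
The two guards can be sketched together: a field tagged `json:"-"` is skipped by `encoding/json`, so untrusted output cannot set it, and the reducer compares the returned type with the rule's declared emit before stamping the actor. The struct and `reduce` function below are hypothetical simplifications, not `contract.RuleDecision` itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ruleDecision is a simplified stand-in for the decision a rule returns.
type ruleDecision struct {
	Verdict       string `json:"verdict"`
	ProposalType  string `json:"proposal_type"`
	ProposalActor string `json:"-"` // never read from or written to JSON
}

// reduce decodes an untrusted rule output, enforces the declared emit type,
// and only then stamps the producing rule's actor on the decision.
func reduce(raw []byte, ruleActor, ruleEmits string) (ruleDecision, error) {
	var dec ruleDecision
	if err := json.Unmarshal(raw, &dec); err != nil {
		return ruleDecision{}, err
	}
	if dec.ProposalType == "" {
		dec.ProposalType = ruleEmits // empty defaults to the declared emit
	}
	if dec.ProposalType != ruleEmits {
		return ruleDecision{}, fmt.Errorf("rule declares emits %q but returned %q",
			ruleEmits, dec.ProposalType)
	}
	dec.ProposalActor = ruleActor
	return dec, nil
}

func main() {
	forged := []byte(`{"verdict":"propose","proposal_type":"memory.proposed","ProposalActor":"root"}`)
	dec, err := reduce(forged, "memory-rule", "memory.proposed")
	fmt.Println(dec.ProposalActor, err)
}
```
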
…OBSERVED events (S8)

The prior Shadow drove the candidate only over the existing *.proposed events of the log, but a
rule handles OBSERVED events and EMITS proposals/denies/jobs — and rule bindings reject .proposed
types — so the candidate's rules never fired: any candidate that changes observed->proposal/deny/job
behavior passed report.Clean and was promotable (false-clean for every real rule change). The
narrower IngestSeq/content-compare diff fixes operated inside that wrong model.

Shadow now seeds a throwaway kernel with the canonical state (the logged proposals), then for each
OBSERVED event evaluates BOTH policies against the same scoped view and diffs the rule decisions
(verdict + proposal type+payload + job + trusted origin actor). The seed is never mutated (read-only).
Replaces the kernel-decision diff machinery (diffDecisions/sameOutcome) which only existed for the old
model; Replay (event-sourcing reproduce-from-log) is unchanged. Shadow gains a subs parameter; replay
imports projection (compile-order-legal, no cycle).
…rule diagnostics (S8)

(HIGH) Shadow applied ALL logged proposals then evaluated every observed event against the FINAL
state, but the server evaluates rules at DISPATCH time — before that tick's reconcile. A version-
sensitive candidate diverging at m1@1 but agreeing at m1@2 read false-clean. Walk the log in
IngestSeq order on a throwaway kernel: apply each logged *.proposed to evolve state, and evaluate
each OBSERVED event against the state built from the proposals that PRECEDE it (dispatch-time state).

(MED) Shadow discarded the []Diagnostic from RuleSet.Evaluate. A candidate that errors or returns a
borrowed-emit proposal reduces to Verdict allow but emits a durable diagnostic, comparing equal to
live's clean allow → false-clean → diagnostics only after promotion. The canonical comparison now
covers the decision AND the diagnostics.
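
The dispatch-time walk can be sketched with a toy log in which state is just a per-ref version counter. Everything here (`event`, `policy`, `shadow`, the sample policies) is hypothetical and much simpler than the real Shadow, which also compares proposals, jobs, actors, and diagnostics.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// event is a toy log row: Seq plays the role of IngestSeq.
type event struct {
	Seq  int
	Type string // "<kind>.observed" or "<kind>.proposed"
	Ref  string
}

// policy returns a verdict for an observed event given the current versions.
type policy func(ev event, versions map[string]int) string

// shadow walks the log in Seq order. A proposed event evolves state; an
// observed event is evaluated by both policies against the state built from
// the proposals that precede it. It returns the Seqs where they diverge.
func shadow(log []event, live, candidate policy) []int {
	ordered := append([]event(nil), log...)
	sort.Slice(ordered, func(i, j int) bool { return ordered[i].Seq < ordered[j].Seq })

	versions := map[string]int{}
	var diverged []int
	for _, ev := range ordered {
		switch {
		case strings.HasSuffix(ev.Type, ".proposed"):
			versions[ev.Ref]++
		case strings.HasSuffix(ev.Type, ".observed"):
			if live(ev, versions) != candidate(ev, versions) {
				diverged = append(diverged, ev.Seq)
			}
		}
	}
	return diverged
}

func allowAll(event, map[string]int) string { return "allow" }

// denyAtV1 diverges from allowAll only while the ref is at version 1.
func denyAtV1(ev event, versions map[string]int) string {
	if versions[ev.Ref] == 1 {
		return "deny"
	}
	return "allow"
}

func sampleLog() []event {
	return []event{
		{Seq: 1, Type: "memory.proposed", Ref: "m1"},
		{Seq: 2, Type: "memory.observed", Ref: "m1"},
		{Seq: 3, Type: "memory.proposed", Ref: "m1"},
		{Seq: 4, Type: "memory.observed", Ref: "m1"},
	}
}

func main() {
	fmt.Println(shadow(sampleLog(), allowAll, denyAtV1)) // [2]
}
```

Evaluating both observed events against the final state (m1 at version 2) would report no divergence, which is the false-clean case the commit closes.
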
Grivn added 27 commits June 15, 2026 02:29
The D-loop's event-model-definition kind. A loopdef candidate carries a
SERIALIZED capability spec draft (a JSON string field), validated by a new
closed-set validator and governed at the high-risk tier.

- capability: `validate:capability-spec-draft` (closed validator catalog +
  compileDecode case) parses the draft, validates it COMPILES (FromSpec is pure
  — validate + discard = validate-only, no registration), and runs the external
  untrusted-text scan + identifier lock (I15 — a proposed event model is
  untrusted). An explicit single-layer recursion guard refuses a draft that is
  itself a loopdef or nests a spec-draft validator (FromSpec accepts the
  catalogued id, so the guard cannot live there — the M2 review correction).
  The draft is a JSON STRING, since compileDecode reads string fields (M1).
- loopdef.json: `spec` field with the validator, `default_enabled` (so the
  propose channel is open out of the box, like coordination — solves the
  governance-only bootstrap) and `risk:high`. FromSpec hard-rejects any loopdef
  kind with risk≠high (S4/G2, non-overridable).

So an OPERATOR (control-agent) admits a valid draft and the spec-draft validator
denies an invalid/recursive one; an AGENT's loopdef candidate is denied
(high-risk → Inbox). Pinned by capability unit tests (validator + S4) and app
governance tests (operator admit/deny, agent denied). Catalog count, assembler
payload, and the status FIELD line (loopdef=0) updated.

Gates green: go build ./... ; go vet ./harness/... ;
go test -race ./harness/... (21 ok / 0 fail) ; make harness-validate ;
bash harness/scripts/e2e.sh (all legs + coordination).
… package (P3e-3)

The D-loop Δ2/G5 step: an admitted loopdef draft is written to a managed external
package under .mnemon/loops/<name>/, ready to be governed at the next reload.

- materializeLoopdefs/materializeDraft (app layer): reads the loopdef resource and
  writes each admitted draft's capability.json (with default_enabled added — M3,
  so reload governs it without an extra --loop) plus a .managed provenance marker
  {materialized_by, version, digest}. Pure app-layer JSON — no capability internals.
- Wired into serveReproject (the DRIVER bridge): a loopdef accept's outbox
  invalidation reaches the reproject callback, which materializes. The runtime
  never touches the filesystem (runtime ↛ hostsurface preserved).
- WRITES ONLY — never activates: a materialized kind is governed only after an
  explicit `mnemond reload` re-assembles the catalog (G1/G3, P3e-4).
- G5 isolation: a target dir without a .managed marker is a human-placed package
  and is NEVER clobbered.

Pinned: a Go test admits a draft, materializes, and confirms the package is
default_enabled, carries the marker, and RESOLVES (ready for reload); a second
test confirms a human-placed package is left untouched.
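
The G5 isolation rule can be sketched as a guard around the file writes. This is an assumption-laden illustration: `materialize`, `errHumanPlaced`, and `demo` are hypothetical names, and the marker content is reduced to opaque bytes.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

var errHumanPlaced = errors.New("target exists without .managed marker")

// materialize writes capability.json plus a .managed marker into dir. An
// existing dir without the marker is treated as human-placed and left alone.
func materialize(dir string, capability, marker []byte) error {
	_, err := os.Stat(dir)
	switch {
	case err == nil:
		if _, mErr := os.Stat(filepath.Join(dir, ".managed")); mErr != nil {
			if errors.Is(mErr, fs.ErrNotExist) {
				return errHumanPlaced
			}
			return mErr
		}
	case !errors.Is(err, fs.ErrNotExist):
		return err
	}
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	if err := os.WriteFile(filepath.Join(dir, "capability.json"), capability, 0o644); err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(dir, ".managed"), marker, 0o644)
}

// demo runs both cases in a temp dir: a managed package may be rewritten,
// a human-placed one is refused.
func demo() (bool, bool, error) {
	root, err := os.MkdirTemp("", "loops")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(root)

	managed := filepath.Join(root, "widget")
	if err := materialize(managed, []byte(`{}`), []byte(`{}`)); err != nil {
		return false, false, err
	}
	rewriteOK := materialize(managed, []byte(`{}`), []byte(`{}`)) == nil

	human := filepath.Join(root, "handmade")
	if err := os.MkdirAll(human, 0o755); err != nil {
		return false, false, err
	}
	refused := errors.Is(materialize(human, []byte(`{}`), []byte(`{}`)), errHumanPlaced)
	return rewriteOK, refused, nil
}

func main() {
	fmt.Println(demo())
}
```
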

Gates green: go build ./... ; go vet ./harness/... ;
go test -race ./harness/... (21 ok / 0 fail) ; make harness-validate ;
bash harness/scripts/e2e.sh (all legs).
The D-loop activation step (G1) and its audit (G4).

- `mnemond reload`: a single verb that stops the recorded daemon, waits, then
  `up`s with the same flags — a RESTART that re-assembles the catalog, picking up
  any loopdef packages materialized since it started. It is the ONLY activation
  path: materialization writes to disk, reload activates (G1/G3). Never a Tick
  watch, never two shelled commands (the S5 review note); pre-flights the boot so
  a misconfigured project can't leave the daemon down. (No `local reload` — a
  foreground server has no second process to signal; S6.)
- G4 activation ledger: on boot, emitLoopdefActivations records a durable
  `loopdef.activated.observed` event (via trusted intake under the well-known
  loopdef@local principal) for each materialized package present — one-time at
  boot, never on a Tick (G1), idempotent per (name, version, digest). The event
  carries no rule and writes no resource; it is the audit marker from which the
  active loopdef version across each reload is reconstructable.

Pinned: a Go test materializes a draft, emits activations, and asserts exactly
one event that does NOT duplicate on a second emit. mnemond reload smoke-tested
(up → reload → status running → down).
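
The idempotency described for the activation ledger amounts to deduplicating on the (name, version, digest) triple. A minimal in-memory sketch, with hypothetical types and string-typed fields chosen only for illustration:

```go
package main

import "fmt"

// activation identifies one materialized package at boot.
type activation struct {
	Name, Version, Digest string
}

// activationLedger records each activation at most once.
type activationLedger struct {
	seen   map[activation]struct{}
	events []activation
}

// emit appends an activation event unless the same triple was already
// recorded; it reports whether a new event was written.
func (l *activationLedger) emit(a activation) bool {
	if l.seen == nil {
		l.seen = map[activation]struct{}{}
	}
	if _, ok := l.seen[a]; ok {
		return false
	}
	l.seen[a] = struct{}{}
	l.events = append(l.events, a)
	return true
}

func main() {
	var l activationLedger
	a := activation{Name: "widget2", Version: "1", Digest: "abc"}
	fmt.Println(l.emit(a), l.emit(a), len(l.events))
}
```

A changed digest or version yields a new triple, so a re-materialized package is recorded again on the next boot while an unchanged one is not.
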

Gates green: go build ./... ; go vet ./harness/... ;
go test -race ./harness/... (21 ok / 0 fail) ; make harness-validate ;
bash harness/scripts/e2e.sh (all legs).
…e-5)

The capstone of P3: the event model itself evolves under governance. An OPERATOR
proposes a loopdef defining a NEW kind, it is admitted (high-risk, operator
only), materialized, and — after an explicit reload — governed.

- Extend the default-enabled boot grant to control-agents too: an operator
  governs the default kinds (proposes loopdefs, approves high-risk) just as a
  host-agent governs its own.
- TestDLoopFullCycle (Go integration): operator proposes loopdef → admit →
  materialize → re-resolve+re-assemble (= reload) → the new widget2 kind is
  governed → the old loopdef resource survives the reload (one persistent store,
  I6). One robust, deterministic proof of the whole cycle.
- run_dloop (e2e, product path): two setups (host-agent with memory to drive the
  background driver + control-agent operator), the operator proposes a loopdef
  (the spec draft carried as a JSON string), the driver materializes it, the new
  kind is NOT governable yet (G3: materialize != activate), then `mnemond reload`
  re-assembles the catalog and the new kind admits a candidate.

This closes P3 (AgentTeam semantics + D-loop): the AgentTeam nouns exist, are
default-on, risk-gated, surfaced in the tower seed — and "what nouns exist" is
itself a governed, evolvable resource. PoC can reshape what the team SAYS, never
how the system DECIDES (dynamic vocabulary × fixed grammar × governed evolution).

Gates green: go build ./... ; go vet ./harness/... ;
go test -race ./harness/... (21 ok / 0 fail) ; make harness-validate ;
bash harness/scripts/e2e.sh (all legs + coordination + dloop).
…f / §577)

The §577 generic append-merge that makes the AgentTeam coordination kinds
syncable, closing P3's "three kinds, full chain (… sync …)" acceptance.

- capability.itemDedupImport: a third closed-set merge strategy that merges a
  remote commit's items into the resource's item list BY ID, preserving EVERY
  item field. Unlike entry-dedup (memory's content) and declaration-dedup
  (skill's declarations), it makes no assumption about the item's domain fields,
  so an arbitrary declared kind syncs without losing them (assignment's
  scope/ttl/assignee). Item ids are replica-specific (actor+ingest_seq), so
  cross-replica items never collide; the merged header is re-derived from the
  capability's own render. decodeRemoteCommit is the kind-agnostic commit decode.
- "item-dedup" added to the closed merge set + the importStrategy dispatch.
- project_intent/assignment/progress_digest declare sync: item-dedup. loopdef
  stays NON-syncable (P3 single-machine D-loop; sync is P4).

Pinned: a Go test imports an assignment commit and confirms scope/ttl/assignee
survive (entry-dedup would lose them); run_sync_pair now syncs THREE kinds across
the TLS hub (memory entry-dedup + journal entry-dedup + assignment item-dedup) —
three descriptor-selected strategies, no kind literal in code. Updated the
importable-kinds pin (now memory/skill + the three coordination kinds; loopdef
asserted non-syncable).
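
A sketch of an id-keyed merge that copies items whole, so no domain field is dropped. The names are hypothetical, items are modeled as generic maps with a string `id`, and keeping the local copy on an id collision is an assumption of this sketch rather than something the commit states.

```go
package main

import "fmt"

// item is a generic record; the merge only ever looks at its "id".
type item map[string]any

// mergeItemsByID appends remote items whose id is not already present.
// Items are carried over whole, so fields such as scope, ttl, or assignee
// survive without the merge knowing about them.
func mergeItemsByID(local, remote []item) []item {
	seen := make(map[string]bool, len(local)+len(remote))
	merged := make([]item, 0, len(local)+len(remote))
	for _, group := range [][]item{local, remote} {
		for _, it := range group {
			id, _ := it["id"].(string)
			if seen[id] {
				continue // duplicate id: keep the first copy
			}
			seen[id] = true
			merged = append(merged, it)
		}
	}
	return merged
}

func main() {
	local := []item{{"id": "a:1", "scope": "docs"}}
	remote := []item{
		{"id": "a:1", "scope": "stale"},
		{"id": "b:7", "scope": "api", "ttl": "60", "assignee": "worker-2"},
	}
	for _, it := range mergeItemsByID(local, remote) {
		fmt.Println(it["id"], it["scope"], it["assignee"])
	}
}
```
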

Gates green: go build ./... ; go vet ./harness/... ;
go test -race ./harness/... (21 ok / 0 fail) ; make harness-validate ;
bash harness/scripts/e2e.sh (all legs + sync-pair[memory+journal+assignment]).
Add the P4 context-budget tier as a closed set {hot,warm,digest-only} on
Subscription, resolved at one site (ResolveBudgetTier):

- empty resolves to hot (full) — budget is NOT a security axis (the grant
  scope is), so the empty default PRESERVES existing full-delivery behavior,
  mirroring ClampRefs's empty-requested=full-scope. (Folds adversarial
  MUST-FIX 3: empty=digest-only would silently downgrade the coordination
  kinds that already sync full today.)
- a non-catalogued value is rejected (fail-loud, never silently widened).
- the tier SELECTS a local rendering shape from a fixed catalog; it never
  DEFINES behavior (A3/define!=select). It is applied where the local replica
  derives its mirror — the hub is never tier-aware (no remote reducer, B1/B2).

Pure additive type + field + resolver + table tests; zero call sites changed.
Gates: go build ./... + go vet ./harness/... + go test -race ./harness/...
+ make harness-validate all green. e2e runs at the first wiring commit (P4b),
where behavior actually changes — P4a adds an unwired type.
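
A sketch of a single-site resolver over the closed set, following the rules listed above: empty resolves to hot, a catalogued value passes through, anything else fails loudly. The lowercase `resolveBudgetTier` and the constant names are hypothetical; only the three tier strings come from the commit.

```go
package main

import "fmt"

// BudgetTier is the closed set {hot, warm, digest-only}.
type BudgetTier string

const (
	TierHot        BudgetTier = "hot"
	TierWarm       BudgetTier = "warm"
	TierDigestOnly BudgetTier = "digest-only"
)

// resolveBudgetTier maps the declared tier to an effective one.
// Empty preserves full delivery; an unknown value is an error, never a
// silent widening or narrowing.
func resolveBudgetTier(t BudgetTier) (BudgetTier, error) {
	switch t {
	case "":
		return TierHot, nil
	case TierHot, TierWarm, TierDigestOnly:
		return t, nil
	default:
		return "", fmt.Errorf("unknown budget tier %q", string(t))
	}
}

func main() {
	for _, in := range []BudgetTier{"", "warm", "lukewarm"} {
		tier, err := resolveBudgetTier(in)
		fmt.Printf("%q -> %q, err=%v\n", string(in), string(tier), err)
	}
}
```
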
The context-budget mechanism — a pure LOCAL presentation transform on the
derived mirror, never a hub-side reducer (folds adversarial MUST-FIX 2:
digest-only is computed locally over the full pulled set, the hub stays
tier-unaware):

- capability.ShapeByBudget(cap, fields, tier): keeps the most-recent K items
  (hot=all, warm=8, digest-only=1) and RE-RENDERS the capability header over
  the kept tail, so a content-rendered surface (the memory mirror reads the
  rendered `content`, not raw items) actually shrinks. Reducer-free by
  construction: a tier bounds the item COUNT, never a model summary (a true
  semantic digest is a sync-abi-v2 concern, deferred / B1). "Most-recent" =
  the list tail = local append/import order → an offline replay reshapes
  identically (B6). Non-item / unknown-tier / within-budget = exact
  passthrough (fail-open to full, never silent data loss).
- app.budgetShapeProjection(proj, catalog, tier): walks a projection's
  per-resource Content through ShapeByBudget; Resources/Digest keep attesting
  the FULL authoritative scope (budget bounds context, not authority — the
  grant scope is the security boundary, B2); input never mutated.
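
The count-bounding part of the mechanism can be sketched as a tail slice over the item list. This leaves out the header re-render and works on plain strings; `keepCount` and `shapeByBudget` here are hypothetical simplifications, with the K values (warm=8, digest-only=1) taken from the list above.

```go
package main

import "fmt"

// keepCount returns the item bound for a tier and whether a bound applies.
// hot, empty, and unknown tiers are unbounded (exact passthrough).
func keepCount(tier string) (int, bool) {
	switch tier {
	case "warm":
		return 8, true
	case "digest-only":
		return 1, true
	}
	return 0, false
}

// shapeByBudget keeps the most-recent K items, where "most recent" is the
// list tail. It returns a copy so the input is never mutated.
func shapeByBudget(items []string, tier string) []string {
	out := append([]string(nil), items...)
	k, bounded := keepCount(tier)
	if !bounded || len(out) <= k {
		return out
	}
	return out[len(out)-k:]
}

func main() {
	entries := []string{"e1", "e2", "e3"}
	fmt.Println(shapeByBudget(entries, "digest-only"))
	fmt.Println(shapeByBudget(entries, "hot"))
}
```

Because the bound is a pure function of list order and tier, replaying the same imports offline produces the same shaped view.
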

Pure mechanism + walker, exercised by unit tests on real capabilities and a
real Projection; not yet on the serve path (P4c threads the catalog + the
binding's tier into the mirror flow, where the e2e demonstration lands).
Gates: build + vet + go test -race ./harness/... + make harness-validate green.
"The access point declares its subscription": the endpoint declares its subscription budget. Add
ChannelBinding.Budget (contract.BudgetTier), parsed from the binding file's
`budget` field and carried into the per-principal Subscription by
SubsFromBindings. The closed-set guard runs at the binding boundary:
ChannelBinding.Validate rejects an unknown tier (fail-loud), an omitted tier
stays empty (= hot / full delivery — no silent downgrade, MUST-FIX 3).

Additive field, still inert (nothing reads Budget until P4c-2 wires it into
the mirror flow); existing empty-budget bindings resolve to hot, so behavior
is unchanged. Gates: build + vet + go test -race ./harness/... +
make harness-validate green. e2e runs at P4c-2 (the live mirror wiring).
Wire the budget mechanism live onto the serve path — the keystone connecting
binding.Budget -> serveReproject -> budgetShapeProjection -> derived mirror:

- serveReproject takes the boot-resolved catalog; mirrorPrincipal now returns
  the whole binding so the mirror site reads its declared Budget.
- after PullProjection, the projection passes through budgetShapeProjection at
  the endpoint's tier before WriteMemoryMirror. A LOCAL presentation transform
  on what this host sees, never a hub-side reduction (I11 — local decides); the
  projection Digest still attests the full authoritative scope.

hot/empty budget is exact passthrough, so existing bindings are unchanged — all
e2e legs stay green. New integration test: a digest-only host-agent's MEMORY.md
keeps only its most-recent entry; older entries drop.

Gates: build + vet + go test -race ./harness/... + make harness-validate +
bash harness/scripts/e2e.sh (all legs) green.
Strengthen the digest-only mirror test with the A4 hard-stop half: after the
budgeted mirror shrinks to the newest entry, the authoritative (un-budgeted)
projection still carries the full 3-entry set. Budget never reduced what was
admitted/stored — it shapes only the local derived view. A4: remote/budget
never bypasses or shrinks local authority; the grant scope is the security
boundary, the tier is presentation only.

Test-only (no production change). Gates: vet + go test -race ./harness/... green.
Add run_subscription to the e2e suite: the shell-level P4 acceptance
("packet size is bounded by the budget"): a host endpoint declares budget=digest-only in its
binding; after three memory writes its derived MEMORY.md carries only the
newest entry (older ones dropped by the local budget transform, no hub-side
reduction), while the authoritative pull still reports the resource present
(budget bounds presentation, not authority — A4).

Gates: full e2e all legs green (now memory+skill+...+coordination+dloop+
subscription); build + vet + go test -race ./harness/... + harness-validate
unaffected (shell-only change).
Add Runtime.DecisionLedger() — the operator-wide, cross-actor decision-log
read that backs the Control Tower's LEDGER (accepted) and INBOX (rejected)
pages. It wraps Store.DecisionsAfter, opens NO write path, and is the one
operator-scoped read the channel's per-actor PullProjection cannot serve, so
the app-layer Tower facade reads it here rather than over the channel (the ui
package will import only the facade — ui never touches the store).

Folds adversarial MUST-FIX 3: FIELD/LEDGER are cross-actor operator reads that
the per-actor channel API cannot serve, so the facade must hold the *Runtime.

Pure additive read-only method + test (accepted decision surfaces with
attribution; reads are idempotent). Gates: build + vet + go test -race green.
Add the app-layer Tower facade (TowerView) and BuildTowerView, assembling two
of the four pages read-only from the *Runtime:

- GOAL: project_intent statements (the goal) + progress_digest summaries.
  "readiness" is the ACTUAL progress entries, not a fabricated percentage — a
  % would need a KR data model that doesn't exist, and inventing one would be
  a new kernel concept (T1 veto, folded from the adversarial review).
- LEDGER: accepted decisions with attribution (proposer + changed refs), via
  the read-only DecisionLedger.

BuildTowerView performs only resource reads + the read-only ledger — never a
write or a Tick (G10/T5). The ui package will render this and never touch the
store (ui↛store). FIELD + INBOX land in P6a-3.

Pure additive read-only facade + tests (an admitted project_intent shows on
both pages; an empty runtime yields empty pages, no fabricated data). Gates:
build + vet + go test -race ./harness/... green.
Complete the four-page TowerView read-side assembly:

- FIELD: agents enumerated from the BindingSet (the only existing "who's on
  the field" source), live assignments (scope/assignee/lease TTL) from the
  assignment resource, and the open-escalation count. Deliberately NO
  "active/idle" liveness or event-rate — the data model has no heartbeat
  concept, and inventing one would be a new kernel concept (T1 veto).
- INBOX: open escalations from the durable .diagnostic events (a denied or
  high-risk candidate surfaces here, never silently dropped); CausedBy links
  each to its triggering candidate — the re-observation target for P6b.

BuildTowerView now also takes the BindingSet; still read-only (resource reads
+ event scan + DecisionLedger), no write or Tick (G10/T5).

Test: a valid assignment lands on FIELD; a denied one (missing required scope)
surfaces as an INBOX escalation; FIELD's count equals INBOX's. Gates: build +
vet + go test -race ./harness/... green.
Add ReobserveCandidate — the Tower's ONLY write. It resolves an INBOX
escalation by RE-OBSERVING the underlying candidate as the operator (a
control-agent principal), the action that clears a high-risk operator-gate
denial (RiskOperatorGate exempts the control-agent). It is NOT "approve a
proposal" — no such kernel verb exists, and the wire rejects
*.proposed/*.diagnostic (folds adversarial MUST-FIX 1+2).

Mechanism: recover the ORIGINAL observed candidate from the durable event log
by the escalation's CausedBy id, re-emit it through the SAME Ingest path under
the operator (so the operator's binding governs it, G9). Fail loud if CausedBy
does not name an OBSERVED candidate — the Tower never attempts a backdoor
ingest of a trusted internal event. Idempotent per escalation.

Test (reusing the high-risk approval kind): a host candidate denied by the
operator gate surfaces on INBOX; the operator re-observes -> admitted with the
ORIGINAL content; a missing candidate and a .diagnostic target are both
refused. Gates: build + vet + go test -race ./harness/... green.
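
The re-observation mechanism can be sketched as a lookup followed by a type check. The `event` struct, the `reobserve` function, and the derived id scheme are hypothetical; the sketch only shows the fail-loud behavior when CausedBy does not name an observed candidate.

```go
package main

import (
	"fmt"
	"strings"
)

// event is a toy row of the durable event log.
type event struct {
	ID, Type, Actor, Payload string
}

// reobserve recovers the original observed candidate named by causedBy and
// returns a copy to be ingested under the operator. It refuses anything that
// is not an observed event, so a trusted internal type is never re-emitted.
func reobserve(log []event, causedBy, operator string) (event, error) {
	for _, ev := range log {
		if ev.ID != causedBy {
			continue
		}
		if !strings.HasSuffix(ev.Type, ".observed") {
			return event{}, fmt.Errorf("event %q has type %q, not an observed candidate",
				causedBy, ev.Type)
		}
		// A deterministic id (an assumption of this sketch) keeps the
		// re-observation idempotent per escalation.
		return event{
			ID:      "reobserve:" + causedBy,
			Type:    ev.Type,
			Actor:   operator,
			Payload: ev.Payload,
		}, nil
	}
	return event{}, fmt.Errorf("no event with id %q", causedBy)
}

func main() {
	log := []event{
		{ID: "c1", Type: "approval.observed", Actor: "host-agent", Payload: "original"},
		{ID: "d1", Type: "approval.diagnostic", Actor: "kernel", Payload: "denied"},
	}
	fmt.Println(reobserve(log, "c1", "control-agent"))
	fmt.Println(reobserve(log, "d1", "control-agent"))
}
```
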
Add harness/internal/ui — the Control Tower's presentation layer. TowerModel is
a PURE state machine over an app.TowerView snapshot: the active page (the four
§3.3 pages), the INBOX cursor, and the read-side-dismissed escalations. All
transitions are pure (copy-on-write ack set, no I/O, no *Runtime).

- legality (T4) is enforced in the Model, not hidden in the View: an action
  illegal in the current state (Reobserve off INBOX, a cursor move on an empty
  list) is a pure no-op — never out-of-bounds, never a forbidden write.
- the only write surfaces as a ReobserveIntent the command layer executes
  against a fresh view (the concurrency re-check); the Model never writes.
- Render draws the active page under a tab bar naming all four pages; RenderAll
  is the headless --dump snapshot (every title + body).

ui↛store verified mechanically (go list): the ui package directly imports ONLY
the app facade — never store/kernel/runtime (T2). Tests: bounded page nav,
render names all four pages, INBOX cursor + reobserve-intent + read-side
dismiss, and the legality no-ops. Gates: build + vet + go test -race
./harness/... green.
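
A sketch of the pure-model idea: every transition takes a value and returns a value, illegal actions return the model unchanged, and the dismissed set is copied before it is modified. The type and method names are hypothetical and the view is reduced to a list of escalation ids.

```go
package main

import "fmt"

type page int

const (
	pageGoal page = iota
	pageField
	pageInbox
	pageLedger
)

// towerModel is a value type; transitions never mutate the receiver's data.
type towerModel struct {
	page      page
	cursor    int
	inbox     []string        // escalation ids from the view snapshot
	dismissed map[string]bool // read-side acknowledgements
}

// nextPage advances one page and stops at the last one.
func (m towerModel) nextPage() towerModel {
	if m.page < pageLedger {
		m.page++
	}
	return m
}

// moveCursor is a no-op off INBOX, on an empty list, or out of bounds.
func (m towerModel) moveCursor(delta int) towerModel {
	next := m.cursor + delta
	if m.page != pageInbox || next < 0 || next >= len(m.inbox) {
		return m
	}
	m.cursor = next
	return m
}

// dismiss acknowledges the selected escalation using a copied set.
func (m towerModel) dismiss() towerModel {
	if m.page != pageInbox || len(m.inbox) == 0 {
		return m
	}
	acked := make(map[string]bool, len(m.dismissed)+1)
	for id := range m.dismissed {
		acked[id] = true
	}
	acked[m.inbox[m.cursor]] = true
	m.dismissed = acked
	return m
}

// reobserveIntent names the escalation to re-observe; the model itself
// never writes, it only surfaces the intent.
func (m towerModel) reobserveIntent() (string, bool) {
	if m.page != pageInbox || len(m.inbox) == 0 {
		return "", false
	}
	return m.inbox[m.cursor], true
}

func main() {
	m := towerModel{inbox: []string{"esc-1", "esc-2"}}
	m = m.nextPage().nextPage()       // GOAL, FIELD, then INBOX
	m = m.moveCursor(1).moveCursor(1) // second move is out of bounds
	id, ok := m.reobserveIntent()
	fmt.Println(id, ok)
}
```
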
A go test (so it runs in `go test ./harness/...` — the gates) pinning the
Control Tower's sanctioned vocabulary:

- the page names are EXACTLY the four §3.3 pages (GOAL/FIELD/INBOX/LEDGER), a
  closed set in order.
- the Tower's STRUCTURAL vocabulary (an empty-data RenderAll — only the Tower's
  own labels, no injected resource content) contains no foreign-discipline
  jargon (OA/OKR/kanban/sprint/scrum/dashboard/backlog) and never the
  non-existent "human@owner" identity (the operator is a control-agent —
  MUST-FIX 2).

Test-only. Gates: go test -race ./harness/internal/ui green.
Add `mnemon-harness tower` (D5: TUI-only, command name `tower`) — the
human-visible four-page boundary over the agent field. It assembles the
read-only TowerView (app facade) and renders all four §3.3 pages
(GOAL/FIELD/INBOX/LEDGER) via the pure ui.TowerModel.

READ-ONLY: it never writes or Ticks. It opens the local runtime directly (the
facade needs cross-actor reads the per-actor channel cannot serve), so it
requires the daemon STOPPED — single-writer, S11. `--dump` is the headless,
scriptable snapshot. The interactive loop and the live-while-serving Tower (a
channel read-verb or in-daemon rendering) are deferred to P5/operator —
presentation only; all state lives in the pure TowerModel.

e2e run_tower: the daemon admits a project_intent (GOAL) + an assignment
(FIELD), the daemon stops, and `tower --dump` renders the four pages with the
governed data. Gates: build + vet + go test -race ./harness/... +
make harness-validate + bash harness/scripts/e2e.sh (all legs, now +tower) green.
The P8 quickstart — the one autonomously-buildable+verifiable beta-expression
deliverable (directive §52c whitelists the minimal public docs):

- Path A (operator): setup -> local run -> control observe (ticked=true) ->
  tower --dump shows the governed decision on LEDGER, attributed to its proposer.
- Path B (capability author): drop .mnemon/loops/<name>/capability.json + enable
  via config.loops + the binding scope -> observe the new kind -> it governs
  through the SAME path as the built-ins (a capability SELECTS from a closed
  catalog; no per-kind code).

Both command sequences were walked end to end and verified (ticked=true). Honors
the claims discipline: no production-readiness claim, no systemd/k8s framing.

The remaining P8 acceptance is operator-gated: the release/harness-beta-public
branch + distribution (push), the real-newcomer timing leg, and claims verified
against P5/P7 data. Docs-only; no code change.
A Local Mnemon runtime plus multiple Codex appserver workers writing governed
observations through the channel, with a browser UI over the GOAL/FIELD/INBOX/
LEDGER activity and dynamic poc_claim/poc_decision protocol evolution via
loopdef. Round-driven orchestration — the base the governed self-continuation
loop builds alongside.
Add a content-blind nudge engine demonstrating "use a cluster like a single
agent": from one seeded intent the cluster self-continues through governed
events. Workers report; two POC agents route via governed assignment writes; the
engine wakes whichever agent's projection digest changed. The "who acts next"
decision lives only in a POC brain's governed assignment, never in the engine,
and the whole chain is replayable from the decision ledger.
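
The wake-on-digest-change loop described above can be sketched as follows. This is a minimal illustration, not the shipped engine: the `agent` struct, the `board` map (scope name to digest string), and `runToQuiescence` are hypothetical names, and plain strings stand in for real content digests.

```go
package main

import "fmt"

// agent is a hypothetical participant. scope names the projection it watches;
// act is its "brain", which may write to other scopes. Routing lives there.
type agent struct {
	name  string
	scope string
	act   func(board map[string]string)
}

// runToQuiescence nudges every agent whose watched digest changed since it was
// last nudged, and repeats until a full pass nudges nobody. The engine compares
// digests only; it never inspects content and never picks the next actor.
func runToQuiescence(board map[string]string, agents []agent, maxPasses int) []string {
	seen := make(map[string]string) // agent name -> digest it was last nudged for
	var order []string
	for pass := 0; pass < maxPasses; pass++ {
		nudged := false
		for _, a := range agents {
			d := board[a.scope]
			if d == "" || d == seen[a.name] {
				continue // nothing new in this agent's scope
			}
			seen[a.name] = d
			a.act(board)
			order = append(order, a.name)
			nudged = true
		}
		if !nudged {
			break // quiescent: no projection changed during this pass
		}
	}
	return order
}

// demoAgents wires a three-hop chain: the POC assigns, the worker reports,
// the reviewer reads the report and writes nothing further.
func demoAgents() []agent {
	return []agent{
		{name: "poc", scope: "intent", act: func(b map[string]string) { b["worker"] = "assign-1" }},
		{name: "worker", scope: "worker", act: func(b map[string]string) { b["report"] = "done-1" }},
		{name: "reviewer", scope: "report", act: func(b map[string]string) {}},
	}
}

func main() {
	board := map[string]string{"intent": "seed-1"}
	fmt.Println(runToQuiescence(board, demoAgents(), 10)) // [poc worker reviewer]
}
```

In this sketch, dropping the `poc` entry leaves the worker's scope empty, so nothing is ever nudged. That mirrors the claim that routing lives in the brain rather than the engine.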

The engine reuses already-exported framework surface (PullProjection,
DecisionLedger, Ingest+Tick) through cmd-layer handle wrappers — no
harness/internal changes. The --simulate brains are deterministic Go closures; a
real-Codex brain is a drop-in with the same interface.

Tests prove the one-hop chain, that routing lives in the brain (removing the POC
brain breaks the chain), and the shipped 5-agent / 2-POC multi-hop demo plus its
ledger-authoritative snapshot.

Add realCodexBrain, a drop-in agentBrain backed by a real Codex app-server turn
(reusing codexRealAppServer). When nudged it runs a turn only when there is
genuinely new work (a cheap relevance pre-check), then parses the model's output
into a governed observation: a worker emits a progress_digest from its
MNEMON_REPORT line; a POC emits an assignment from its MNEMON_ASSIGN/MNEMON_SCOPE
lines, so the LLM, not the Go code, decides who acts next.
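
A rough sketch of that output parsing, under stated assumptions: the commit message names the MNEMON_REPORT, MNEMON_ASSIGN, and MNEMON_SCOPE markers but not their exact syntax, so the `KEY: value` line shape, the `observation` struct, and the `parseOutput` helper below are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// observation is a hypothetical stand-in for a governed observation.
type observation struct {
	Kind   string // "progress_digest" or "assignment"
	Report string
	Assign string
	Scope  string
}

// marker returns the trimmed value of a "KEY: value" line (assumed syntax).
func marker(line, key string) (string, bool) {
	line = strings.TrimSpace(line)
	if !strings.HasPrefix(line, key+":") {
		return "", false
	}
	return strings.TrimSpace(strings.TrimPrefix(line, key+":")), true
}

// parseOutput scans model output for marker lines and maps them to an
// observation by role. It reports false when the required markers are missing,
// so a turn with no usable output produces no governed write.
func parseOutput(role, text string) (observation, bool) {
	var o observation
	for _, line := range strings.Split(text, "\n") {
		if v, ok := marker(line, "MNEMON_REPORT"); ok {
			o.Report = v
		} else if v, ok := marker(line, "MNEMON_ASSIGN"); ok {
			o.Assign = v
		} else if v, ok := marker(line, "MNEMON_SCOPE"); ok {
			o.Scope = v
		}
	}
	switch {
	case role == "worker" && o.Report != "":
		o.Kind = "progress_digest"
	case role == "poc" && o.Assign != "" && o.Scope != "":
		o.Kind = "assignment"
	default:
		return observation{}, false
	}
	return o, true
}

func main() {
	o, ok := parseOutput("poc", "MNEMON_ASSIGN: reviewer\nMNEMON_SCOPE: review")
	fmt.Println(o.Kind, o.Assign, o.Scope, ok)
}
```

Because the parser is a pure function over text, it can be unit-tested against canned strings without running a model turn.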

Wire it through --real-roles (per-role real/scripted substitution) and a headless
--once mode. A real planner run self-continues the full ledger chain with the
first hop authored by a live Codex turn; the engine and every other role are
unchanged — the brain is the only swap.

Output parsing and role substitution are unit-tested without spending a turn.

The worker developerInstructions hardcoded "Read-only sandbox: do not modify
files" regardless of --codex-sandbox, so a workspaceWrite run still told the
model not to write — silently blocking all file work. A live full-real-LLM run
surfaced this precisely: the reviewer agent diagnosed the contradicting
instruction by name. Derive the guidance line from the sandbox policy so the
instruction never contradicts the sandbox.
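
One way to derive the guidance line from the policy is a single mapping function, sketched below. The `sandboxGuidance` name, the `readOnly` policy value, and the non-read-only wording are assumptions for illustration; only `workspaceWrite` and the read-only sentence appear in the commit message.

```go
package main

import "fmt"

// sandboxGuidance returns the instruction line for a sandbox policy so that
// the prompt and the sandbox cannot disagree. Policy names other than
// "workspaceWrite" are assumed here.
func sandboxGuidance(policy string) string {
	switch policy {
	case "readOnly":
		return "Read-only sandbox: do not modify files."
	case "workspaceWrite":
		return "Workspace-write sandbox: you may modify files inside the workspace."
	default:
		// Unknown policy: make no claim about write access either way.
		return fmt.Sprintf("Sandbox policy %q: follow the sandbox's own restrictions.", policy)
	}
}

func main() {
	for _, p := range []string{"readOnly", "workspaceWrite", "custom"} {
		fmt.Println(sandboxGuidance(p))
	}
}
```

Keeping the mapping in one function means a new sandbox policy has exactly one place to add its guidance, and the default branch avoids asserting a restriction that may not hold.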
…ngine

Move the governed self-continuation engine out of cmd/package main into
internal/autopilot — an OPTIONAL auto-drive layer over the channel. Base
mnemon-harness integrates the channel and a human drives each agent by hand;
engage the autopilot and that pacing is automated (it nudges a participant when
its governed projection scope changes, looping to quiescence). It is
content-blind — routing lives in the Agents, never here.

Renamed for honesty: governedLoop->Loop, agentBrain->Agent, scriptedBrain->
Scripted, turnPacket->TurnPacket, nudgeEvent->Nudge; the 3-method cmd seam is now
autopilot.Runtime (the in-process handle satisfies it). The package imports only
channel/contract/projection and the channel core imports it ZERO times, so the
ring is deletable: optionality is compiler-enforced. cmd is now a consumer.

Extract the Codex app-server JSON-RPC driver + output parsers into
internal/codexapp — a reusable, stdlib-only "run a real Codex turn from Go"
adapter with zero knowledge of governance, the autopilot, or any demo. The
real-Codex Agent drives turns through it.

Delete the old `codex-team` orchestrator-rounds demo (its command, codexTeamState,
web UI, task specs, protocol-evolution, and the round-loop): the governed
self-continuation demo (codex-team-loop) supersedes it. The few shared bits the
new demo still needs (a slim in-process runtime handle, bindings, listener and
string helpers) move to codex_team_host.go.

cmd/mnemon-harness drops from 5233 to 2423 LOC; codexapp imports no internal package;
full harness suite green; the scripted demo still self-continues end to end.

LocalConfig is only the field type for File's JSON "local" section and has no
external callers; it also collided by name with the unrelated 8-field
app.LocalConfig. Unexport it (localConfig) to signal "JSON-binding detail, not
public API" and drop the false collision. Its exported fields still marshal and
stay reachable through File.Local.
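
A small sketch of why the unexport is safe: encoding/json decides visibility by field name, not by the exportedness of the field's type. The `Addr` field and its `addr` key are hypothetical; only `File`, `Local`, `localConfig`, and the `local` section come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// localConfig is unexported, but its fields are exported, so encoding/json
// still reads and writes them. Addr is a hypothetical field.
type localConfig struct {
	Addr string `json:"addr"`
}

// File exposes the section through an exported field of the unexported type.
type File struct {
	Local localConfig `json:"local"`
}

// roundTrip decodes raw into File and encodes it back.
func roundTrip(raw string) (string, error) {
	var f File
	if err := json.Unmarshal([]byte(raw), &f); err != nil {
		return "", err
	}
	out, err := json.Marshal(f)
	return string(out), err
}

func main() {
	out, err := roundTrip(`{"local":{"addr":"127.0.0.1:0"}}`)
	fmt.Println(out, err)
}
```

Callers in other packages can still read `f.Local.Addr`; they just cannot name the type `localConfig` in their own declarations.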
Add internal/coreguard: tests that the core (contract/channel/kernel/store/
projection/rule/reconcile/runtime) imports no outer ring (capability, hostsurface,
app, autopilot, codexapp, cmd, ...) and hardcodes no business kind string literal
(memory/skill/codex/loopdef/assignment/...). The kernel governance kinds
(lease/budget/receipt/coordination) are allowed; coordination is the one
grandfathered borderline case. A meta-test proves the matchers actually fire.
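
The import half of such a guard can be sketched with the standard go/parser package. This is an illustration rather than the coreguard implementation: the `outerRings` fragments, the `forbiddenImports` helper, and the `example.com/mnemon` module path are assumptions.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// outerRings lists import-path fragments the core must not depend on.
// The exact paths are assumed for this sketch.
var outerRings = []string{
	"/internal/autopilot", "/internal/codexapp", "/internal/app",
	"/internal/capability", "/internal/hostsurface",
}

// forbiddenImports parses only the import block of a Go source file and
// returns every import path that belongs to an outer ring.
func forbiddenImports(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "core.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var bad []string
	for _, spec := range f.Imports {
		path, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		for _, ring := range outerRings {
			// Match the ring package itself or any of its subpackages.
			if strings.HasSuffix(path, ring) || strings.Contains(path, ring+"/") {
				bad = append(bad, path)
				break
			}
		}
	}
	return bad, nil
}

func main() {
	src := `package channel

import (
	"fmt"
	"example.com/mnemon/harness/internal/autopilot"
)
`
	bad, err := forbiddenImports(src)
	fmt.Println(bad, err)
}
```

Feeding the checker a deliberately leaking source, as `main` does, plays the role of the meta-test: it shows the matcher fires rather than passing vacuously.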

Also relocate the one real leak: LoopdefActivator ("loopdef@local") was a
loopdef-specific principal sitting in the generic contract core; move it to
app/loopdef_materialize.go where the loopdef machinery lives.

Makes "the core stays generic" a build gate, not a convention.
@Grivn Grivn force-pushed the feat/ai-native-lifecycle-architecture branch from 8edff02 to 3e6c6f3 Compare June 14, 2026 18:30
@Grivn Grivn merged commit bdea56d into master Jun 14, 2026
1 check passed
@Grivn Grivn deleted the feat/ai-native-lifecycle-architecture branch June 14, 2026 18:33