diff --git a/docs/superpowers/plans/2026-05-22-personal-agent-harness.md b/docs/superpowers/plans/2026-05-22-personal-agent-harness.md
new file mode 100644
index 00000000..824d4cd1
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-22-personal-agent-harness.md
@@ -0,0 +1,643 @@
+# Personal Agent Harness Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Add a thin Personal Agent Harness around GenericAgent so sessions, runs, worktrees, capabilities, and memory promotion are durable and isolated without changing the minimalist agent loop.
+
+**Architecture:** Keep `agent_loop.py` and the core tool loop unchanged. Add focused harness modules for contracts, file-backed stores, worktree leases, capability manifests, and memory promotion gates, then integrate them gradually into `frontends/desktop_bridge.py` and later `frontends/conductor.py`.
+
+**Tech Stack:** Python standard library, JSON files, dataclasses, `unittest`, existing GenericAgent modules, Git CLI for worktree operations.
+
+---
+
+## File Map
+
+- Create `harness/__init__.py`: marks the harness package and exports stable public symbols later.
+- Create `harness/contracts.py`: dataclasses and JSON serialization for `SessionManifest`, `RunLedger`, `CapabilityManifest`, `WorktreeLease`, `MemoryPromotionRecord`, and `TaskBrief`.
+- Create `harness/store.py`: atomic-ish JSON file store for sessions, runs, and promotion records.
+- Create `harness/worktrees.py`: Git worktree lease allocation and cleanup helpers.
+- Create `harness/capabilities.py`: read and validate capability manifest files.
+- Create `harness/memory_promotion.py`: queue and approve/reject memory promotion records.
+- Modify `frontends/desktop_bridge.py`: attach session manifests and run ledgers while preserving existing API behavior.
+- Modify `frontends/conductor.py`: later reuse run ledger and subagent run status.
+- Create `tests/harness/test_contracts.py`: serialization and default tests.
+- Create `tests/harness/test_store.py`: file store behavior.
+- Create `tests/harness/test_worktrees.py`: Git worktree command construction and dry-run behavior.
+- Create `tests/harness/test_memory_promotion.py`: promotion gate behavior.
+- Create `docs/personal-agent-harness.md`: user-facing summary once the first milestone is implemented.
+
+## Global Constraints
+
+- Do not modify `agent_loop.py` in the first milestone.
+- Do not install dependencies.
+- Do not require a daemon.
+- Do not make GitHub Issues mandatory.
+- Do not enable MCP or skills implicitly.
+- Do not let normal runs write global memory directly.
+- Do not commit unless the user explicitly authorizes a commit.
+
+## Task 1: Add Harness Contract Dataclasses
+
+**Files:**
+
+- Create: `harness/__init__.py`
+- Create: `harness/contracts.py`
+- Test: `tests/harness/test_contracts.py`
+
+- [ ] **Step 1: Write tests for contract round trips**
+
+Create `tests/harness/test_contracts.py` with `unittest` cases that instantiate each contract, convert it to dict, convert it back, and verify stable fields.
+
+Expected contracts:
+
+- `TaskBrief`
+- `SessionManifest`
+- `RunLedger`
+- `CapabilityManifest`
+- `WorktreeLease`
+- `MemoryPromotionRecord`
+
+- [ ] **Step 2: Run tests and verify they fail because the package does not exist**
+
+Run:
+
+```bash
+python3 -m unittest tests.harness.test_contracts -v
+```
+
+Expected: import failure for `harness.contracts`.
+
+- [ ] **Step 3: Implement `harness/contracts.py`**
+
+Use only dataclasses, `asdict`, and explicit `from_dict` constructors. Store timestamps as ISO-like strings supplied by callers or helper functions. Keep contract code pure and independent from GenericAgent.
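
The contract style described in Step 3 might look roughly like the sketch below. It covers only a subset of the `SessionManifest` fields named in the design spec; the field types, the defaults (`chat`, `idle`), and the `utc_now_iso` helper are assumptions for illustration, not the planned implementation.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


def utc_now_iso() -> str:
    """Hypothetical helper: current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


@dataclass
class SessionManifest:
    # Subset of the spec's fields; types and defaults are assumed.
    session_id: str
    root_cwd: str
    created_at: str  # timestamps are plain strings supplied by the caller
    updated_at: str
    title: str = ""
    kind: str = "chat"
    status: str = "idle"

    def to_dict(self) -> dict:
        """Serialize to a plain dict that json.dump can write directly."""
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "SessionManifest":
        """Explicit constructor: required keys raise KeyError, extras are ignored."""
        return cls(
            session_id=data["session_id"],
            root_cwd=data["root_cwd"],
            created_at=data["created_at"],
            updated_at=data["updated_at"],
            title=data.get("title", ""),
            kind=data.get("kind", "chat"),
            status=data.get("status", "idle"),
        )
```

Spelling out `from_dict` by hand, rather than calling `cls(**data)`, keeps older or newer manifest files loadable when fields are added later, which is one plausible reason for the "explicit constructors" rule.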
+
+- [ ] **Step 4: Run contract tests**
+
+Run:
+
+```bash
+python3 -m unittest tests.harness.test_contracts -v
+```
+
+Expected: all contract tests pass.
+
+- [ ] **Step 5: Run syntax check**
+
+Run:
+
+```bash
+python3 -m compileall -q harness tests
+```
+
+Expected: no output and exit code 0.
+
+## Task 2: Add File-Backed Harness Store
+
+**Files:**
+
+- Modify: `harness/contracts.py`
+- Create: `harness/store.py`
+- Test: `tests/harness/test_store.py`
+
+- [ ] **Step 1: Write store tests**
+
+Cover:
+
+- creating a store at a temporary root
+- saving and loading a `SessionManifest`
+- appending and loading `RunLedger` records
+- listing sessions
+- rejecting path traversal session ids
+
+- [ ] **Step 2: Run tests and verify they fail**
+
+Run:
+
+```bash
+python3 -m unittest tests.harness.test_store -v
+```
+
+Expected: import failure or missing methods.
+
+- [ ] **Step 3: Implement `HarnessStore`**
+
+Design:
+
+```text
+/
+  sessions//manifest.json
+  sessions//runs/.json
+  memory_promotions/.json
+```
+
+Use `tempfile.NamedTemporaryFile` plus `os.replace` for writes. Reject ids containing path separators, `..`, or empty strings.
+
+- [ ] **Step 4: Run store tests**
+
+Run:
+
+```bash
+python3 -m unittest tests.harness.test_store -v
+```
+
+Expected: all store tests pass.
+
+## Task 3: Persist Desktop Session Manifests Without Behavior Change
+
+**Files:**
+
+- Modify: `frontends/desktop_bridge.py`
+- Modify: `harness/store.py`
+- Test: `tests/harness/test_desktop_bridge_store.py`
+
+- [ ] **Step 1: Write tests for session manifest creation**
+
+Use `AgentManager` with a temporary harness root. Call `create_session(cwd=tmpdir)` and assert:
+
+- existing HTTP snapshot fields still exist
+- a matching `manifest.json` is saved
+- `status` starts as `idle`
+- `root_cwd` equals the session cwd
+
+- [ ] **Step 2: Run tests and verify failure**
+
+Run:
+
+```bash
+python3 -m unittest tests.harness.test_desktop_bridge_store -v
+```
+
+Expected: missing harness integration.
+
+- [ ] **Step 3: Add optional `harness_root` to `AgentManager`**
+
+If no harness root is provided, keep the default:
+
+```text
+/temp/harness
+```
+
+Do not change public HTTP routes.
+
+- [ ] **Step 4: Save manifests in `create_session` and status updates**
+
+When session status changes, update the manifest. Keep the existing in-memory `Session` object as the UI source for now.
+
+- [ ] **Step 5: Run regression tests**
+
+Run:
+
+```bash
+python3 -m unittest tests.harness.test_desktop_bridge_store -v
+python3 -m compileall -q harness frontends/desktop_bridge.py
+```
+
+Expected: pass.
+
+## Task 4: Add Run Ledger For Prompt Execution
+
+**Files:**
+
+- Modify: `frontends/desktop_bridge.py`
+- Modify: `harness/store.py`
+- Test: `tests/harness/test_run_ledger.py`
+
+- [ ] **Step 1: Write run ledger tests**
+
+Cover:
+
+- submitting a prompt creates a run record
+- run status becomes `running`
+- successful completion becomes `done`
+- exceptions become `failed` with a failure reason
+- cancellation becomes `cancelled`
+
+- [ ] **Step 2: Add run id to session execution**
+
+Generate `run_id` before starting `run_agent_turn`. Store it on the session and pass it into the worker thread.
+
+- [ ] **Step 3: Append run status changes**
+
+Persist a `RunLedger` at:
+
+```text
+sessions//runs/.json
+```
+
+Keep the log path compatible with existing `temp/model_responses`.
+
+- [ ] **Step 4: Run ledger tests**
+
+Run:
+
+```bash
+python3 -m unittest tests.harness.test_run_ledger -v
+```
+
+Expected: pass.
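
The write rules from Task 2 (`tempfile.NamedTemporaryFile` plus `os.replace`, id rejection) and the run-ledger persistence from Task 4 could be combined along these lines. This is a minimal sketch: `validate_id`, `atomic_write_json`, and `save_run` are hypothetical names, the payload is a small subset of the `RunLedger` fields, and keying the path segments by session id and run id is an assumption.

```python
import json
import os
import tempfile
from pathlib import Path


def validate_id(value: str) -> str:
    """Reject empty ids and ids that could escape the store root."""
    if not value or ".." in value or "/" in value or "\\" in value:
        raise ValueError(f"unsafe id: {value!r}")
    return value


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write to a temp file in the target directory, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", dir=str(path.parent), suffix=".tmp", delete=False
    ) as tmp:
        json.dump(payload, tmp, indent=2, sort_keys=True)
        tmp_name = tmp.name
    # os.replace overwrites the destination in one step on the same filesystem.
    os.replace(tmp_name, path)


def save_run(root: Path, session_id: str, run_id: str, status: str,
             failure_reason: str = "") -> Path:
    """Persist one run record; the directory layout here is an assumption."""
    path = (root / "sessions" / validate_id(session_id)
            / "runs" / f"{validate_id(run_id)}.json")
    atomic_write_json(path, {
        "run_id": run_id,
        "session_id": session_id,
        "status": status,  # e.g. running, done, failed, cancelled
        "failure_reason": failure_reason,
    })
    return path
```

Creating the temp file in the destination directory matters: `os.replace` is only a rename when source and target share a filesystem, so a temp file under the system temp directory could defeat the purpose.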
+
+## Task 5: Add Process Runtime For Development Sessions
+
+**Files:**
+
+- Create: `harness/runtime.py`
+- Modify: `frontends/desktop_bridge.py`
+- Test: `tests/harness/test_runtime.py`
+
+- [ ] **Step 1: Write runtime tests**
+
+Cover:
+
+- `ThreadRuntime` preserves current behavior.
+- `ProcessRuntime` can launch a harmless Python command.
+- `ProcessRuntime.cancel()` terminates a running process.
+- stdout/stderr paths are recorded in the run ledger.
+
+- [ ] **Step 2: Implement runtime adapters**
+
+Create:
+
+- `ThreadRuntime`
+- `ProcessRuntime`
+
+Both expose:
+
+```python
+start(run_context) -> RuntimeHandle
+cancel(handle) -> None
+status(handle) -> str
+```
+
+The first integration should still default to `ThreadRuntime`.
+
+- [ ] **Step 3: Add session kind based runtime selection**
+
+For `kind == "development"`, allow `ProcessRuntime`. Keep normal chat sessions on `ThreadRuntime`.
+
+- [ ] **Step 4: Run runtime tests**
+
+Run:
+
+```bash
+python3 -m unittest tests.harness.test_runtime -v
+python3 -m compileall -q harness frontends/desktop_bridge.py
+```
+
+Expected: pass.
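
One possible shape for the `ProcessRuntime` adapter from Step 2 is sketched below. The plan only names `RuntimeHandle` and the three methods, so the handle's fields are hypothetical, and `start` takes a plain argument list here instead of the plan's `run_context`. Status strings reuse the run statuses from Task 4.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class RuntimeHandle:
    # Hypothetical handle layout; the plan does not define its fields.
    process: subprocess.Popen
    cancelled: bool = False


class ProcessRuntime:
    def start(self, argv: list) -> RuntimeHandle:
        """Launch the command as a child process and return its handle."""
        return RuntimeHandle(process=subprocess.Popen(argv))

    def status(self, handle: RuntimeHandle) -> str:
        """Map the child's state onto run ledger status strings."""
        code = handle.process.poll()
        if code is None:
            return "running"
        if handle.cancelled:
            return "cancelled"
        return "done" if code == 0 else "failed"

    def cancel(self, handle: RuntimeHandle) -> None:
        """Ask the child to exit, then force-kill it if it does not."""
        if handle.process.poll() is not None:
            return  # already finished; nothing to cancel
        handle.cancelled = True
        handle.process.terminate()
        try:
            handle.process.wait(timeout=5)
        except subprocess.TimeoutExpired:
            handle.process.kill()
            handle.process.wait()
```

Tracking `cancelled` on the handle is a design choice in this sketch: a terminated process exits non-zero, so without the flag a cancelled run would be indistinguishable from a failed one.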
+
+## Task 6: Add Worktree Lease Manager
+
+**Files:**
+
+- Create: `harness/worktrees.py`
+- Test: `tests/harness/test_worktrees.py`
+
+- [ ] **Step 1: Write dry-run worktree tests**
+
+Cover:
+
+- session worktree branch/path calculation
+- execution worktree branch/path calculation
+- invalid branch names are rejected
+- dry-run mode returns commands without executing Git
+
+- [ ] **Step 2: Implement `WorktreeManager`**
+
+Inputs:
+
+- repo root
+- worktree root
+- branch template
+- path template
+
+Methods:
+
+- `allocate_session_worktree(session_id, slug, base_ref)`
+- `allocate_execution_worktree(session_id, run_id, slug, base_ref)`
+- `release_lease(lease_id, cleanup=False)`
+
+- [ ] **Step 3: Add real Git smoke test guarded by temp repo**
+
+Use a temporary Git repo inside the test and run:
+
+```bash
+git init
+git config user.email test@example.com
+git config user.name Test
+```
+
+Create one commit, allocate one worktree, and assert the path exists.
+
+- [ ] **Step 4: Run worktree tests**
+
+Run:
+
+```bash
+python3 -m unittest tests.harness.test_worktrees -v
+```
+
+Expected: pass.
+
+## Task 7: Add Capability Manifest Reader
+
+**Files:**
+
+- Create: `harness/capabilities.py`
+- Test: `tests/harness/test_capabilities.py`
+- Optional later: create `.ai/capabilities.example.yml`
+
+- [ ] **Step 1: Write manifest tests**
+
+Cover capability types:
+
+- `tool`
+- `mcp`
+- `skill`
+- `worker`
+- `provider`
+
+Assert silent installation defaults to false and unverified capabilities are not enabled.
+
+- [ ] **Step 2: Implement manifest loading**
+
+Support YAML only if PyYAML is already installed. If not installed, support JSON first and keep YAML support behind a clear error. Do not add dependencies.
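
The loading rule in Step 2 (JSON always, YAML only when PyYAML happens to be present) could be sketched as follows. The `load_manifest` name and the top-level `{"capabilities": [...]}` file shape are assumptions; the plan does not fix either.

```python
import json
from pathlib import Path


def load_manifest(path: Path) -> list:
    """Load capability entries from a JSON file, or YAML if PyYAML exists."""
    suffix = path.suffix.lower()
    text = path.read_text(encoding="utf-8")
    if suffix == ".json":
        data = json.loads(text)
    elif suffix in (".yml", ".yaml"):
        try:
            import yaml  # PyYAML; imported lazily so it is never required
        except ImportError as exc:
            raise RuntimeError(
                "YAML manifests need PyYAML, which is not installed; "
                "use a JSON manifest instead"
            ) from exc
        data = yaml.safe_load(text)
    else:
        raise ValueError(f"unsupported manifest format: {path.name}")

    # Assumed file shape: {"capabilities": [{...}, {...}]}
    entries = [dict(entry) for entry in (data or {}).get("capabilities", [])]
    for entry in entries:
        # Nothing is enabled unless the manifest says so explicitly.
        entry.setdefault("enabled", False)
    return entries
```

Importing `yaml` inside the branch keeps the module importable on a machine without PyYAML, which is what lets the "do not add dependencies" constraint hold.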
+
+- [ ] **Step 3: Add validation errors**
+
+Return structured validation errors for:
+
+- missing name
+- missing type
+- enabled without verification
+- unknown permission
+
+- [ ] **Step 4: Run capability tests**
+
+Run:
+
+```bash
+python3 -m unittest tests.harness.test_capabilities -v
+```
+
+Expected: pass.
+
+## Task 8: Add Memory Promotion Queue
+
+**Files:**
+
+- Create: `harness/memory_promotion.py`
+- Modify: `harness/store.py`
+- Test: `tests/harness/test_memory_promotion.py`
+
+- [ ] **Step 1: Write promotion tests**
+
+Cover:
+
+- proposed records are saved to the queue
+- accepted records can be listed
+- rejected records keep rejection reason
+- records without evidence cannot be accepted
+- global memory files are not modified by proposing a promotion
+
+- [ ] **Step 2: Implement queue operations**
+
+Methods:
+
+- `propose(record)`
+- `accept(promotion_id, reviewer)`
+- `reject(promotion_id, reviewer, reason)`
+- `list_pending()`
+
+- [ ] **Step 3: Add a narrow adapter for existing memory writes**
+
+Do not change existing agent memory behavior yet. Add an adapter that can later replace direct global writes.
+
+- [ ] **Step 4: Run promotion tests**
+
+Run:
+
+```bash
+python3 -m unittest tests.harness.test_memory_promotion -v
+```
+
+Expected: pass.
+
+## Task 9: Integrate Development Session Worktree Allocation
+
+**Files:**
+
+- Modify: `frontends/desktop_bridge.py`
+- Modify: `harness/worktrees.py`
+- Test: `tests/harness/test_development_session.py`
+
+- [ ] **Step 1: Write development session tests**
+
+Cover:
+
+- creating a `development` session allocates a session worktree
+- ordinary chat sessions do not allocate worktrees
+- existing main checkout is not written when worktree allocation is enabled
+- worktree lease is persisted in the session manifest
+
+- [ ] **Step 2: Add optional session kind parameter**
+
+Extend session creation internals first. Public UI can keep default `chat`.
+
+- [ ] **Step 3: Add worktree allocation behind config**
+
+Default disabled until explicitly configured:
+
+```json
+{
+  "worktrees": {
+    "enabled": false,
+    "root": "temp/worktrees"
+  }
+}
+```
+
+- [ ] **Step 4: Run development session tests**
+
+Run:
+
+```bash
+python3 -m unittest tests.harness.test_development_session -v
+```
+
+Expected: pass.
+
+## Task 10: Add Lightweight Workflow Templates
+
+**Files:**
+
+- Create: `docs/workflows/personal-development-task.md`
+- Create: `docs/workflows/memory-promotion.md`
+- Create: `docs/workflows/capability-install.md`
+
+- [ ] **Step 1: Write development task workflow**
+
+Include:
+
+- task brief
+- session worktree
+- run ledger
+- validation
+- memory promotion proposal
+
+- [ ] **Step 2: Write memory promotion workflow**
+
+Include:
+
+- evidence requirement
+- target layer decision
+- acceptance/rejection
+- no direct global write rule
+
+- [ ] **Step 3: Write capability install workflow**
+
+Include:
+
+```text
+Inventory -> Gap report -> Approval -> Install -> Verify -> Enable
+```
+
+- [ ] **Step 4: Review docs for contradictions**
+
+Run:
+
+```bash
+python3 - <<'PY'
+from pathlib import Path
+
+patterns = [
+    "TO" + "DO",
+    "TB" + "D",
+    "silent install " + "allowed",
+    "direct global memory write " + "allowed",
+    "build a full agent " + "os",
+]
+for path in [*Path("docs/superpowers").rglob("*.md"), *Path("docs/workflows").rglob("*.md")]:
+    text = path.read_text(encoding="utf-8", errors="replace")
+    lower = text.lower()
+    for pattern in patterns:
+        if pattern.lower() in lower:
+            print(f"{path}: contains {pattern}")
+PY
+```
+
+Expected: no output.
+
+## Task 11: Add Conductor Run Ledger Integration
+
+**Files:**
+
+- Modify: `frontends/conductor.py`
+- Test: `tests/harness/test_conductor_runs.py`
+
+- [ ] **Step 1: Write conductor run tests**
+
+Cover:
+
+- starting a subagent creates a run ledger
+- subagent done updates status
+- abort updates status
+- chat history remains in conductor memory as before
+
+- [ ] **Step 2: Inject optional HarnessStore into conductor helpers**
+
+Keep current global behavior when store is absent.
+
+- [ ] **Step 3: Persist subagent lifecycle events**
+
+Map:
+
+- `running`
+- `stopped`
+- `failed`
+- `aborted`
+
+to run ledger statuses.
+
+- [ ] **Step 4: Run conductor tests**
+
+Run:
+
+```bash
+python3 -m unittest tests.harness.test_conductor_runs -v
+```
+
+Expected: pass.
+
+## Task 12: End-To-End Smoke Verification
+
+**Files:**
+
+- No new files required unless failures reveal missing tests.
+
+- [ ] **Step 1: Run unit tests**
+
+Run:
+
+```bash
+python3 -m unittest discover -s tests -v
+```
+
+Expected: pass.
+
+- [ ] **Step 2: Run syntax check**
+
+Run:
+
+```bash
+python3 -m compileall -q agent_loop.py agentmain.py ga.py hub.pyw launch.pyw llmcore.py simphtml.py TMWebDriver.py assets frontends ga_cli memory plugins reflect harness tests
+```
+
+Expected: pass.
+
+- [ ] **Step 3: Review diff**
+
+Run:
+
+```bash
+git diff --stat
+git diff -- docs/superpowers docs/workflows harness tests frontends/desktop_bridge.py frontends/conductor.py
+```
+
+Expected: changes match this plan and do not touch unrelated files.
+
+- [ ] **Step 4: Manual behavior check**
+
+Start the existing UI or bridge exactly as before and create a normal chat session. Confirm existing session creation and prompt submission still work.
+
+Record what was verified in the final implementation summary.
+
+## Milestone Order
+
+1. Tasks 1-2: contracts and store.
+2. Tasks 3-4: desktop session manifest and run ledger.
+3. Tasks 5-6: process runtime and worktree leases.
+4. Tasks 7-8: capability and memory promotion gates.
+5. Tasks 9-10: development session workflow.
+6. Tasks 11-12: conductor integration and smoke verification.
+
+## Stop Conditions
+
+Stop and ask before proceeding if implementation requires:
+
+- changing `agent_loop.py`
+- changing tool schemas
+- installing new dependencies
+- changing public API response structures
+- writing to global memory automatically
+- enabling MCP by default
+- deleting worktrees or branches
+- changing authentication, token, or credential behavior
+
+## Execution Choice
+
+This plan is ready for review. Implementation should start only after choosing one mode:
+
+1. Subagent-driven execution: one fresh worker per task, with review between tasks.
+2. Inline execution: complete tasks in this session with checkpoints.
+
+No commit should be made unless explicitly authorized.
diff --git a/docs/superpowers/specs/2026-05-22-personal-agent-harness-design.md b/docs/superpowers/specs/2026-05-22-personal-agent-harness-design.md
new file mode 100644
index 00000000..1c07db8b
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-22-personal-agent-harness-design.md
@@ -0,0 +1,402 @@
+# Personal Agent Harness Design
+
+## Goal
+
+Build GenericAgent toward a personal assistant direction by adding a thin, reusable agent harness around the existing minimalist kernel.
+
+The goal is not to turn GenericAgent into a large agent OS. The goal is to keep the GenericAgent core valuable:
+
+- minimal loop
+- skill crystallization
+- layered memory
+- token efficiency
+
+And gradually add the reliability features usually found in heavier systems:
+
+- session isolation
+- recovery
+- lightweight orchestration
+- capability governance
+- worktree isolation for development tasks
+- memory promotion gates
+
+## Source Influences
+
+This design combines three families of ideas while keeping their boundaries separate.
+
+### GenericAgent
+
+GenericAgent remains the kernel model:
+
+- a small ReAct-style agent loop
+- a small atomic toolset
+- L1-L4 layered memory
+- trajectory-to-skill crystallization
+- low context budget and high information density
+
+The harness must not force large orchestration concepts into `agent_loop.py`.
+
+### FuXi / Hermes Workflow Design
+
+FuXi contributes boundary discipline, not a coupled implementation:
+
+- a controller owns workflow state
+- a worker executes a bounded brief
+- worktrees isolate code-writing tasks
+- policies define concurrency, locks, risk, verification, and cleanup
+- skills and MCP are installed capabilities, not implicit core behavior
+- run records are the recoverable truth of a task execution
+
+FuXi's GitHub Issue / PR workflow is too heavy for the default personal assistant path. The useful part is the separation between control plane, execution worker, artifacts, and policy.
+
+### Pi-Style Minimal Runtime
+
+Pi contributes the preference for simple provider abstraction, small session runtime, and lightweight architecture. The harness should start with plain files, dataclasses, and narrow adapters before introducing databases, daemons, or distributed orchestration.
+
+## Core Principle
+
+Do not duplicate wheels, but do not let wheels decide the architecture.
+
+Adoption order:
+
+1. Reuse directly.
+2. Wrap with an adapter.
+3. Patch thinly.
+4. Fork or vendor only when a stable boundary requires it.
+5. Replace only when an existing wheel breaks core goals.
+
+Unified design comes from contracts, not from rewriting all implementations.
+
+## Layer Boundaries
+
+### Agent Kernel
+
+Owns:
+
+- LLM turn loop
+- tool call dispatch
+- per-turn summaries
+- minimal action/result cycle
+
+Does not own:
+
+- session lifecycle
+- worktree allocation
+- MCP installation
+- skill installation
+- global memory promotion policy
+- concurrency governance
+
+Existing anchor: `agent_loop.py`.
+
+### Session Runtime
+
+Owns one conversation or task thread:
+
+- session id
+- default cwd
+- active agent instance
+- session history
+- checkpoint
+- status
+- local run records
+- session-scoped memory
+
+Does not own:
+
+- global scheduling policy
+- cross-session locks
+- global memory write decisions
+- capability installation
+
+Existing anchors: `frontends/desktop_bridge.py`, `agentmain.GenericAgent`.
+
+### Capability Layer
+
+Owns installable and enableable capabilities:
+
+- built-in atomic tools
+- MCP servers
+- skills
+- local scripts
+- provider adapters
+- external workers such as Codex
+
+Capabilities are not the workflow controller. They must declare:
+
+- name
+- type
+- source
+- version or commit when applicable
+- install scope
+- permissions
+- enabled sessions
+- verification command or dry run
+
+MCP is an external tool interface. Skill is an external capability or reusable workflow fragment. Neither should become kernel logic.
+
+### Workflow Layer
+
+Owns task-level tool composition:
+
+- development task workflow
+- research task workflow
+- scheduled task workflow
+- review workflow
+- delivery workflow
+
+A workflow is a bounded recipe. It can call tools, skills, MCP, and workers, but it does not own process lifecycle or global policy.
+
+### Harness Layer
+
+Owns the control plane:
+
+- session registry
+- run ledger
+- cancellation
+- recovery
+- worktree leases
+- resource locks
+- capability enablement
+- policy gates
+- memory promotion queue
+- verification routing
+
+The harness should remain thin. It manages lifecycle and boundaries; it does not reason through the user's task.
+
+## Core Contracts
+
+### TaskBrief
+
+A bounded execution contract for one run.
+
+Fields:
+
+- `goal`
+- `scope`
+- `non_goals`
+- `inputs`
+- `acceptance`
+- `risk_level`
+- `verification`
+- `handoff_rules`
+
+For development tasks, FuXi's Agent Brief is the richer upstream pattern. The personal harness should start with a lighter TaskBrief that can later map to GitHub Issues or PRs when needed.
+
+### SessionManifest
+
+The durable identity of a session.
+
+Fields:
+
+- `session_id`
+- `title`
+- `kind`: `chat`, `development`, `research`, `scheduled`, `review`
+- `created_at`
+- `updated_at`
+- `status`
+- `root_cwd`
+- `session_temp_dir`
+- `session_memory_dir`
+- `default_runtime`
+- `capability_scope`
+- `worktree`
+- `active_run_id`
+
+Session manifests make sessions recoverable without turning the system into a daemon-first OS.
+
+### RunLedger
+
+The durable execution record for one task run.
+
+Fields:
+
+- `run_id`
+- `session_id`
+- `brief_id`
+- `runtime`
+- `cwd`
+- `status`
+- `pid`
+- `started_at`
+- `updated_at`
+- `ended_at`
+- `log_path`
+- `artifacts`
+- `verification`
+- `failure_reason`
+- `next_action`
+
+RunLedger is the source of recovery and debugging. It should be append-friendly and human-readable.
+
+### CapabilityManifest
+
+The inventory and policy for tools, MCP servers, skills, and workers.
+
+Fields:
+
+- `name`
+- `type`
+- `source`
+- `install_scope`
+- `enabled`
+- `permissions`
+- `verification`
+- `owner`
+- `notes`
+
+No silent installation. New capabilities follow:
+
+```text
+Inventory -> Gap report -> Approval -> Install -> Verify -> Enable
+```
+
+### WorktreeLease
+
+A bounded file-system lease for development tasks.
+
+Fields:
+
+- `lease_id`
+- `session_id`
+- `run_id`
+- `kind`: `session` or `execution`
+- `path`
+- `branch`
+- `base_ref`
+- `mode`: `read` or `write`
+- `lock_keys`
+- `created_at`
+- `status`
+
+Two levels are supported:
+
+- Session worktree: the main development workspace for a development session.
+- Execution worktree: a temporary sandbox for parallel or high-risk attempts.
+
+Harness owns worktree allocation. Agents and external workers request leases; they do not invent paths.
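
To illustrate "harness owns worktree allocation", here is a dry-run sketch that computes a session lease and the Git command it would run, without executing anything. The `harness/...` branch template, the path layout, the `planned` status value, and the `plan_session_worktree` name are all assumptions; the dataclass carries only a subset of the lease fields listed above.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Conservative name rule (assumed): letters, digits, underscore, hyphen.
_SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")


@dataclass
class WorktreeLease:
    lease_id: str
    session_id: str
    kind: str
    path: str
    branch: str
    base_ref: str
    mode: str = "write"
    status: str = "planned"  # hypothetical status; the spec lists no values


def plan_session_worktree(repo_root, worktree_root, session_id, slug, base_ref):
    """Return (lease, git command) for a session worktree without running Git."""
    for name in (session_id, slug):
        if not _SAFE_NAME.fullmatch(name):
            raise ValueError(f"unsafe worktree name: {name!r}")
    branch = f"harness/{session_id}/{slug}"        # assumed branch template
    path = Path(worktree_root) / session_id / slug  # assumed path template
    lease = WorktreeLease(
        lease_id=f"{session_id}-{slug}",
        session_id=session_id,
        kind="session",
        path=str(path),
        branch=branch,
        base_ref=base_ref,
    )
    # `git worktree add -b <branch> <path> <commit-ish>` creates the branch
    # and checks it out in a new working tree.
    command = ["git", "-C", str(repo_root), "worktree", "add",
               "-b", branch, str(path), base_ref]
    return lease, command
```

Because callers pass only a session id and a slug, the path and branch are always derived by the harness, which is the property the "they do not invent paths" rule asks for.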
+
+### MemoryPromotionRecord
+
+A proposal to move session-local learning into global memory or a reusable skill.
+
+Fields:
+
+- `promotion_id`
+- `session_id`
+- `run_id`
+- `source_artifact`
+- `target_layer`: `L1`, `L2`, `L3`, `L4`, or `skill`
+- `claim`
+- `evidence`
+- `risk`
+- `status`: `proposed`, `accepted`, `rejected`, `needs_review`
+
+Rule: no execution, no memory. Global memory is not written directly by ordinary runs.
+
+## Session And Memory Model
+
+Sessions are independent by default:
+
+- independent history
+- independent checkpoint
+- independent working memory
+- independent temp directory
+- independent run logs
+- optional independent worktree
+
+Sessions may contribute to shared memory only through promotion:
+
+```text
+session memory -> promotion queue -> harness review -> global L1/L2/L3/L4 or skill
+```
+
+This preserves long-term learning without memory pollution.
+
+## Concurrency Model
+
+The current code can run multiple `GenericAgent` instances in one Python process using threads. This is acceptable for short IO-heavy sessions, but not enough for safe parallel development.
+
+Default model:
+
+- Thread runtime: short chat, small research, lightweight tasks.
+- Process runtime: long tasks, development tasks, worker tasks, recoverable tasks.
+
+Development model:
+
+```text
+one development session = one session worktree
+parallel or high-risk execution = one execution worktree
+```
+
+Same session, serial development:
+
+- use the session worktree
+
+Same session, parallel attempts:
+
+- create execution worktrees
+- compare results
+- merge or cherry-pick into the session worktree
+
+Multiple development sessions:
+
+- each has its own session worktree
+
+Protected areas use lock keys:
+
+- auth
+- database-schema
+- payments
+- ci
+- deployment
+- data-deletion
+
+This is intentionally lighter than Hermes, but borrows FuXi's lock discipline.
+
+## Capability Governance
+
+Capabilities are external and pluggable.
+
+Rules:
+
+- MCP servers are external tool providers.
+- Skills are installable capability modules or workflow fragments.
+- Workflows compose capabilities.
+- Harness decides whether a capability is enabled for a session.
+- Kernel only sees mounted tools.
+
+When a wheel is insufficient:
+
+1. Identify whether the missing part is interface, lifecycle, permission, state, recovery, or verification.
+2. Prefer adapter for interface mismatch.
+3. Prefer harness wrapper for lifecycle and recovery.
+4. Prefer policy wrapper for permission gaps.
+5. Prefer schema and ledger for output control.
+6. Patch or replace only when the capability itself is unreliable.
+
+## Non-Goals
+
+This design does not aim to:
+
+- build a full Hermes clone
+- require GitHub Issue / PR for every personal task
+- make MCP core to the kernel
+- make skills globally enabled by default
+- turn every run into a new worktree
+- allow agents to silently install capabilities
+- let sessions directly overwrite global memory
+- add distributed orchestration before local recovery works
+
+## First Milestone
+
+The first milestone should establish durable boundaries without changing the agent loop:
+
+1. Add plain-file contracts for sessions, runs, capabilities, worktrees, and memory promotions.
+2. Persist session manifests and run ledgers.
+3. Integrate the desktop bridge with the manifest store without changing user-facing behavior.
+4. Add a process runtime for development sessions.
+5. Add worktree leases for development sessions.
+6. Add a memory promotion queue before global memory writes.
+
+Each step must be independently testable and revertible.
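
As a closing illustration of milestone step 6, the promotion gate could start as small as the in-memory sketch below. It uses a subset of the `MemoryPromotionRecord` fields; the `reviewer` and `rejection_reason` fields and the `PromotionQueue` class name are assumptions, and a real queue would persist records through the store instead of a dict.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryPromotionRecord:
    promotion_id: str
    session_id: str
    target_layer: str  # L1, L2, L3, L4, or skill
    claim: str
    evidence: list = field(default_factory=list)
    status: str = "proposed"
    reviewer: str = ""          # assumed field, not in the spec's list
    rejection_reason: str = ""  # assumed field, not in the spec's list


class PromotionQueue:
    """In-memory stand-in for the queue; it never touches global memory."""

    def __init__(self):
        self._records = {}

    def propose(self, record):
        record.status = "proposed"
        self._records[record.promotion_id] = record
        return record

    def accept(self, promotion_id, reviewer):
        record = self._records[promotion_id]
        if not record.evidence:
            # Records without evidence cannot be accepted.
            raise ValueError("cannot accept a promotion without evidence")
        record.status = "accepted"
        record.reviewer = reviewer
        return record

    def reject(self, promotion_id, reviewer, reason):
        record = self._records[promotion_id]
        record.status = "rejected"
        record.reviewer = reviewer
        record.rejection_reason = reason
        return record

    def list_pending(self):
        return [r for r in self._records.values() if r.status == "proposed"]
```

Accepting a record only changes its status here; actually writing to a memory layer would be a separate, reviewed step, which keeps proposing a promotion free of side effects on global memory.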