From 8e1451fc3bb4ec8fb8168a913f1e3806402c8523 Mon Sep 17 00:00:00 2001
From: Test User
Date: Mon, 29 Jun 2026 23:13:10 +0200
Subject: [PATCH] docs: define work-native continuity authority

---
 .gitignore                                    |   1 +
 .work/.gitignore                              |  18 -
 .work/goal.md                                 | 488 ------------------
 .work/milestone/AUDIT.md                      |  69 ---
 .work/milestone/MILESTONE.md                  |  58 ---
 .work/milestone/ROADMAP.md                    |  27 -
 .../phases/01-work-bootstrap/01-EXECUTE.md    |  21 -
 .../phases/01-work-bootstrap/01-PLAN.md       |  29 --
 .../phases/01-work-bootstrap/01-VERIFY.md     |  21 -
 .../phases/02-graph-core/02-EXECUTE.md        |  19 -
 .../milestone/phases/02-graph-core/02-PLAN.md |  28 -
 .../phases/02-graph-core/02-VERIFY.md         |  18 -
 .../phases/03-next-router/03-EXECUTE.md       |  22 -
 .../phases/03-next-router/03-PLAN.md          |  29 --
 .../phases/03-next-router/03-VERIFY.md        |  19 -
 .../04-EXECUTE.md                             |  21 -
 .../04-questions-decisions-handoff/04-PLAN.md |  30 --
 .../04-VERIFY.md                              |  20 -
 .../phases/05-dogfood-gap-loop/05-EXECUTE.md  |  21 -
 .../phases/05-dogfood-gap-loop/05-PLAN.md     |  28 -
 .../phases/05-dogfood-gap-loop/05-VERIFY.md   |  19 -
 .../06-harness-evals-trust/06-EXECUTE.md      |  22 -
 .../phases/06-harness-evals-trust/06-PLAN.md  |  29 --
 .../06-harness-evals-trust/06-VERIFY.md       |  21 -
 .../07-EXECUTE.md                             |  54 --
 .../07-PLAN.md                                | 311 -----------
 .../07-VERIFY.md                              |  80 ---
 .../scratchpad/2026-06-21-pr-ship-loop.md     | 169 ------
 ...-20-long-term-agent-harness-consistency.md | 230 ---------
 distilled/DESIGN.md                           |  50 ++
 distilled/EVIDENCE-INDEX.md                   |  11 +
 31 files changed, 62 insertions(+), 1921 deletions(-)
 delete mode 100644 .work/.gitignore
 delete mode 100644 .work/goal.md
 delete mode 100644 .work/milestone/AUDIT.md
 delete mode 100644 .work/milestone/MILESTONE.md
 delete mode 100644 .work/milestone/ROADMAP.md
 delete mode 100644 .work/milestone/phases/01-work-bootstrap/01-EXECUTE.md
 delete mode 100644 .work/milestone/phases/01-work-bootstrap/01-PLAN.md
 delete mode 100644 .work/milestone/phases/01-work-bootstrap/01-VERIFY.md
 delete mode 100644 .work/milestone/phases/02-graph-core/02-EXECUTE.md
 delete mode 100644 .work/milestone/phases/02-graph-core/02-PLAN.md
 delete mode 100644 .work/milestone/phases/02-graph-core/02-VERIFY.md
 delete mode 100644 .work/milestone/phases/03-next-router/03-EXECUTE.md
 delete mode 100644 .work/milestone/phases/03-next-router/03-PLAN.md
 delete mode 100644 .work/milestone/phases/03-next-router/03-VERIFY.md
 delete mode 100644 .work/milestone/phases/04-questions-decisions-handoff/04-EXECUTE.md
 delete mode 100644 .work/milestone/phases/04-questions-decisions-handoff/04-PLAN.md
 delete mode 100644 .work/milestone/phases/04-questions-decisions-handoff/04-VERIFY.md
 delete mode 100644 .work/milestone/phases/05-dogfood-gap-loop/05-EXECUTE.md
 delete mode 100644 .work/milestone/phases/05-dogfood-gap-loop/05-PLAN.md
 delete mode 100644 .work/milestone/phases/05-dogfood-gap-loop/05-VERIFY.md
 delete mode 100644 .work/milestone/phases/06-harness-evals-trust/06-EXECUTE.md
 delete mode 100644 .work/milestone/phases/06-harness-evals-trust/06-PLAN.md
 delete mode 100644 .work/milestone/phases/06-harness-evals-trust/06-VERIFY.md
 delete mode 100644 .work/milestone/phases/07-easy-global-install-auto-mode/07-EXECUTE.md
 delete mode 100644 .work/milestone/phases/07-easy-global-install-auto-mode/07-PLAN.md
 delete mode 100644 .work/milestone/phases/07-easy-global-install-auto-mode/07-VERIFY.md
 delete mode 100644 .work/milestone/scratchpad/2026-06-21-pr-ship-loop.md
 delete mode 100644 .work/research/2026-06-20-long-term-agent-harness-consistency.md

diff --git a/.gitignore b/.gitignore
index 03aaa0c..6682fff 100644
--- a/.gitignore
+++ b/.gitignore
@@ -43,6 +43,7 @@ MILESTONES.md
 .codebase-context/
 _bmad/
 .cursor/
+.work/
 
 # Local temp files (slides, previews, scratch)
 tmp/
diff --git a/.work/.gitignore b/.work/.gitignore
deleted file mode 100644
index 451e31d..0000000
--- a/.work/.gitignore
+++ /dev/null
@@ -1,18 +0,0 @@
-# Workspine local runtime state
-state.json
-graph/events.jsonl
-graph/index.json
-questions/open.json
-questions/answered.jsonl
-evidence/manifest.json
-focus/current.md
-dogfood/*.md
-handoff/current.md
-
-# Keep durable contract/research files trackable
-!goal.md
-!research/
-!research/**
-!milestone/
-!milestone/**
-!.gitignore
diff --git a/.work/goal.md b/.work/goal.md
deleted file mode 100644
index 3bb5af2..0000000
--- a/.work/goal.md
+++ /dev/null
@@ -1,488 +0,0 @@
-# Goal: `gsdd next` Continuity Milestone
-
-Date: 2026-06-20
-Status: planning
-Canonical runtime directory: `.work/`
-Legacy planning directory: `.planning/`
-
-## Objective
-
-Implement a new Workspine milestone that makes `gsdd next` the agent-facing continuity primitive.
-
-`gsdd next` answers one question:
-
-> Given the current goal, repo truth, milestone state, memory graph, open questions, evidence, and prior decisions, what is the next coherent action for the agent?
-
-The milestone must let the user frontload product decisions, leave the agent to work through the planning -> execution -> verification -> audit -> gap-fix loop, and return later to answer only the remaining questions that were genuinely blocked.
-
-## Product Thesis
-
-Workspine should become the control layer for serious agentic product work.
-
-Agents should be able to continue without rereading a month of raw transcripts, but they must not run as an unbounded autonomous loop. The product should preserve coherence by converting goals, decisions, questions, evidence, dogfood findings, and session learnings into a small local continuity graph.
-
-The user remains the product owner. `gsdd next` keeps asking for decisions only at meaningful gates.
-
-## Core User Story
-
-As a user, I can write or approve a milestone goal once, leave, and later let an agent run:
-
-```text
-gsdd next
-```
-
-The agent receives a structured packet that says whether to ask the user, research, plan, execute, verify, audit, fix gaps, dogfood, pause, or complete.
-
-## Why Now
-
-Recent Workspine dogfooding exposed the same failure class repeatedly:
-
-- agents lose continuity after compaction or session boundaries
-- high-value session lessons stay trapped in transcripts
-- `.planning` carries too many meanings and is hard to evolve
-- milestone truth can be absent while local plans still look authoritative
-- verification can pass locally while milestone-level integration still has gaps
-- browser/UI proof, global install proof, and runtime discovery need evidence-gated claims
-- research/deepening work needs model/tool routing constraints recorded up front
-
-The next milestone should fix the operating system before adding more feature-specific proof machinery.
-
-## Current Repo Reality
-
-This checkout does not currently have canonical `.planning/SPEC.md`, `.planning/ROADMAP.md`, or `.planning/MILESTONES.md`.
-
-That matters. Existing `gsdd-new-milestone` requires those files and must fail closed when they are missing. Prior lessons already record this as a false-closure risk.
-
-Therefore this milestone must include a bootstrap path instead of pretending the old lifecycle truth exists.
-
-## Design Direction
-
-Use `.work/` as the new runtime state root.
-
-`.work` is not only context. It contains active work state: goals, graph events, decisions, questions, evidence manifests, focus packets, dogfood findings, and handoff material.
-
-Reserve `.context` for exported semantic context bundles, likely produced by `codebase-context` or another context provider. Workspine may consume `.context`, but `.work` owns continuity.
-
-## Non-Goals
-
-- Do not build an unbounded autonomous loop.
-- Do not ingest raw Codex, Claude, Cursor, or other vendor transcripts by default.
-- Do not commit private session memory, screenshots, traces, DOM dumps, or secrets.
-- Do not turn Workspine into `codebase-context`.
-- Do not replace existing `gsdd-plan`, `gsdd-execute`, `gsdd-verify`, or milestone audit workflows.
-- Do not make Playwright MCP, Chrome DevTools MCP, or any single browser provider the default architecture.
-- Do not require hosted memory infrastructure.
-- Do not introduce SQLite, graph databases, vector databases, or MCP memory servers before the file-based graph shape is proven.
-- Do not auto-spawn implementation or research subagents unless the runtime can enforce the required model/tool profile.
-
-## Architecture Boundaries
-
-Workspine owns:
-
-- active goal and milestone continuity
-- decisions and open questions
-- workflow routing
-- evidence contracts
-- verification/audit/gap-fix state
-- dogfood capture
-- privacy and publication posture for work artifacts
-
-`codebase-context` owns:
-
-- codebase semantic graph
-- symbols, files, dependencies, and architecture facts
-- repo-specific context retrieval
-- codebase memory with freshness/provenance
-
-`ideaspine` owns:
-
-- raw idea staging
-- cross-project incubation
-- challenge-coin/research notes before they become Workspine product commitments
-
-## Proposed `.work/` Shape
-
-```text
-.work/
-  goal.md
-  state.json
-  graph/
-    events.jsonl
-    index.json
-  decisions/
-    *.md
-  questions/
-    open.json
-    answered.jsonl
-  evidence/
-    manifest.json
-  focus/
-    current.md
-  dogfood/
-    *.md
-  handoff/
-    current.md
-```
-
-The graph starts as append-only JSONL plus a derived index. The append-only log is source of truth; the index is rebuildable.
-
-## Graph Model
-
-Minimum node types:
-
-- `goal`
-- `milestone`
-- `phase`
-- `task`
-- `decision`
-- `question`
-- `assumption`
-- `evidence`
-- `artifact`
-- `dogfood_finding`
-- `session_summary`
-- `repo`
-- `external_context`
-
-Minimum edge types:
-
-- `belongs_to`
-- `blocks`
-- `answers`
-- `supports`
-- `contradicts`
-- `supersedes`
-- `derived_from`
-- `requires_decision`
-- `verified_by`
-- `deferred_to`
-- `references`
-
-Every event must include:
-
-```json
-{
-  "id": "evt_...",
-  "created_at": "ISO-8601",
-  "actor": "user|agent|tool",
-  "type": "node_created|node_updated|edge_created|question_answered|decision_recorded|evidence_recorded",
-  "privacy": "public|repo|local_only|secret_risk",
-  "source": "chat|file|command|web|ideaspine|codebase-context|manual",
-  "payload": {}
-}
-```
-
-## `gsdd next` Contract
-
-`gsdd next` is read-first and deterministic where possible.
-
-Inputs:
-
-- `.work/goal.md`
-- `.work/state.json`
-- `.work/graph/events.jsonl`
-- `.work/graph/index.json` if present
-- `.work/questions/open.json`
-- `.work/evidence/manifest.json`
-- `.work/handoff/current.md`
-- legacy `.planning/` artifacts when present
-- repo truth from `control-map`
-- optional ideaspine pointers
-- optional codebase-context provider output
-
-Outputs:
-
-```json
-{
-  "state": "ask_user|research|plan|execute|verify|audit|fix_gaps|dogfood|pause|blocked|complete",
-  "reason": "short explanation",
-  "confidence": "high|medium|low",
-  "next_command": "command or workflow name",
-  "requires_user": true,
-  "questions": [],
-  "constraints": [],
-  "evidence_required": [],
-  "artifacts_to_read": [],
-  "artifacts_to_write": [],
-  "privacy_notes": []
-}
-```
-
-Human-readable output should be concise and action-oriented. JSON output should be available through `--json`.
-
-## State Machine
-
-Allowed states:
-
-- `ask_user`: unresolved product/architecture question blocks coherent work
-- `research`: current facts are stale, external, vendor-specific, or high risk
-- `plan`: enough is known to create or revise a plan
-- `execute`: a reviewed plan exists and has executable tasks
-- `verify`: execution artifacts exist and need phase verification
-- `audit`: all phases for a milestone are verified and milestone-level integration needs checking
-- `fix_gaps`: audit or verification found unsatisfied requirements
-- `dogfood`: work passed and should generate a short Workspine improvement finding
-- `pause`: save handoff because work cannot safely continue in this run
-- `blocked`: repeated blocker needs external input or state change
-- `complete`: milestone closure criteria are satisfied
-
-`gsdd next` must not silently jump across human gates.
-
-## Human Decision Gates
-
-The user must explicitly approve:
-
-- milestone objective changes
-- architecture boundary changes
-- graph storage migration beyond JSONL/index files
-- adding hosted services, vector databases, SQLite, or MCP memory servers
-- committing local-only memory or session-derived artifacts
-- running live vendor probes that require auth/quota
-- launching or attaching to browser sessions that may expose private UI state
-- widening a phase beyond its success criteria
-- accepting audit gaps as deferred work
-- declaring milestone complete
-
-The agent may proceed without asking for:
-
-- reading repo files
-- producing focus packets
-- drafting research summaries
-- creating local `.work` state files
-- adding tests for already-approved behavior
-- fixing straightforward implementation bugs inside an approved plan
-- rerunning verification commands
-- appending dogfood findings after a pass
-
-## Upfront Product Questions
-
-These are the questions the user should answer before implementation if possible. Defaults are included so the agent can proceed if the user explicitly approves the defaults.
-
-1. Should `gsdd next` be read-only in v1?
-   - Default: yes. It routes and emits packets; it does not mutate except for optional local state refresh.
-
-2. Should `.work/` become canonical immediately?
-   - Default: yes for new continuity artifacts; `.planning/` remains legacy-compatible and readable.
-
-3. Should the first graph store be JSONL plus rebuildable index?
-   - Default: yes.
-
-4. Should `gsdd next` call existing workflows or only recommend them?
-   - Default: recommend only in v1; later `--run` may execute bounded workflows.
-
-5. Should session transcript extraction exist in this milestone?
-   - Default: no raw transcript extraction in phase 1; only manually supplied or summarized session notes.
-
-6. Should ideaspine integration be path-based first?
-   - Default: yes. Read selected files/pointers; do not ingest all of ideaspine.
-
-7. Should codebase-context integration be built now?
-   - Default: adapter interface only; live provider integration after `gsdd next` core is stable.
-
-8. Should browser proof be included in this milestone?
-   - Default: only as an evidence category and future provider constraint, not live browser implementation.
-
-9. Should this milestone bootstrap missing lifecycle truth?
-   - Default: yes. The milestone must either create the minimal Workspine-native lifecycle state or explicitly bridge old `.planning` workflows to `.work`.
-
-10. Should the agent be allowed to run plan -> execute -> verify -> audit -> fix-gaps repeatedly?
-    - Default: yes after the milestone goal and first plan are approved, but it must stop at the human gates listed above.
-
-## Research Requirements
-
-Research must be current as of the day the milestone is planned or implemented.
-
-Primary grounding:
-
-- `.work/research/2026-06-20-long-term-agent-harness-consistency.md`
-
-Research areas:
-
-- current Codex non-interactive execution, JSONL event streams, hooks, subagents, and memory support
-- current Claude Code hooks, subagents, memory, and lifecycle events
-- current MCP trust/security boundaries for tool and resource outputs
-- current local-first graph/event-log patterns appropriate for CLI tools
-- current privacy guidance for local agent memory and transcript-derived artifacts
-- current Workspine codebase conventions and existing lifecycle/helper patterns
-- current harness-engineering patterns for durable execution, human interrupts, trace/eval loops, structured repair loops, and agent-computer interfaces
-- current agent benchmark failure modes for long-horizon reasoning, planning, instruction following, tool use, and environment interaction
-- current agent runtime safety research for tool-call interception, lifecycle security, trust boundaries, memory poisoning, retrieval poisoning, and authorization confusion
-
-Model routing constraint:
-
-- Research/deepening subagents must use `gpt-5.4-high` where model selection is available.
-- Do not use `gpt-5.5` for those research/deepening roles.
-- If the runtime cannot enforce model selection, do not spawn those subagents. Emit research briefs instead and mark reduced assurance.
-
-## Milestone Requirements
-
-Draft requirement IDs for the future milestone setup:
-
-- [ ] **[NEXT-01]**: User can run `gsdd next` and receive a structured next-action packet derived from `.work`, repo truth, and legacy `.planning` when present. [Done-When: `gsdd next --json` returns one valid state, reason, confidence, next command, constraints, and evidence requirements against fixture repos.]
-
-- [ ] **[WORK-01]**: User can initialize or refresh `.work/` without destroying or rewriting `.planning/`. [Done-When: `.work/goal.md`, `state.json`, graph files, questions, evidence manifest, and handoff paths are created or validated idempotently.]
-
-- [ ] **[GRAPH-01]**: User has a local append-only continuity graph for goals, decisions, questions, evidence, dogfood findings, and external context pointers. [Done-When: graph events validate against schema, index rebuild is deterministic, and local-only/privacy fields are enforced.]
-
-- [ ] **[QUESTION-01]**: User can frontload decisions and return later to answer only unresolved questions. [Done-When: open questions are stored, answered questions append to history, and `gsdd next` routes to `ask_user` only for unresolved blocking questions.]
-
-- [ ] **[DECISION-01]**: User can record architecture/product decisions that later runs must honor. [Done-When: decisions are persisted, linked into the graph, surfaced in `gsdd next`, and supersession is explicit.]
-
-- [ ] **[FLOW-01]**: Agent can follow the plan -> execute -> verify -> audit -> fix-gaps loop using `gsdd next` as the routing layer. [Done-When: fixture states route correctly between plan, execute, verify, audit, fix_gaps, dogfood, pause, blocked, and complete.]
-
-- [ ] **[DOGFOOD-01]**: User can capture a short Workspine dogfood finding after a verified pass. [Done-When: `gsdd next` routes to `dogfood` after pass conditions and generated findings are bounded, local-first, and backlog-linkable.]
-
-- [ ] **[PRIVACY-01]**: User is protected from accidental publication of private memory or raw session evidence. [Done-When: local-only artifacts are marked, raw transcript ingestion is disabled by default, and publication checks fail closed.]
-
-- [ ] **[COMPAT-01]**: Existing `.planning` workflows continue to work while `.work` becomes the new continuity surface. [Done-When: tests prove `.planning` artifacts are read as legacy inputs and new continuity artifacts write under `.work`.]
-
-- [ ] **[EVAL-01]**: User can evaluate `gsdd next` routing against durable fixture states instead of trusting a single manual run. [Done-When: fixture evals cover every allowed state, record expected packet fields, and fail on unsupported state transitions.]
-
-- [ ] **[TRACE-01]**: User can reconstruct why `gsdd next` chose a state from durable trace-like events. [Done-When: every next-action packet records inputs considered, skipped inputs, decision reason, confidence, and graph/evidence event references.]
-
-- [ ] **[INTERRUPT-01]**: User decision gates behave like durable interrupts. [Done-When: blocking questions persist with IDs, payloads, default recommendations, and resume semantics; answered questions update the graph and route forward.]
-
-- [ ] **[TRUST-01]**: User is protected from unsafe tool/action escalation during long-running agent work. [Done-When: `gsdd next` identifies privileged boundary crossings and routes to `ask_user` or `blocked` before destructive, privacy-sensitive, live-vendor, browser, or publication actions.]
-
-## Candidate Phase Sequence
-
-### Phase 1: `.work` Bootstrap and Goal Contract
-
-Goal: Establish `.work` as the canonical continuity root without breaking `.planning`.
-
-Success criteria:
-
-1. `.work` structure can be initialized and validated idempotently.
-2. Root `goal.md` points to `.work/goal.md`.
-3. Legacy `.planning` absence is handled honestly.
-4. Tests cover bootstrap, validation, and privacy defaults.
-
-### Phase 2: Continuity Graph Core
-
-Goal: Add append-only graph events, deterministic index rebuild, and schema validation.
-
-Success criteria:
-
-1. Graph event schema supports required node and edge types.
-2. Index rebuild is deterministic and ignores invalid local-only publication.
-3. Decisions, questions, evidence, and dogfood nodes can be represented.
-4. Tests cover malformed events, supersession, blockers, and privacy fields.
-
-### Phase 3: `gsdd next` Read-Only Router
-
-Goal: Implement `gsdd next` as a read-only state router.
-
-Success criteria:
-
-1. `gsdd next --json` emits the structured packet contract.
-2. Router reads `.work`, `control-map`, and legacy `.planning` when available.
-3. Fixture states route to all allowed states.
-4. Human gates prevent silent execution across approval boundaries.
-
-### Phase 4: Questions, Decisions, and Handoff
-
-Goal: Make user decision frontloading and return-later continuity work in practice.
-
-Success criteria:
-
-1. Open questions are persisted and routed.
-2. Answered questions append to history and update graph links.
-3. Decisions are persisted with supersession support.
-4. Handoff material is generated from graph state, not chat memory.
-
-### Phase 5: Dogfood and Gap-Fix Loop
-
-Goal: Close the milestone loop with verification/audit/gap-fix/dogfood routing.
-
-Success criteria:
-
-1. Passed verification routes to dogfood or audit as appropriate.
-2. Audit gaps route to fix_gaps.
-3. Dogfood finding is bounded and links to backlog or ideaspine pointer.
-4. Milestone complete is blocked until audit and human gate conditions are satisfied.
-
-### Phase 6: Harness Evals and Trust Boundaries
-
-Goal: Add the minimum evaluation and trust-boundary coverage needed for long-term consistency.
-
-Success criteria:
-
-1. Fixture evals cover all `gsdd next` states and important failure modes.
-2. Structured review/audit findings are consumable by `fix_gaps`.
-3. Next-action packets include trace references and skipped-input notes.
-4. Human gates cover action risk, reversibility, privacy, and authority boundaries.
-
-## Verification Strategy
-
-Required test categories:
-
-- CLI contract tests for `gsdd next --json`
-- fixture repo tests for every state transition
-- `.work` bootstrap/idempotency tests
-- graph schema and index rebuild tests
-- privacy/local-only publication guard tests
-- legacy `.planning` compatibility tests
-- question/decision supersession tests
-- dogfood routing tests
-- no-raw-transcript default tests
-- fixture evals for all `gsdd next` states
-- structured review-to-repair handoff tests
-- durable interrupt/resume tests
-- trust-boundary routing tests for destructive, live-vendor, browser, MCP, and publication actions
-
-Manual verification:
-
-- Run `gsdd next` in this Workspine repo with missing `.planning/SPEC.md` and confirm it does not falsely claim a normal milestone lifecycle.
-- Run `gsdd next` in a fixture with valid `.planning` and confirm legacy compatibility.
-- Run a dogfood pass after verification and confirm the output is useful but bounded.
-
-## Audit Strategy
-
-Milestone audit must verify:
-
-- every requirement maps to at least one phase and one verification artifact
-- `gsdd next` does not overclaim autonomy
-- `.work` and `.planning` boundaries are coherent
-- privacy gates fail closed
-- graph state can reconstruct the current decision posture
-- gap-fix routing is tested with failing fixtures
-- dogfood output produces actionable Workspine improvement without bloating state
-
-If audit finds gaps, the agent should plan gap-closure phases and repeat execute -> verify -> audit.
-
-## Implementation Loop Contract
-
-After the user approves this goal and the first detailed plan, the agent should work autonomously through:
-
-```text
-plan -> execute -> verify -> audit -> fix gaps -> verify -> audit -> dogfood
-```
-
-The agent must stop and ask only when:
-
-- a listed human decision gate is reached
-- a blocker repeats and cannot be resolved from repo truth
-- implementation would widen architecture beyond this goal
-- privacy or publication status is ambiguous
-- current research contradicts the plan
-
-The agent should not stop merely because the work is multi-step.
-
-## Immediate Next Step
-
-Convert this goal into proper milestone lifecycle truth.
-
-Because canonical `.planning/SPEC.md`, `.planning/ROADMAP.md`, and `.planning/MILESTONES.md` are currently missing, the next planning action must choose one of:
-
-1. Bootstrap a new Workspine-native lifecycle using `.work` as canonical state and `.planning` as legacy input.
-2. Recreate the existing `.planning` lifecycle prerequisites first, then add `.work`.
-
-Default recommendation: choose option 1. This milestone is explicitly about moving beyond `.planning`, so `.work` should become canonical now while legacy workflows remain readable.
-
-## Open Issues
-
-- Exact CLI spelling for initialization: `gsdd work init`, `gsdd next --init`, or implicit bootstrap.
-- Whether root `goal.md` remains a pointer forever or is removed after `.work` is standard.
-- Whether `.work` should be committed by default or split into committed templates plus gitignored local state.
-- Whether dogfood findings export to `../ideaspine` automatically or only by explicit command.
-- Whether a minimal `.context` export should be generated from `.work` for external agents later.
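The deleted `goal.md` makes `graph/events.jsonl` the append-only source of truth and treats `graph/index.json` as a projection that can always be rebuilt from it. The following Python sketch illustrates that contract under stated assumptions. The allowed field values are copied from the event schema in `goal.md`. The payload keys (`node_id`, `fields`, `from`, `kind`, `to`) and the helper names `validate_event` and `rebuild_index` are hypothetical; they are not the shipped `bin/lib` code.

```python
import json

# Allowed values copied from the event schema in the deleted goal.md.
ACTORS = {"user", "agent", "tool"}
EVENT_TYPES = {
    "node_created", "node_updated", "edge_created",
    "question_answered", "decision_recorded", "evidence_recorded",
}
PRIVACY = {"public", "repo", "local_only", "secret_risk"}
SOURCES = {"chat", "file", "command", "web", "ideaspine", "codebase-context", "manual"}
REQUIRED = ("id", "created_at", "actor", "type", "privacy", "source", "payload")


def validate_event(event):
    """Return a list of problems; an empty list means the event is valid."""
    if not isinstance(event, dict):
        return ["event is not a JSON object"]
    missing = [f"missing field: {name}" for name in REQUIRED if name not in event]
    if missing:
        return missing
    problems = []
    if not str(event["id"]).startswith("evt_"):
        problems.append("id must start with evt_")
    for name, allowed in (("actor", ACTORS), ("type", EVENT_TYPES),
                          ("privacy", PRIVACY), ("source", SOURCES)):
        if event[name] not in allowed:
            problems.append(f"unknown {name}: {event[name]}")
    if not isinstance(event["payload"], dict):
        problems.append("payload must be an object")
    return problems


def rebuild_index(jsonl_text):
    """Fold events.jsonl text into a deterministic index; fail closed on bad lines."""
    nodes, edges = {}, []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are the only thing tolerated
        event = json.loads(line)  # malformed JSON raises json.JSONDecodeError
        problems = validate_event(event)
        if problems:
            raise ValueError(f"line {lineno}: " + "; ".join(problems))
        payload = event["payload"]
        # Hypothetical payload shapes: {"node_id", "fields"} and {"from", "kind", "to"}.
        if event["type"] in ("node_created", "node_updated"):
            nodes.setdefault(payload["node_id"], {}).update(payload.get("fields", {}))
        elif event["type"] == "edge_created":
            edges.append([payload["from"], payload["kind"], payload["to"]])
    # Sorting keys and edges makes repeated rebuilds byte-for-byte identical.
    return json.dumps({"nodes": nodes, "edges": sorted(edges)}, sort_keys=True)
```

Because the fold sorts keys and edges, rebuilding twice from the same log yields identical output. That property means the index can be deleted and regenerated at any time. An invalid line raises an error instead of being skipped, which matches the fail-closed posture recorded in the milestone audit. This sketch validates the remaining event types but does not project them; a fuller version would fold them in the same way.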
diff --git a/.work/milestone/AUDIT.md b/.work/milestone/AUDIT.md
deleted file mode 100644
index e8f8a4c..0000000
--- a/.work/milestone/AUDIT.md
+++ /dev/null
@@ -1,69 +0,0 @@
-# Milestone Audit: `gsdd next` Continuity
-
-Date: 2026-06-20
-Status: passed after PR hardening and installability dogfood loops
-
-## Audit Basis
-
-- `.work/goal.md`
-- `.work/research/2026-06-20-long-term-agent-harness-consistency.md`
-- `bin/lib/work-context.mjs`
-- `bin/lib/next.mjs`
-- `tests/gsdd.next.test.cjs`
-- Real repo `gsdd next` packets
-- Full `rtk npm test`
-
-## Installability Follow-Up
-
-Phase 7 added `install --global --auto` as a detected-target global install path through the existing manifest-safe writer. It is covered by execute and verify packets plus init, install pressure, guard, full-suite, help, and package dry-run evidence.
-
-The Phase 7 hardening also fixed the downstream lifecycle authority clash found during dogfood: matching `.work/milestone` phases and checkpoints now pass through `work_milestone` preflight authority instead of being blocked by unrelated `.planning` phase numbers or planning drift.
-
-## Findings
-
-Passed:
-
-- `.work` is created idempotently and keeps mutable runtime state local by default.
-- `gsdd next` fails honestly when legacy `.planning` milestone files are absent.
-- Malformed graph and malformed questions block routing instead of being silently ignored.
-- Trust gates are checked before state-derived execution routing.
-- Human output now exposes evidence requirements, skipped inputs, and trace refs.
-- Tests cover all allowed routing states and core persistence commands.
-
-Fixed during second harness pass:
-
-- Missing `open.json` is now surfaced as skipped input rather than silently considered.
-- Malformed `open.json` blocks routing.
-- Invalid decision privacy no longer writes a partial decision artifact.
-- Human packet output is less opaque for agents.
-
-Fixed during third challenge loop:
-
-- `gsdd next` now treats `.work/milestone` as Workspine-native lifecycle truth.
-- Source bootstrap `.work/.gitignore` now whitelists future `.work/milestone/**` files, not just the hand-edited local file.
-- Real repo `gsdd next --json` no longer routes backward to "draft .work milestone plan" after the milestone roadmap, audit, and phase packets exist.
-- Completion approval packets now tell the agent to review `.work/milestone/AUDIT.md`, `.work/milestone/ROADMAP.md`, and `.work/evidence/manifest.json`.
-- Added a regression test proving a passed `.work/milestone` audit prevents backward routing to planning.
-
-Fixed during PR hardening loops:
-
-- `gsdd next` packets now include typed `next_action` values instead of relying only on overloaded `next_command` strings.
-- Captured stdout defaults to JSON through `--format auto`; `--format human` gives the compact supervisor card.
-- Dirty worktree risk is reported as `repo_warnings`, not privacy notes.
-- Question, decision, and dogfood mutation replays are retry-safe when content matches and fail closed when content differs unless `--replace` is explicit.
-- Question answers and decision supersession are represented as explicit graph edges.
-- `.work` projection writes use atomic replacement and graph/audit JSONL appends are fsynced before returning.
-- Added durable PR scratchpad at `.work/milestone/scratchpad/2026-06-21-pr-ship-loop.md`.
-
-Residual hardening candidates:
-
-- Add full WAL-style operation commits with `op_id`, `prev_event_id`, `content_sha256`, and `op_committed`.
-- Add deterministic golden packet fixtures with normalized volatile fields.
-- Add consistency checks for orphaned side artifacts lacking committed graph events.
-- Add package dry-run after every future durable file addition before release.
-
-## Closure
-
-The implemented local v1 is now more coherent with the current milestone because the router consumes the `.work/milestone` lifecycle surface it asked the agent to create, emits a typed agent action contract, and keeps the human supervisor surface compact. The remaining items are deeper durability/eval hardening follow-ups, not blockers for the scoped v1.
-
-The expanded milestone scope is closed for the local `.work` surface. `.planning` lifecycle drift is still reported, but matching `.work/milestone` packets are no longer governed by `.planning` phase ownership.
diff --git a/.work/milestone/MILESTONE.md b/.work/milestone/MILESTONE.md
deleted file mode 100644
index 724cbc7..0000000
--- a/.work/milestone/MILESTONE.md
+++ /dev/null
@@ -1,58 +0,0 @@
-# Milestone: `gsdd next` Continuity
-
-Date: 2026-06-20
-Status: implementation and verification complete for Phases 1-7
-Canonical goal: `.work/goal.md`
-Research grounding: `.work/research/2026-06-20-long-term-agent-harness-consistency.md`
-
-## Objective
-
-Make `gsdd next` the agent-facing continuity primitive for Workspine.
-
-It must inspect `.work`, repo truth, legacy `.planning` where present, decisions, questions, evidence, and graph state, then emit the next coherent agent action without pretending to run an unbounded autonomous loop.
-
-## Scope
-
-In scope:
-
-- `.work` bootstrap and privacy defaults.
-- Append-only local graph event log plus rebuildable index.
-- `gsdd next` read-only routing and JSON/human packets.
-- Question, decision, dogfood, and graph subcommands.
-- Fixture-style routing coverage for all allowed states.
-- Trust gate routing before execution-like states.
-- Durable phase packets under `.work/milestone/phases`.
-- Easy global install auto mode that routes through the existing `install --global` path.
-
-Out of scope:
-
-- Raw transcript ingestion.
-- Hosted memory.
-- SQLite, vector databases, graph databases, or MCP memory servers.
-- Browser-provider implementation.
-- Auto-running workflow commands from `gsdd next`.
-- Remote installer manifests, handoff-file parsers, shell script execution, or unmanaged user-home writes.
-
-## Done-When
-
-- `gsdd next --init` creates `.work` idempotently.
-- `gsdd next --json` returns one valid packet with traceable inputs, skipped inputs, constraints, and evidence requirements.
-- Graph rebuild validates malformed events and fails closed.
-- Blocking questions and decisions persist and influence routing.
-- Evidence trust gates outrank optimistic state routing.
-- Tests cover the route surface and representative failure modes.
-- This milestone has per-phase plan, execute, and verify packets under `.work/milestone/phases`.
-- `gsdd install --global --auto` lets a user install detected global agent surfaces in one command, while `--tools ` remains the explicit override for scoped installs.
-- Lifecycle preflight treats matching `.work/milestone` phases as work-native authority, preventing branch-local phase packets from colliding with unrelated `.planning` milestones downstream.
-
-## Current Evidence
-
-- `rtk node --test --test-reporter=spec tests/gsdd.next.test.cjs`: passed.
-- `rtk node --test --test-reporter=spec tests/gsdd.guards.test.cjs`: passed.
-- `rtk npm test`: passed.
-- `rtk npm pack --dry-run --json`: passed before the final durable phase packet addition.
-- Phase 7 execute and verify packets prove the detected global install auto path, scoped `--tools` override, no-detection failure, work-native lifecycle preflight, guard tests, full suite, generated-helper behavior, and package dry-run.
-
-## Next Human Steering
-
-Run a final package dry-run after any additional durable milestone packet changes before treating this exact tree as release-ready.
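The deleted MILESTONE.md sets three routing requirements: malformed graph input must fail closed, blocking questions must influence routing, and evidence trust gates must outrank optimistic state routing. One way to read these as a fixed precedence order is sketched below in Python. The `Snapshot` fields and the `route` function are hypothetical simplifications. Two choices are assumptions, not part of the source: pending gates route to `ask_user` rather than `blocked`, and trust gates are checked before blocking questions.

```python
from dataclasses import dataclass, field

# The eleven allowed states from the state machine in goal.md.
ALLOWED_STATES = {
    "ask_user", "research", "plan", "execute", "verify", "audit",
    "fix_gaps", "dogfood", "pause", "blocked", "complete",
}


@dataclass
class Snapshot:
    """Hypothetical summary of what the router already read from .work/."""
    graph_ok: bool = True        # events.jsonl rebuilt without errors
    questions_ok: bool = True    # questions/open.json parsed
    pending_gates: list = field(default_factory=list)       # privileged actions awaiting approval
    blocking_questions: list = field(default_factory=list)  # unresolved blocking question IDs
    lifecycle_state: str = "plan"  # what milestone state alone would suggest


def route(snap):
    """Return (state, reason) using a fixed precedence order."""
    # 1. Unreadable inputs block routing instead of being silently ignored.
    if not snap.graph_ok:
        return "blocked", "graph events failed validation"
    if not snap.questions_ok:
        return "blocked", "questions/open.json is malformed"
    # 2. Trust gates outrank optimistic, state-derived routing.
    if snap.pending_gates:
        return "ask_user", "approval required: " + ", ".join(snap.pending_gates)
    # 3. Only unresolved blocking questions interrupt the loop.
    if snap.blocking_questions:
        return "ask_user", "blocking questions: " + ", ".join(snap.blocking_questions)
    # 4. Unknown lifecycle states fail closed rather than guessing a route.
    if snap.lifecycle_state not in ALLOWED_STATES:
        return "blocked", f"unsupported state: {snap.lifecycle_state}"
    return snap.lifecycle_state, "no gates or blockers"
```

For example, `route(Snapshot(pending_gates=["live_vendor_probe"], lifecycle_state="execute"))` returns `ask_user` even though lifecycle state alone would suggest `execute`. This is the "trust gates outrank optimistic routing" property that the fixture evals were meant to pin down.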
diff --git a/.work/milestone/ROADMAP.md b/.work/milestone/ROADMAP.md deleted file mode 100644 index b4c14b2..0000000 --- a/.work/milestone/ROADMAP.md +++ /dev/null @@ -1,27 +0,0 @@ -# `gsdd next` Continuity Roadmap - -Status: implemented locally, including the installability dogfood follow-up. - -## Phases - -- [x] **Phase 1: `.work` Bootstrap and Goal Contract** - establish `.work` as canonical continuity state while keeping `.planning` readable. -- [x] **Phase 2: Continuity Graph Core** - add append-only JSONL graph events and deterministic index rebuild. -- [x] **Phase 3: `gsdd next` Router** - emit next-action packets across all allowed states. -- [x] **Phase 4: Questions, Decisions, and Handoff** - persist durable interrupts, decisions, and return-later context. -- [x] **Phase 5: Dogfood and Gap-Fix Loop** - route verified/audited work through dogfood and fix-gap states. -- [x] **Phase 6: Harness Evals and Trust Boundaries** - cover fixture states, traceability, skipped inputs, and trust gates. -- [x] **Phase 7: Easy Global Install Auto Mode** - make `install --global --auto` the one-command detected global install path routed through the existing manifest-safe global installer, with `--tools` as the explicit override, and harden lifecycle preflight so `.work/milestone` packets do not collide with unrelated `.planning` phases. - -## Requirement Coverage - -- Phase 1: WORK-01, COMPAT-01, PRIVACY-01 -- Phase 2: GRAPH-01, TRACE-01, DECISION-01 -- Phase 3: NEXT-01, FLOW-01, EVAL-01 -- Phase 4: QUESTION-01, INTERRUPT-01, DECISION-01 -- Phase 5: DOGFOOD-01, FLOW-01 -- Phase 6: TRUST-01, EVAL-01, TRACE-01 -- Phase 7: INSTALL-AUTO-01, INSTALL-SAFETY-01, WORK-LIFECYCLE-01 - -## Closure Limit - -This roadmap proves the local file-backed `gsdd next` v1 plus the planned one-command global install auto mode. It does not claim transcript memory, hosted memory, provider-specific browser proof, automatic execution, URL-based installers, or unmanaged user-home writes. 
diff --git a/.work/milestone/phases/01-work-bootstrap/01-EXECUTE.md b/.work/milestone/phases/01-work-bootstrap/01-EXECUTE.md deleted file mode 100644 index 6b1b662..0000000 --- a/.work/milestone/phases/01-work-bootstrap/01-EXECUTE.md +++ /dev/null @@ -1,21 +0,0 @@ -# Phase 1 Execute Packet - -## Implemented - -- Added `.work` path discovery and bootstrap support in `bin/lib/work-context.mjs`. -- Added `gsdd next --init` in `bin/lib/next.mjs`. -- Added root `goal.md` pointer and canonical `.work/goal.md`. -- Added `.work/.gitignore` to keep mutable runtime state local-only. - -## Files - -- `bin/lib/work-context.mjs` -- `bin/lib/next.mjs` -- `.work/.gitignore` -- `.work/goal.md` -- `goal.md` - -## Verify - -- `rtk node --test --test-reporter=spec tests/gsdd.next.test.cjs` -- `rtk node bin/gsdd.mjs next --init --json` diff --git a/.work/milestone/phases/01-work-bootstrap/01-PLAN.md b/.work/milestone/phases/01-work-bootstrap/01-PLAN.md deleted file mode 100644 index 5d6d4e3..0000000 --- a/.work/milestone/phases/01-work-bootstrap/01-PLAN.md +++ /dev/null @@ -1,29 +0,0 @@ -# Phase 1 Plan: `.work` Bootstrap and Goal Contract - -## Goal - -Establish `.work` as the canonical continuity root without breaking legacy `.planning` reads. - -## Requirements - -- WORK-01 -- COMPAT-01 -- PRIVACY-01 - -## Tasks - -1. Add path helpers and idempotent bootstrap for `.work`. -2. Create durable goal/research tracking while keeping mutable runtime files local-only. -3. Route missing `.planning` files honestly instead of inferring old lifecycle state. -4. Add tests for bootstrap, idempotency, privacy defaults, and missing legacy lifecycle truth. - -## Evidence - -- `gsdd next --init --json` -- `.work/.gitignore` -- `tests/gsdd.next.test.cjs` - -## Boundaries - -- Do not recreate `.planning` as the canonical surface. -- Do not ingest raw transcripts. 
diff --git a/.work/milestone/phases/01-work-bootstrap/01-VERIFY.md b/.work/milestone/phases/01-work-bootstrap/01-VERIFY.md deleted file mode 100644 index dcaa4ab..0000000 --- a/.work/milestone/phases/01-work-bootstrap/01-VERIFY.md +++ /dev/null @@ -1,21 +0,0 @@ -# Phase 1 Verify Packet - -Status: passed - -## Checks - -- Bootstrap creates required `.work` directories and files. -- Re-running bootstrap is idempotent. -- Mutable runtime files are ignored by default. -- Durable contract/research/milestone files remain trackable. -- Missing `.planning/SPEC.md`, `.planning/ROADMAP.md`, and `.planning/MILESTONES.md` route to planning rather than false lifecycle progress. - -## Evidence - -- Focused `gsdd.next` tests passed. -- Real repo `gsdd next --init --json` passed. -- Real repo `gsdd next --json` surfaced skipped `.planning` inputs. - -## Remaining Risk - -None blocking for v1. diff --git a/.work/milestone/phases/02-graph-core/02-EXECUTE.md b/.work/milestone/phases/02-graph-core/02-EXECUTE.md deleted file mode 100644 index a39609d..0000000 --- a/.work/milestone/phases/02-graph-core/02-EXECUTE.md +++ /dev/null @@ -1,19 +0,0 @@ -# Phase 2 Execute Packet - -## Implemented - -- Added graph schema constants and validation in `bin/lib/work-context.mjs`. -- Added JSONL append and deterministic index rebuild. -- Added graph rebuild CLI path in `bin/lib/next.mjs`. -- Added malformed graph event routing. 
- -## Files - -- `bin/lib/work-context.mjs` -- `bin/lib/next.mjs` -- `tests/gsdd.next.test.cjs` - -## Verify - -- `rtk node bin/gsdd.mjs next graph rebuild --json` -- `rtk node --test --test-reporter=spec tests/gsdd.next.test.cjs` diff --git a/.work/milestone/phases/02-graph-core/02-PLAN.md b/.work/milestone/phases/02-graph-core/02-PLAN.md deleted file mode 100644 index 3c7b696..0000000 --- a/.work/milestone/phases/02-graph-core/02-PLAN.md +++ /dev/null @@ -1,28 +0,0 @@ -# Phase 2 Plan: Continuity Graph Core - -## Goal - -Add an append-only local graph event log and rebuildable index for continuity facts. - -## Requirements - -- GRAPH-01 -- TRACE-01 -- DECISION-01 - -## Tasks - -1. Define allowed event, node, edge, privacy, and source types. -2. Implement event append validation and deterministic index rebuild. -3. Connect decisions, questions, and dogfood findings to graph events. -4. Add tests for malformed events and rebuild behavior. - -## Evidence - -- `gsdd next graph rebuild --json` -- Graph-related tests in `tests/gsdd.next.test.cjs` - -## Boundaries - -- No SQLite, graph DB, vector DB, or hosted memory. -- Graph events are local-first and privacy-tagged. diff --git a/.work/milestone/phases/02-graph-core/02-VERIFY.md b/.work/milestone/phases/02-graph-core/02-VERIFY.md deleted file mode 100644 index 9acd072..0000000 --- a/.work/milestone/phases/02-graph-core/02-VERIFY.md +++ /dev/null @@ -1,18 +0,0 @@ -# Phase 2 Verify Packet - -Status: passed with follow-up hardening candidate - -## Checks - -- Valid events rebuild into index nodes and edges. -- Invalid graph events are reported and block normal routing. -- Packet trace refs cite graph/event context where present. - -## Evidence - -- Focused tests passed. -- Real repo graph rebuild passed with zero invalid events after implementation. 
- -## Remaining Risk - -Supersession and answers are recorded in event payloads, but explicit graph edge events for `supersedes` and `answers` are still a worthwhile hardening follow-up. diff --git a/.work/milestone/phases/03-next-router/03-EXECUTE.md b/.work/milestone/phases/03-next-router/03-EXECUTE.md deleted file mode 100644 index 7aacebe..0000000 --- a/.work/milestone/phases/03-next-router/03-EXECUTE.md +++ /dev/null @@ -1,22 +0,0 @@ -# Phase 3 Execute Packet - -## Implemented - -- Added `bin/lib/next.mjs` with packet builders and state router. -- Wired `next` into `bin/gsdd.mjs` and runtime help. -- Added JSON and human output modes. -- Added routing for `ask_user`, `research`, `plan`, `execute`, `verify`, `audit`, `fix_gaps`, `dogfood`, `pause`, `blocked`, and `complete`. - -## Files - -- `bin/lib/next.mjs` -- `bin/gsdd.mjs` -- `bin/lib/init-runtime.mjs` -- `README.md` -- `tests/gsdd.next.test.cjs` - -## Verify - -- `rtk node bin/gsdd.mjs next --json` -- `rtk node bin/gsdd.mjs next` -- `rtk node --test --test-reporter=spec tests/gsdd.next.test.cjs` diff --git a/.work/milestone/phases/03-next-router/03-PLAN.md b/.work/milestone/phases/03-next-router/03-PLAN.md deleted file mode 100644 index d6049f6..0000000 --- a/.work/milestone/phases/03-next-router/03-PLAN.md +++ /dev/null @@ -1,29 +0,0 @@ -# Phase 3 Plan: `gsdd next` Router - -## Goal - -Implement `gsdd next` as a read-only state router that emits concise human output and structured JSON. - -## Requirements - -- NEXT-01 -- FLOW-01 -- EVAL-01 - -## Tasks - -1. Define the packet contract and allowed states. -2. Inspect `.work`, evidence, graph, questions, handoff, dogfood, and legacy `.planning`. -3. Route all allowed states deterministically where possible. -4. Add fixture tests for state routing and JSON shape. - -## Evidence - -- `gsdd next --json` -- `gsdd next` -- Route fixture tests - -## Boundaries - -- v1 recommends; it does not execute workflow commands. -- Do not silently cross human gates. 
diff --git a/.work/milestone/phases/03-next-router/03-VERIFY.md b/.work/milestone/phases/03-next-router/03-VERIFY.md deleted file mode 100644 index 6f37579..0000000 --- a/.work/milestone/phases/03-next-router/03-VERIFY.md +++ /dev/null @@ -1,19 +0,0 @@ -# Phase 3 Verify Packet - -Status: passed - -## Checks - -- JSON output contains valid state, reason, confidence, next command, constraints, evidence requirements, artifacts, privacy notes, inputs, and trace refs. -- Human output includes state, reason, next action, evidence requirements, skipped inputs, and trace refs. -- Missing legacy lifecycle files do not produce false execution state. -- All allowed states have fixture coverage. - -## Evidence - -- Focused tests passed. -- Real repo `gsdd next` and `gsdd next --json` passed. - -## Remaining Risk - -`next_command` values still mix exact commands and workflow names. This is usable but should be tightened before a polished CLI release. diff --git a/.work/milestone/phases/04-questions-decisions-handoff/04-EXECUTE.md b/.work/milestone/phases/04-questions-decisions-handoff/04-EXECUTE.md deleted file mode 100644 index 4686031..0000000 --- a/.work/milestone/phases/04-questions-decisions-handoff/04-EXECUTE.md +++ /dev/null @@ -1,21 +0,0 @@ -# Phase 4 Execute Packet - -## Implemented - -- Added question add and answer operations. -- Added decision record operation with privacy validation. -- Added routing from unresolved blocking questions to `ask_user`. -- Maintained `.work/handoff/current.md` during the loop. 
- -## Files - -- `bin/lib/work-context.mjs` -- `bin/lib/next.mjs` -- `tests/gsdd.next.test.cjs` -- `.work/handoff/current.md` - -## Verify - -- `rtk node --test --test-reporter=spec tests/gsdd.next.test.cjs` -- Inspect `.work/questions/open.json` -- Inspect `.work/questions/answered.jsonl` diff --git a/.work/milestone/phases/04-questions-decisions-handoff/04-PLAN.md b/.work/milestone/phases/04-questions-decisions-handoff/04-PLAN.md deleted file mode 100644 index b7e33a2..0000000 --- a/.work/milestone/phases/04-questions-decisions-handoff/04-PLAN.md +++ /dev/null @@ -1,30 +0,0 @@ -# Phase 4 Plan: Questions, Decisions, and Handoff - -## Goal - -Make user decision frontloading and return-later continuity durable. - -## Requirements - -- QUESTION-01 -- INTERRUPT-01 -- DECISION-01 - -## Tasks - -1. Add open question persistence and answer history. -2. Route unresolved blocking questions to `ask_user`. -3. Add decision persistence with privacy validation and supersession fields. -4. Maintain handoff context for future runs. - -## Evidence - -- `gsdd next question add` -- `gsdd next question answer` -- `gsdd next decision record` -- `.work/handoff/current.md` - -## Boundaries - -- Do not ask non-blocking questions just to avoid engineering judgment. -- Do not write partial decision artifacts on invalid input. diff --git a/.work/milestone/phases/04-questions-decisions-handoff/04-VERIFY.md b/.work/milestone/phases/04-questions-decisions-handoff/04-VERIFY.md deleted file mode 100644 index 4929be0..0000000 --- a/.work/milestone/phases/04-questions-decisions-handoff/04-VERIFY.md +++ /dev/null @@ -1,20 +0,0 @@ -# Phase 4 Verify Packet - -Status: passed with hardening candidate - -## Checks - -- Open questions persist. -- Answered questions append to history and graph state. -- Missing `open.json` is surfaced as skipped input. -- Malformed `open.json` blocks routing. -- Invalid decision privacy exits without partial decision file. - -## Evidence - -- Focused tests passed. 
-- Handoff file exists and captures current implementation posture. - -## Remaining Risk - -Duplicate question, decision, and dogfood IDs can still overwrite prior artifacts. Add explicit `--replace` behavior in a hardening follow-up. diff --git a/.work/milestone/phases/05-dogfood-gap-loop/05-EXECUTE.md b/.work/milestone/phases/05-dogfood-gap-loop/05-EXECUTE.md deleted file mode 100644 index f5e8014..0000000 --- a/.work/milestone/phases/05-dogfood-gap-loop/05-EXECUTE.md +++ /dev/null @@ -1,21 +0,0 @@ -# Phase 5 Execute Packet - -## Implemented - -- Added evidence-manifest routing for gaps, audit, dogfood, and completion gates. -- Added dogfood capture command. -- Added local dogfood finding for this milestone. -- Added completion approval gate after audit and dogfood. - -## Files - -- `bin/lib/next.mjs` -- `bin/lib/work-context.mjs` -- `.work/evidence/manifest.json` -- `.work/dogfood/gsdd-next-continuity-milestone.md` -- `tests/gsdd.next.test.cjs` - -## Verify - -- `rtk node bin/gsdd.mjs next dogfood capture --id --finding --json` -- `rtk node --test --test-reporter=spec tests/gsdd.next.test.cjs` diff --git a/.work/milestone/phases/05-dogfood-gap-loop/05-PLAN.md b/.work/milestone/phases/05-dogfood-gap-loop/05-PLAN.md deleted file mode 100644 index a59696c..0000000 --- a/.work/milestone/phases/05-dogfood-gap-loop/05-PLAN.md +++ /dev/null @@ -1,28 +0,0 @@ -# Phase 5 Plan: Dogfood and Gap-Fix Loop - -## Goal - -Close the verify -> audit -> fix-gaps -> dogfood loop without letting agents overclaim completion. - -## Requirements - -- DOGFOOD-01 -- FLOW-01 - -## Tasks - -1. Route evidence gaps to `fix_gaps`. -2. Route passed verification without passed audit to `audit`. -3. Route passed audit without dogfood to `dogfood`. -4. Capture bounded local dogfood findings. - -## Evidence - -- `.work/evidence/manifest.json` -- `gsdd next dogfood capture` -- `gsdd next --json` - -## Boundaries - -- Dogfood findings stay local by default. 
-- Milestone completion still requires explicit human approval. diff --git a/.work/milestone/phases/05-dogfood-gap-loop/05-VERIFY.md b/.work/milestone/phases/05-dogfood-gap-loop/05-VERIFY.md deleted file mode 100644 index d4b2ec0..0000000 --- a/.work/milestone/phases/05-dogfood-gap-loop/05-VERIFY.md +++ /dev/null @@ -1,19 +0,0 @@ -# Phase 5 Verify Packet - -Status: passed - -## Checks - -- Evidence gaps route to `fix_gaps`. -- Passed audit and no dogfood routes to `dogfood`. -- Passed audit plus dogfood routes to an explicit completion approval question. -- Dogfood capture writes local artifact and graph event. - -## Evidence - -- Focused tests passed. -- Local dogfood capture exists. - -## Remaining Risk - -Dogfood export to `../ideaspine` is intentionally not implemented. It should be a future explicit command. diff --git a/.work/milestone/phases/06-harness-evals-trust/06-EXECUTE.md b/.work/milestone/phases/06-harness-evals-trust/06-EXECUTE.md deleted file mode 100644 index 6c3f1c0..0000000 --- a/.work/milestone/phases/06-harness-evals-trust/06-EXECUTE.md +++ /dev/null @@ -1,22 +0,0 @@ -# Phase 6 Execute Packet - -## Implemented - -- Added trust-gate precedence over state-derived routes. -- Added human-output evidence/skipped-input/trace sections. -- Added tests for malformed questions, invalid decision privacy, help output, skipped inputs, and trust gate precedence. -- Added full suite verification. 
- -## Files - -- `bin/lib/next.mjs` -- `bin/lib/work-context.mjs` -- `tests/gsdd.next.test.cjs` -- `.work/handoff/current.md` - -## Verify - -- `rtk node --test --test-reporter=spec tests/gsdd.next.test.cjs` -- `rtk node --test --test-reporter=spec tests/gsdd.guards.test.cjs` -- `rtk npm test` -- `rtk npm pack --dry-run --json` diff --git a/.work/milestone/phases/06-harness-evals-trust/06-PLAN.md b/.work/milestone/phases/06-harness-evals-trust/06-PLAN.md deleted file mode 100644 index 36e0b16..0000000 --- a/.work/milestone/phases/06-harness-evals-trust/06-PLAN.md +++ /dev/null @@ -1,29 +0,0 @@ -# Phase 6 Plan: Harness Evals and Trust Boundaries - -## Goal - -Add the minimum eval and trust-boundary coverage needed for long-term consistency. - -## Requirements - -- TRUST-01 -- EVAL-01 -- TRACE-01 - -## Tasks - -1. Cover all route states with durable tests. -2. Ensure trust gates beat optimistic state routing. -3. Surface skipped inputs and trace refs in human output. -4. Run full package tests and packaging checks. - -## Evidence - -- `tests/gsdd.next.test.cjs` -- `rtk npm test` -- `rtk npm pack --dry-run --json` - -## Boundaries - -- No live vendor probes or browser attachment. -- No publication of local-only runtime state. diff --git a/.work/milestone/phases/06-harness-evals-trust/06-VERIFY.md b/.work/milestone/phases/06-harness-evals-trust/06-VERIFY.md deleted file mode 100644 index 07174f7..0000000 --- a/.work/milestone/phases/06-harness-evals-trust/06-VERIFY.md +++ /dev/null @@ -1,21 +0,0 @@ -# Phase 6 Verify Packet - -Status: passed - -## Checks - -- All allowed states are covered. -- Trust gates take precedence over execution-style state routing. -- Human output is useful for an agent continuing after context loss. -- Focused, guard, full-suite, and package dry-run verification passed before final packet creation. - -## Evidence - -- Focused tests passed with 17 tests. -- Guard tests passed. -- Full `rtk npm test` passed. 
-- `rtk npm pack --dry-run --json` confirmed new runtime files were included before final `.work/milestone` packet creation. - -## Remaining Risk - -Run one final package dry-run after these durable milestone packets if shipping an npm release from this exact tree. diff --git a/.work/milestone/phases/07-easy-global-install-auto-mode/07-EXECUTE.md b/.work/milestone/phases/07-easy-global-install-auto-mode/07-EXECUTE.md deleted file mode 100644 index 4828292..0000000 --- a/.work/milestone/phases/07-easy-global-install-auto-mode/07-EXECUTE.md +++ /dev/null @@ -1,54 +0,0 @@ -# Phase 7 Execute Packet - -## Implemented - -- Added `install --global --auto` to the existing global install command. -- Auto mode detects existing supported agent homes and installs only those targets. -- Explicit `--tools ` continues to override detection and preserve scoped installs. -- No-detection auto mode fails closed with a clear `--tools` fallback and writes no global files. -- Invalid explicit targets still fail before any manifest writes. -- Help, README, user guide, and runtime support docs now present `install --global --auto` as the easy non-interactive global install path. -- Hardened lifecycle preflight so branch-local `.work/milestone` phases are governed by `.work` plan/execute packets instead of unrelated `.planning` phase numbers or drift. -- Hardened resume preflight so a generic checkpoint that explicitly points at `.work/milestone` continuity is allowed through `work_milestone` authority while unrelated `.planning` drift remains warning-level. 
- -## Files - -- `bin/lib/global-install.mjs` -- `bin/lib/init-runtime.mjs` -- `bin/lib/lifecycle-preflight.mjs` -- `tests/gsdd.global-install-pressure.test.cjs` -- `tests/gsdd.init.test.cjs` -- `tests/phase.test.cjs` -- `tests/gsdd.guards.test.cjs` -- `README.md` -- `docs/USER-GUIDE.md` -- `docs/RUNTIME-SUPPORT.md` - -## Deviations - -- Initial legacy `.planning` execute/verify preflight blocked because Phase 7 is not in `.planning/ROADMAP.md` and `.planning/SPEC.md` has pre-existing drift. The deterministic preflight helper now recognizes matching `.work/milestone` phases as `work_milestone` authority, so downstream generated helpers no longer force branch-local packets through unrelated `.planning` phase numbers. -- Initial resume preflight still blocked because `resume` has no phase argument, so it could not reach the phase-based `.work/milestone` authority path. The deterministic preflight helper now classifies `.work/milestone` resume checkpoints before applying `.planning` drift as a blocker. -- `tests/gsdd.guards.test.cjs` did not need code changes; the existing guard suite covers the public docs/help contracts after the docs update. - -## Verify - -- `rtk node tests/gsdd.global-install-pressure.test.cjs` -- `rtk node tests/gsdd.init.test.cjs` -- `rtk node tests/phase.test.cjs` -- `rtk node tests/gsdd.guards.test.cjs` -- `rtk node .planning/bin/gsdd.mjs lifecycle-preflight verify 7 --expects-mutation phase-status` -- `rtk node .planning/bin/gsdd.mjs lifecycle-preflight execute 7 --expects-mutation phase-status` -- `rtk node .planning/bin/gsdd.mjs lifecycle-preflight resume` -- `rtk node bin/gsdd.mjs help` -- `rtk npm pack --dry-run --json` -- `rtk npm test` -- `rtk git diff --check` - -## Second Pass - -- Checked that no `--from`, handoff-file parser, stdin parser, URL installer, script execution, or `autoinstall` command was added. -- Checked that `init --auto` still requires `--tools` and writes repo-local `autoAdvance`. 
-- Checked that `install --global` without `--auto`, `--tools`, or TTY selection still fails in non-interactive shells. -- Checked that global install still never bootstraps repo-local `.planning/` or `.agents/`. -- Checked that `.work/milestone` authority activates only when the requested phase exists in `.work/milestone/ROADMAP.md`; ordinary `.planning` lifecycle gates retain their existing behavior. -- Checked that resume only downgrades `.planning` drift when the checkpoint itself points at `.work/milestone`; ordinary checkpoints still block on planning drift. diff --git a/.work/milestone/phases/07-easy-global-install-auto-mode/07-PLAN.md b/.work/milestone/phases/07-easy-global-install-auto-mode/07-PLAN.md deleted file mode 100644 index 0cb59f8..0000000 --- a/.work/milestone/phases/07-easy-global-install-auto-mode/07-PLAN.md +++ /dev/null @@ -1,311 +0,0 @@ ---- -phase: 07-easy-global-install-auto-mode -plan: 07 -type: execute -wave: 1 -runtime: codex-cli -assurance: self_checked -depends_on: [] -files-modified: - - bin/lib/global-install.mjs - - bin/lib/init-runtime.mjs - - bin/lib/lifecycle-preflight.mjs - - tests/gsdd.global-install-pressure.test.cjs - - tests/gsdd.init.test.cjs - - tests/phase.test.cjs - - tests/gsdd.guards.test.cjs - - README.md - - docs/USER-GUIDE.md - - docs/RUNTIME-SUPPORT.md -autonomous: true -requirements: - - INSTALL-AUTO-01 - - INSTALL-SAFETY-01 - - WORK-LIFECYCLE-01 -non_goals: - - Do not add a second top-level installer when `install --global` can own the path. - - Do not make repo-local `init --auto` semantics mutate global agent homes. - - Do not fetch remote install specs, execute scripts, or infer install targets from arbitrary prose. - - Do not add `--from`, install-handoff files, stdin parsing, or markdown/JSON manifest input in this phase. -hard_boundaries: - - Global writes must continue through the existing manifest-tracked global install writer. 
- - `install --global --auto` may be one-command easy, but it must still protect unmanaged and user-modified files. - - Auto mode installs only detected local agent targets unless `--tools` explicitly scopes the target set. - - Branch-local `.work/milestone` phase packets must not be forced through unrelated `.planning` phase ownership. -escalation_triggers: - - Stop if implementation needs `--force`, unmanaged overwrite behavior, remote URLs, or shell execution. - - Stop if adding `--auto` to global install would contradict the existing repo-local `init --auto` contract. -approval_gates: - - Ask before changing `init --auto` behavior or the meaning of `autoAdvance`. - - Ask before adding an `autoinstall` alias, `--from`, remote manifests, or arbitrary install instruction parsing. -anti_regression_targets: - - `npx -y gsdd-cli init --auto --tools ` still writes repo-local `.planning` state and `autoAdvance: true`. - - `install --global` without `--auto`, `--tools`, or TTY selection still fails in non-interactive shells. - - Global install still never creates repo-local `.planning/` or `.agents/`. -ui_proof_slots: [] -no_ui_proof_rationale: CLI/docs/test-only work; no rendered UI outcome is claimed. -high_leverage_surfaces: - - bin/lib/global-install.mjs - - bin/lib/init-runtime.mjs - - bin/lib/lifecycle-preflight.mjs - - README.md -second_pass_required: true -closure_claim_limit: Claim only easier detected global installation, not handoff-file installation, autonomous remote installation, or live runtime parity. -parallelism_budget: - max_concurrent_plans: 1 - safe_parallelism: [] -leverage: - lost: Adds one more meaning for `--auto` that must be documented carefully across init and install. - kept: Existing manifest-safe global installer, repo-local `init --auto`, shared skill-root architecture, and global/local install separation. - gained: One-command detected global install without creating a competing installer or manifest parser. 
-must_haves: - truths: - - A user can run `npx -y gsdd-cli install --global --auto` and install detected global agent targets without prompts. - - A user can still pass `--tools ` to explicitly scope the global install target set. - - Existing `init --auto` and non-interactive global install safety behavior remain intact. - - Matching `.work/milestone` phases use work-native lifecycle authority instead of unrelated `.planning` phase numbers. - artifacts: - - path: bin/lib/global-install.mjs - provides: detected `install --global --auto` handling - - path: bin/lib/init-runtime.mjs - provides: help text that distinguishes repo-local auto init from global auto install - - path: tests/gsdd.global-install-pressure.test.cjs - provides: auto global install and detection/scoping regression coverage - - path: tests/phase.test.cjs - provides: work-native lifecycle preflight regression coverage for branch-local phase packets - key_links: - - from: bin/lib/global-install.mjs - to: bin/lib/global-manifest.mjs - via: existing manifest-tracked write path - - from: README.md - to: bin/lib/init-runtime.mjs - via: matching public command examples ---- - -# Phase 7 Plan: Easy Global Install Auto Mode - -## Objective - -Make global install genuinely easy by extending the existing install command instead of inventing a separate installer. The primary user-facing path becomes: - -```text -npx -y gsdd-cli install --global --auto -``` - -This should perform the safe default global install through the current manifest-tracked writer. In this phase, "safe default" means detected local agent targets only. A future handoff-file design can be considered separately after the basic install path is proven. - -## Context - -- `init --auto --tools ` already exists and is repo-local; it writes `.planning/config.json` with `autoAdvance: true`. -- `install --global` already installs global agent-home surfaces, but non-interactive use currently requires `--tools `. 
-- Existing tests explicitly protect both sides: repo-local `init --auto` and global install no-repo-bootstrap behavior. -- The previous Phase 7 draft proposed `gsdd autoinstall`; this revision keeps CLI gravity on the existing `install --global` command because that better matches the current architecture and the user's `--auto` point. - -## Requirements Covered - -- `INSTALL-AUTO-01`: User can run one command, `install --global --auto`, and install detected global Workspine surfaces without choosing targets interactively. -- `INSTALL-SAFETY-01`: Auto global install preserves manifest ownership checks, refuses unsafe target selections, and never bootstraps repo-local state. -- `WORK-LIFECYCLE-01`: Branch-local `.work/milestone` phase packets can execute and verify without being blocked by unrelated `.planning` phase numbers or planning drift. - -## Must-Haves - -1. `npx -y gsdd-cli install --global --auto` installs detected local agent targets without prompting. -2. `npx -y gsdd-cli install --global --auto --tools codex` still scopes installation to Codex only. -3. If no supported agent target is detected, the command exits clearly without writing global files and tells the user how to use `--tools`. -4. Invalid targets fail with exit code 1 before writes. -5. `init --auto` remains repo-local and unchanged except for docs clarifying the difference. -6. Matching `.work/milestone` phases return `authority: "work_milestone"` from lifecycle preflight; repeat execute still fails closed after the execute packet exists. - -## Anti-Goals - -- No `autoinstall` top-level command in this phase. -- No `--from` handoff file, stdin input, markdown instruction parser, or JSON manifest parser in this phase. -- No write-by-default behavior from arbitrary LLM prose. -- No URL installer, curl pipe, package-manager script runner, or remote manifest fetch. -- No `--force` escape hatch for unmanaged global files. -- No repo-local `.planning` bootstrap from global install. 
- -## Hard Boundaries - -- Use the existing `install --global` implementation and manifest writer. -- Treat `--auto` on global install as non-interactive default selection, not lifecycle `autoAdvance`. -- Keep interactive `install --global` behavior available for users who want to choose targets. -- Detection is advisory target selection only; manifest ownership remains the authority for whether a write is allowed. - -## Evidence Contract - -- Tests prove `install --global --auto` succeeds in a non-interactive fixture and writes the expected global surfaces. -- Tests prove `install --global --auto --tools codex` remains scoped. -- Tests prove no-detection behavior fails closed with a useful message and no global writes. -- Existing `init --auto` tests still pass, including `autoAdvance` config assertions. -- Docs and help show the one-command path and distinguish it from repo-local init. -- Phase preflight tests prove `.work/milestone` execute/verify semantics and generated-helper propagation. - -## Common Pitfalls - -- Conflating `autoAdvance` with global installer auto mode. -- Installing every global surface when only a subset of tools is detected. -- Treating lack of detection as permission to install everything. -- Weakening the non-interactive safety test for plain `install --global`. -- Creating a parallel `autoinstall` implementation that drifts from manifest safety. -- Smuggling the deferred handoff-file idea back into this phase. -- Letting `.work` dogfood packets collide with stale or unrelated `.planning` lifecycle state. - -## Stop-And-Challenge - -- Stop if the implementation would require changing `init --auto` behavior. -- Stop if detection is unreliable enough that `--auto` cannot safely choose targets without user input. -- Stop if implementation starts adding `--from`, remote fetch, script execution, or freeform prose interpretation. -- Stop if tests show any repo-local `.planning/` or `.agents/` state created by global install. 
-- Stop if `.work/milestone` authority masks ordinary `.planning` phase gates when no matching `.work` phase exists. - -## Approval Gates - -- Human approval is required before adding `--from`, URL support, remote manifests, script execution, or force overwrite. -- Human approval is required before renaming or repurposing `autoAdvance`. - - - -checker: self -checker_runtime: codex-cli -status: passed -blocking: false -notes: Strict legacy `.planning` plan preflight remains blocked by stale planning-state drift and missing active roadmap parsing; this plan is intentionally written to the branch-local `.work` milestone surface. The revision explicitly answers the user's `--auto` concern by preferring detected `install --global --auto` over a new `autoinstall` command or `--from` handoff-file parser. - - - -## Tasks - - - - - MODIFY: bin/lib/global-install.mjs - - MODIFY: bin/lib/init-runtime.mjs - - - Extend global install argument parsing to support `--auto`. - In auto mode, detect locally available supported agent targets when no - `--tools` target is supplied; when `--tools` supplies targets, honor that - explicit narrower set. If no supported target is detected, exit clearly with - no writes and show the explicit `--tools` fallback. Update help text to show the - simple one-command path and clarify that this is distinct from repo-local - `init --auto`. - - - - Run `node tests/gsdd.global-install-pressure.test.cjs` - - Run `node tests/gsdd.init.test.cjs` - - Run `node bin/gsdd.mjs help` - - - `install --global --auto` works non-interactively through the existing global - install writer, detected targets are scoped safely, and help text is clear. - - - - - - - MODIFY: tests/gsdd.global-install-pressure.test.cjs - - MODIFY: tests/gsdd.init.test.cjs - - MODIFY: tests/gsdd.guards.test.cjs - - - Add tests for the one-command auto global install, scoped `--auto --tools`, - no-detection behavior, invalid targets, and no repo bootstrap. 
Keep existing - `init --auto` tests intact and add a regression asserting global `--auto` - does not write `autoAdvance` or `.planning` state. - - - - Run `node tests/gsdd.global-install-pressure.test.cjs` - - Run `node tests/gsdd.init.test.cjs` - - Run `node tests/gsdd.guards.test.cjs` - - - The test suite locks the separation between repo-local auto init and global - auto install while proving the easy path works. - - - - - - - MODIFY: README.md - - MODIFY: docs/USER-GUIDE.md - - MODIFY: docs/RUNTIME-SUPPORT.md - - - Document the easy global install path as `npx -y gsdd-cli install --global - --auto`. Show scoped variants with `--tools` and explain that auto mode uses - detected local agent targets. Keep `init --auto` documented as repo-local setup - with `.planning`/`autoAdvance`, and keep `install --global` documented as the - user-home surface installer. - - - - Run `node tests/gsdd.guards.test.cjs` - - Run `npm pack --dry-run --json` - - - Public docs make the easiest install path obvious without blurring local and - global install responsibilities. - - - - - - - MODIFY: bin/lib/lifecycle-preflight.mjs - - MODIFY: tests/phase.test.cjs - - - Harden lifecycle preflight after dogfood exposed a downstream clash: when the - requested phase exists in `.work/milestone/ROADMAP.md`, evaluate execute and - verify against `.work/milestone/phases/*/{NN}-PLAN.md` and `{NN}-EXECUTE.md` - rather than unrelated `.planning` phase artifacts. Keep `.planning` drift as - a warning for work-native phases and preserve existing `.planning` behavior - when no matching `.work` phase exists. 
- - - - Run `node tests/phase.test.cjs` - - Run `node .planning/bin/gsdd.mjs lifecycle-preflight verify 7 --expects-mutation phase-status` - - Run `node .planning/bin/gsdd.mjs lifecycle-preflight execute 7 --expects-mutation phase-status` - - - Source and generated helpers return `authority: "work_milestone"` for matching - work phases, allow verify after execute, and fail repeat execute as - `no_pending_plan`. - - - -## Verification - -- Run `node tests/gsdd.global-install-pressure.test.cjs` -- Run `node tests/gsdd.init.test.cjs` -- Run `node tests/phase.test.cjs` -- Run `node tests/gsdd.guards.test.cjs` -- Run `npm pack --dry-run --json` - -## Success Criteria - -- `install --global --auto` is the shortest global install path and works in non-interactive shells. -- Detected agent targets define the default install set; `--tools` lets users explicitly narrow or override it. -- Invalid targets and no-detection cases fail closed before global writes. -- `init --auto` remains repo-local and unchanged. -- Branch-local `.work/milestone` lifecycle packets do not collide with unrelated `.planning` phase state. - -## High-Leverage Review - -High-leverage surfaces touched by the future implementation: global installer argument parsing, lifecycle preflight, CLI help, public README, and tests that protect install boundaries. A second pass is required before execution is considered complete, specifically checking that `--auto` does not conflate global install with repo-local `autoAdvance` and that `.work/milestone` authority does not weaken ordinary `.planning` gates. - -## Leverage Review - -- Lost: `--auto` now has two contexts that must be explained precisely. -- Kept: one installer implementation, manifest-tracked global writes, repo-local init semantics, and existing install pressure coverage. 
-- Gained: a one-command detected global install path without adding a second CLI noun or premature handoff-file parser, plus a deterministic work-native lifecycle seam for branch-local dogfood packets. - -## Research Notes - -- Existing repo facts resolve the core architecture: `init --auto` is repo-local and `install --global` is the user-home installer. -- No new library is required. Use Node built-ins and existing CLI helpers. -- Current agent-skill research remains relevant: global install should write reusable skill/native surfaces into user-home agent roots rather than requiring prompt paste. - -## Notes - -- Reduced assurance: `.planning` lifecycle preflight is blocked in this checkout; this branch uses `.work` for continuity. -- User direction for this revision: preserve and reuse the existing `--auto` vocabulary to make install easy. -- User direction for this revision: remove `--from`; handoff-file installation should be a future separate design, not Phase 7. diff --git a/.work/milestone/phases/07-easy-global-install-auto-mode/07-VERIFY.md b/.work/milestone/phases/07-easy-global-install-auto-mode/07-VERIFY.md deleted file mode 100644 index b72ebe5..0000000 --- a/.work/milestone/phases/07-easy-global-install-auto-mode/07-VERIFY.md +++ /dev/null @@ -1,80 +0,0 @@ ---- -phase: 07-easy-global-install-auto-mode -runtime: codex-cli -assurance: self_checked -verified: 2026-06-26T09:26:24+02:00 -status: passed -score: 4/4 must-haves verified -delivery_posture: delivery_sensitive -evidence_contract: - required_kinds: [code, runtime, delivery] - recommended_kinds: [test] - observed_kinds: [code, test, runtime, delivery] - missing_kinds: [] -gaps: [] -git_delivery_check: - branch: feat/dogfood-installability-spine - commits_ahead_of_main: 14 - pr_state: merged ---- - -# Phase 7 Verify Packet - -Status: passed - -## Verification Basis - -- Plan: `.work/milestone/phases/07-easy-global-install-auto-mode/07-PLAN.md` -- Execute packet: 
`.work/milestone/phases/07-easy-global-install-auto-mode/07-EXECUTE.md` -- Code artifacts: `bin/lib/global-install.mjs`, `bin/lib/init-runtime.mjs`, `bin/lib/lifecycle-preflight.mjs` -- Tests/docs artifacts: `tests/gsdd.init.test.cjs`, `tests/gsdd.global-install-pressure.test.cjs`, `tests/phase.test.cjs`, `tests/gsdd.guards.test.cjs`, `README.md`, `docs/USER-GUIDE.md`, `docs/RUNTIME-SUPPORT.md` - -Initial legacy `.planning` verify preflight blocked for this branch-local follow-up because Phase 7 is represented in `.work/milestone`, not `.planning/ROADMAP.md`. That downstream clash is now hardened: source and generated lifecycle helpers return `authority: "work_milestone"` for matching `.work/milestone` phases, keep `.planning` drift as a warning, and still fail repeat execute as `no_pending_plan`. - -Resume preflight had the same authority-selection issue in a different shape: generic checkpoints do not carry a phase argument, so they could still block on unrelated `.planning` drift before the checkpoint could be loaded. Source and generated lifecycle helpers now detect checkpoints that explicitly point at `.work/milestone` continuity and allow resume with `authority: "work_milestone"` while preserving drift blocking for ordinary checkpoints. - -## Must-Haves - -- `install --global --auto` installs detected local agent targets without prompting: verified by init and pressure-loop tests. -- `install --global --auto --tools codex` remains scoped: verified by init tests. -- No supported target detected fails closed before writes and points to `--tools`: verified by init tests. -- Invalid targets fail before writes: verified by init tests. -- `init --auto` remains repo-local and unchanged: verified by existing init auto-mode tests plus the global-auto no-bootstrap regressions. -- Branch-local `.work/milestone` phases and checkpoints do not collide with unrelated `.planning` lifecycle state: verified by phase preflight tests and local generated-helper execution. 
- -## Artifact Verification - -| Artifact | Exists | Substantive | Wired | Notes | -| --- | --- | --- | --- | --- | -| `bin/lib/global-install.mjs` | yes | yes | yes | Adds detected auto target selection before interactive fallback, keeps manifest writer unchanged. | -| `bin/lib/init-runtime.mjs` | yes | yes | yes | Help text distinguishes repo-local `init --auto` from global install `--auto`. | -| `bin/lib/lifecycle-preflight.mjs` | yes | yes | yes | Adds scoped `.work/milestone` authority for matching work phases and `.work` resume checkpoints while preserving `.planning` behavior by default. | -| `tests/gsdd.init.test.cjs` | yes | yes | yes | Covers detection, explicit scope, no detection, invalid targets, and no repo bootstrap. | -| `tests/gsdd.global-install-pressure.test.cjs` | yes | yes | yes | Covers cross-repo global auto install behavior. | -| `tests/phase.test.cjs` | yes | yes | yes | Covers `.work/milestone` execute/verify/resume preflight, drift handling, ordinary checkpoint drift blocking, repeat execute block, and generated helper behavior. | -| Public docs | yes | yes | yes | README, User Guide, and Runtime Support show `install --global --auto` and `--tools` override. | - -## Evidence - -- `rtk node tests/gsdd.global-install-pressure.test.cjs`: passed, 12 tests. -- `rtk node tests/gsdd.init.test.cjs`: passed, 44 tests. -- `rtk node tests/phase.test.cjs`: passed, 163 tests. -- `rtk node .planning/bin/gsdd.mjs lifecycle-preflight verify 7 --expects-mutation phase-status`: passed with `authority: "work_milestone"` and planning drift as warning. -- `rtk node .planning/bin/gsdd.mjs lifecycle-preflight execute 7 --expects-mutation phase-status`: blocked as expected with `reason: "no_pending_plan"` because `07-EXECUTE.md` exists. -- `rtk node .planning/bin/gsdd.mjs lifecycle-preflight resume`: passed with `authority: "work_milestone"` and planning drift as warning because the checkpoint points at `.work/milestone`. 
-- `rtk node tests/gsdd.guards.test.cjs`: passed, 387 tests. -- `rtk node bin/gsdd.mjs help`: passed and showed `install --global [--auto] [--tools ] [--dry]`. -- `rtk npm pack --dry-run --json`: passed and included changed runtime/doc files in the package listing. -- `rtk npm test`: passed. -- `rtk git diff --check`: passed. - -## Anti-Patterns - -- No `--from` implementation, stdin parser, markdown/JSON manifest parser, URL fetch, script execution, or `autoinstall` command was introduced. -- Existing `console.log` hits are intentional CLI/test output surfaces. -- The only TODO/FIXME/HACK marker hit in scoped scan is documentation describing the health check output categories, not implementation debt. - -## Remaining Risk - -- Detection is intentionally conservative: it depends on existing target home directories. If a user has an installed agent that has not created its config home yet, `--auto` fails closed and tells them to use `--tools`. -- Git delivery metadata points at an already merged PR for the branch head; current Phase 7 changes are local uncommitted work and would need a new commit/PR if shipping. diff --git a/.work/milestone/scratchpad/2026-06-21-pr-ship-loop.md b/.work/milestone/scratchpad/2026-06-21-pr-ship-loop.md deleted file mode 100644 index 17c80ea..0000000 --- a/.work/milestone/scratchpad/2026-06-21-pr-ship-loop.md +++ /dev/null @@ -1,169 +0,0 @@ -# Scratchpad: PR Ship Loop for `gsdd next` - -Date: 2026-06-21 -Status: completed locally; PR/merge pending -Purpose: lock the final PR plan, run three challenge loops, and only merge if verification and PR status are clean. - -## Locked User Decisions - -- Shipping scope: current `gsdd next` v1 plus hardening candidates only. -- Challenge loops may change product behavior when the change lowers cognitive load or closes coherence gaps. -- Human surface: terse next action, reason, required evidence, and blocking question. -- Agent surface: exact command or skill names where possible. 
-- `gsdd next` mutation: plain `gsdd next` remains read-oriented; explicit subcommands and explicit refresh/index operations may mutate. -- Deterministic computation: prefer scripts/helpers for repeatable state, graph, and eval decisions. -- Duplicate IDs: fail unless `--replace` is explicit. -- Graph semantics: implement explicit `answers` and `supersedes` edges now. -- Commit policy: track durable `.work` goal, research, milestone, phase packets, and `.work/.gitignore`. -- PR policy: create PR, inspect status, and squash merge only if clean. - -Assumption: the user omitted `3.2`; I applied the recommended deterministic-helper posture because it matches the surrounding answer and Workspine philosophy. - -## Research Grounding Added During This Loop - -- OpenAI Codex non-interactive mode and CLI features reinforce scriptable, non-TUI automation and stable stdout contracts: - - https://developers.openai.com/codex/noninteractive - - https://developers.openai.com/codex/cli/features - - https://developers.openai.com/codex/cli/reference -- Claude Code hooks document lifecycle event schemas and JSON I/O, supporting deterministic harness boundaries: - - https://code.claude.com/docs/en/hooks -- OpenAI and Anthropic harness/eval guidance from the gpt-5.4 research agent emphasizes evidence-gated loops, context resets, structured handoffs, and trace-to-eval feedback: - - https://developers.openai.com/cookbook/examples/agents_sdk/agent_improvement_loop - - https://developers.openai.com/blog/run-long-horizon-tasks-with-codex - - https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents - - https://www.anthropic.com/engineering/harness-design-long-running-apps - - https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents -- Durable human interrupts and task/artifact-centric orchestration are supported by current agent frameworks: - - https://docs.langchain.com/oss/python/langgraph/interrupts - - 
https://docs.langchain.com/oss/python/langchain/frontend/human-in-the-loop - - https://openai.github.io/openai-agents-python/human_in_the_loop/ - - https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/ -- Current MCP/security guidance reinforces fail-closed trust boundaries and skeptical treatment of tool/resource descriptions: - - https://labs.cloudsecurityalliance.org/agentic/agentic-mcp-security-best-practices-v1/ - - https://developer.microsoft.com/blog/protecting-against-indirect-injection-attacks-mcp - - https://www.nsa.gov/Portals/75/documents/Cybersecurity/CSI_MCP_SECURITY.pdf - - https://arxiv.org/html/2603.22489v1 - -Connector note: the Consensus connector required reauthentication, so paper search via that connector was unavailable in this run. - -## Three-Loop Plan - -1. Coherence and architecture loop: - - Challenge whether changes fit Workspine's `.work`/`.planning` boundary. - - Remove prose-ish routing where an exact command/skill token exists. - - Ensure duplicate mutation behavior is retry-safe. - - Verify with focused `gsdd.next` tests. - -2. Deterministic harness and eval loop: - - Challenge whether graph/index behavior lets a future agent reconstruct decisions without transcript rereads. - - Make answer and supersession relationships explicit graph edges. - - Add tests that rebuild/inspect the deterministic index. - - Check human output stays terse and useful. - -3. Privacy, release, and PR-readiness loop: - - Challenge tracked vs local `.work` surfaces. - - Run full tests and package dry-run. - - Inspect git status so unrelated files are excluded. - - Create PR, inspect checks, and squash merge only if clean. - -## Loop 1: Coherence, Architecture, and UX - -Status: completed. - -Findings: - -- `next_command` still included prose values such as `draft .work milestone plan`, `capture .work dogfood finding`, and `review .work/milestone/AUDIT.md with the user`. 
-- Mutating subcommands overwrote duplicate IDs by default, which is hostile to agent retries and long-term decision archaeology. -- Help did not advertise explicit replacement semantics. - -Changes: - -- Added typed `next_action` values so the agent surface is not overloaded prose: - - `cli_command` - - `workflow_skill` - - `manual_review` - - `user_question` -- Kept `next_command` as a compatibility field, but made `next_action` the stricter contract. -- Changed routable command compatibility strings to exact skill/command tokens where possible: - - `gsdd-plan` - - `gsdd-execute` - - `gsdd-verify` - - `gsdd-audit-milestone` - - `gsdd-plan-milestone-gaps` - - `gsdd-complete-milestone` - - `gsdd next dogfood capture --id --title --body ` -- Added `--replace` to question, decision, and dogfood mutation surfaces. -- Changed question, decision, and dogfood duplicate IDs to replay as `unchanged` when content matches and fail unless `--replace` is explicit when content differs. -- Changed captured stdout default to JSON through `--format auto`; `--format human` gives the compact supervisor card. -- Moved dirty-worktree warnings into `repo_warnings` instead of `privacy_notes`. - -Verification: - -- `rtk node --test --test-reporter=spec tests\gsdd.next.test.cjs`: passed, 26 tests. - -## Loop 2: Deterministic Graph and Harness Eval - -Status: completed. - -Challenge: - -- If a future agent cannot reconstruct why a question was answered or why a decision superseded another decision from `.work/graph/index.json`, Workspine is still relying on transcript archaeology. -- The deterministic graph needs relationships, not only latest node snapshots. 
- -Changes: - -- `gsdd next question answer` now records both: - - a `question_answered` event with the answer payload - - an `edge_created` event of type `answers` -- `gsdd next decision record --supersedes ` now records both: - - a `decision_recorded` event - - an `edge_created` event of type `supersedes` -- JSON responses now include `graph_event_ids` when a mutation creates more than one event. -- Same-input replays for `question add`, `question answer`, `decision record`, and `dogfood capture` do not append new graph events. -- `.work` writes now use atomic file replacement for JSON/markdown projections and fsynced appends for graph/audit JSONL paths. -- README and CLI help now document `--format human`, JSON-by-default captured output, typed `next_action`, and retry-safe mutation boundaries. - -Verification: - -- `rtk node --test --test-reporter=spec tests\gsdd.next.test.cjs`: passed, 26 tests. - - Includes assertions that `answers` and `supersedes` edges appear in `.work/graph/index.json`. - - Includes assertions that same-input replays produce no new graph event lines. - -## Loop 3: Privacy, Release, and PR Readiness - -Status: completed locally. - -Challenge: - -- The PR should not ship if it mixes repo dirtiness with privacy semantics, relies on raw transcript truth, or stages local-only mutable `.work` files. -- The package should contain the new source modules, while `.work` runtime state remains local. -- External gpt-5.4 research agents identified two PR-critical gaps: overloaded `next_command` and non-idempotent mutation replay. Both were fixed before final verification. - -Changes: - -- Human `gsdd next --format human` now behaves like a compact supervisor card: - - state - - reason - - next review/action - - approval requirement - - blocking question - - evidence required - - repo risk -- Captured `gsdd next` defaults to JSON for agent tooling. 
-- Completion approval now surfaces a `manual_review` action over audit, roadmap, and evidence before completion. -- `repo_warnings` separate worktree risk from privacy notes. -- Durable `.work/milestone/scratchpad/2026-06-21-pr-ship-loop.md` records the fork decisions, research grounding, and all three loops. - -Verification: - -- `rtk node bin\gsdd.mjs next`: passed; captured stdout emitted JSON with typed `next_action`. -- `rtk node bin\gsdd.mjs next --format human`: passed; emitted the compact supervisor card. -- `rtk node --test --test-reporter=spec tests\gsdd.next.test.cjs`: passed, 26 tests. -- `rtk npm test`: passed. -- `rtk npm pack --dry-run --json`: passed; package includes updated `bin/lib/next.mjs` and `bin/lib/work-context.mjs`. - -Residual follow-up candidates not blocking this PR: - -- Add a full WAL-style operation protocol with `op_id`, `prev_event_id`, `content_sha256`, and `op_committed`. -- Add deterministic golden packet fixtures with normalized volatile fields. -- Add projection consistency checks that detect orphaned side artifacts lacking committed graph events. diff --git a/.work/research/2026-06-20-long-term-agent-harness-consistency.md b/.work/research/2026-06-20-long-term-agent-harness-consistency.md deleted file mode 100644 index 7e02ade..0000000 --- a/.work/research/2026-06-20-long-term-agent-harness-consistency.md +++ /dev/null @@ -1,230 +0,0 @@ -# Research Grounding: Long-Term Agent Harness Consistency - -Date: 2026-06-20 -Scope: `gsdd next`, `.work`, continuity graph, decision gates, verification/audit/gap-fix loop - -## Research Question - -What does Workspine need to do to keep long-running agentic product work coherent over time? 
- -## Sources Reviewed - -### Current Tooling and Harness References - -- OpenAI, "Build an Agent Improvement Loop with Traces, Evals, and Codex" - https://developers.openai.com/cookbook/examples/agents_sdk/agent_improvement_loop - -- OpenAI, "Evaluate agent workflows" - https://developers.openai.com/api/docs/guides/agent-evals - -- OpenAI, "Guardrails and human review" - https://developers.openai.com/api/docs/guides/agents/guardrails-approvals - -- OpenAI, "Build iterative repair loops with Codex" - https://developers.openai.com/cookbook/examples/codex/build_iterative_repair_loops_with_codex - -- Anthropic, "Building effective agents" - https://www.anthropic.com/engineering/building-effective-agents - -- LangGraph, "Interrupts" - https://docs.langchain.com/oss/python/langgraph/interrupts - -- Temporal, "Workflow Definition" - https://docs.temporal.io/workflow-definition - -- Inspect AI, UK AI Security Institute evaluation framework - https://inspect.aisi.org.uk/ - -- LangSmith, "Evaluation concepts" - https://docs.langchain.com/langsmith/evaluation-concepts - -- OpenTelemetry, "Generative AI semantic conventions" - https://opentelemetry.io/docs/specs/semconv/gen-ai/ - -### Agent Benchmark and Evaluation Papers - -- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? 
- https://arxiv.org/abs/2310.06770 - -- SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering - https://arxiv.org/abs/2405.15793 - -- AgentBench: Evaluating LLMs as Agents - https://arxiv.org/abs/2308.03688 - -- WebArena: A Realistic Web Environment for Building Autonomous Agents - https://arxiv.org/abs/2307.13854 - -- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments - https://arxiv.org/abs/2404.07972 - -- tau-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains - https://arxiv.org/abs/2406.12045 - -### Memory, Reflection, and Self-Improvement Papers - -- ReAct: Synergizing Reasoning and Acting in Language Models - https://arxiv.org/abs/2210.03629 - -- Reflexion: Language Agents with Verbal Reinforcement Learning - https://arxiv.org/abs/2303.11366 - -- Generative Agents: Interactive Simulacra of Human Behavior - https://arxiv.org/abs/2304.03442 - -- Voyager: An Open-Ended Embodied Agent with Large Language Models - https://arxiv.org/abs/2305.16291 - -- ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent - https://arxiv.org/abs/2312.10003 - -### 2026 Agent Security and Trust Papers - -- Agent-Fence: Mapping Security Vulnerabilities Across Deep Research Agents - https://arxiv.org/abs/2602.07652 - -- AgentWard: A Lifecycle Security Architecture for Autonomous AI Agents - https://arxiv.org/abs/2604.24657 - -- AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use - https://arxiv.org/abs/2605.04785 - -- AgentTrust: A Self-Improving Trust Layer for AI-Agent Actions - https://arxiv.org/abs/2606.08539 - -## Synthesis - -### 1. The harness is the product surface - -OpenAI frames an agent harness as the full contract around the model: instructions, tools, routing, output requirements, validation, feedback, evals, and implementation handoff. Workspine should adopt this framing. 
`gsdd next` is not a convenience command; it is the harness router that keeps the contract coherent. - -Implication for Workspine: - -- `.work` must store harness state, not just prose context. -- `gsdd next` must reason from durable artifacts, not chat memory. -- Every continuation should produce a next-action packet that can be inspected and replayed. - -### 2. Durable interrupts are the right human-gate model - -LangGraph's interrupt pattern pauses execution, persists graph state, surfaces a JSON-serializable question, and resumes with the answer. Temporal's workflow model reinforces durable, deterministic state as the backbone of long-running work. - -Implication for Workspine: - -- Questions must be persisted as first-class graph nodes. -- The agent should stop at decision gates with a durable question packet. -- Resuming should consume an answer and update graph state, not rely on chat scrollback. - -### 3. Review and repair must be separate phases - -OpenAI's Codex repair-loop example separates structured review findings from repair. This is the right shape for Workspine audit/gap-fix. The reviewer/auditor should not silently fix while judging, and the repair pass should consume machine-readable findings. - -Implication for Workspine: - -- `verify` and `audit` should emit structured findings. -- `fix_gaps` should consume those findings. -- `gsdd next` should route between them explicitly. - -### 4. Evaluation needs datasets, solvers, scorers, traces, and human feedback - -Inspect and LangSmith both separate inputs/datasets, agent/solver execution, scorers/evaluators, traces, and feedback. LangSmith distinguishes offline evaluation from online monitoring and emphasizes converting production traces and human feedback into datasets. - -Implication for Workspine: - -- Add fixture-state evals for `gsdd next`. -- Treat `.work/graph/events.jsonl` as raw trace material. -- Treat dogfood findings as human feedback that can become regression tests. 
-- Do not treat one successful manual run as enough evidence. - -### 5. Agent-computer interface quality changes agent performance - -SWE-agent and Anthropic's agent engineering guidance converge on the same point: agents need a thoughtfully designed interface. Tool names, tool docs, path handling, completion signals, and environmental feedback matter. - -Implication for Workspine: - -- `gsdd next --json` must be stable and easy for agents to consume. -- Tool/CLI outputs should avoid ambiguity. -- Commands should include enough machine-readable structure for follow-on agents. -- Absolute or repo-root-relative paths should be preferred in packets. - -### 6. Long-horizon failure is expected - -AgentBench, WebArena, OSWorld, tau-bench, and SWE-bench all show that realistic interactive tasks expose failures in long-term reasoning, planning, instruction following, environment interaction, and task completion. Workspine should assume agents drift unless the harness actively constrains and measures the work. - -Implication for Workspine: - -- `gsdd next` should never imply unbounded autonomy. -- State transitions must be explicit. -- Completion must be evidence-backed. -- Long-running work needs periodic consolidation and stop conditions. - -### 7. Memory must be distilled, scoped, and falsifiable - -ReAct, Reflexion, Generative Agents, Voyager, and ReST-meets-ReAct all support the idea that agents improve when they can reflect, remember useful lessons, and reuse skills. They do not justify dumping raw transcripts into state. Workspine needs distilled, typed memory with provenance and supersession. - -Implication for Workspine: - -- Store decisions, questions, evidence, dogfood findings, and session summaries as typed graph nodes. -- Do not ingest raw transcripts by default. -- Every memory entry needs source, time, privacy, and supersession semantics. - -### 8. 
Runtime safety belongs at the tool/action boundary - -Agent-Fence, AgentWard, and AgentTrust shift agent safety away from prompt-only safety and toward lifecycle security, trust boundaries, action interception, and trace-auditable breaks. MCP tool annotations also cannot be blindly trusted unless the server itself is trusted. - -Implication for Workspine: - -- Human gates must be tied to action risk and reversibility. -- `gsdd next` should identify when the next step would cross a privileged boundary. -- Publication/export checks should fail closed for local-only graph or evidence. -- Future browser or MCP provider work must treat localhost/control-plane trust as a security boundary. - -### 9. Observability is not optional - -OpenTelemetry GenAI conventions, LangSmith traces, Inspect evaluations, and OpenAI trace/eval guidance all point to the same requirement: long-term improvement needs observable runs, not summaries alone. - -Implication for Workspine: - -- `.work/graph/events.jsonl` should be trace-like enough to reconstruct decisions. -- `gsdd next` outputs should include confidence, reason, inputs considered, and skipped inputs. -- Verification and audit should cite the graph/evidence events they used. - -### 10. Simplicity should be defended deliberately - -Anthropic's guidance warns against complex frameworks when simpler composable patterns suffice. This supports the current Workspine direction: JSONL event log first, derived index second, no database or hosted memory until the file model is insufficient. - -Implication for Workspine: - -- Keep v1 file-based. -- Avoid SQLite, vector DBs, MCP memory servers, and hosted memory in the first milestone. -- Build testable seams so the storage backend can evolve later. - -## Concrete Changes This Research Implies - -Add or preserve these milestone constraints: - -- `gsdd next` v1 is read-only routing plus optional local state validation, not an executor. 
-- `.work/graph/events.jsonl` is append-only and source-of-truth. -- `.work/graph/index.json` is rebuildable. -- Questions are durable interrupts. -- Review/audit and repair/fix-gaps are separate states. -- Evals include fixture states for every `gsdd next` output state. -- Dogfood findings are human feedback inputs, not random notes. -- Raw transcripts remain opt-in and local-only. -- Human gates are driven by stakes, reversibility, privacy, and authority boundaries. -- Every next-action packet should be machine-readable and human-readable. - -## Prompt Delta For Future Agents - -Add this to kickoff prompts: - -```text -Harness-engineering requirements: -- Treat Workspine as an agent harness, not just a CLI. -- The harness contract includes instructions, tools, routing, state, graph memory, validation, evidence, evals, repair loops, human gates, and dogfood feedback. -- Design `gsdd next` so every continuation is replayable from durable state, not chat memory. -- Human gates must behave like durable interrupts: persist the question, pause, and resume from the answer. -- Separate review from repair: review emits structured findings; repair consumes them. -- Treat verify/audit/dogfood runs as trace/eval material for improving the harness. -- Long-horizon consistency requires state transitions, completion signals, traceability, stop conditions, and evidence-backed closure. -- Do not add complex infrastructure until the file-backed graph proves insufficient. -``` diff --git a/distilled/DESIGN.md b/distilled/DESIGN.md index b3fd95c..093f349 100644 --- a/distilled/DESIGN.md +++ b/distilled/DESIGN.md @@ -74,6 +74,7 @@ 61. [Deliberate Subagent Contract](#d61---deliberate-subagent-contract) 62. [Repo-Native UI Proof Contract](#d62---repo-native-ui-proof-contract) 63. [Computed-First Control Map](#d63---computed-first-control-map) +64. 
[Work-Native Continuity Authority](#d64---work-native-continuity-authority) --- @@ -2919,6 +2920,55 @@ Posture compatibility is part of that closeout contract: `repo_closeout` and `ru - `closeout-report` is a compact replay/report helper, not `progress`, `verify`, milestone audit, release automation, cleanup, or a dashboard. The source CLI path includes full health diagnostics; the generated helper reports health availability as a typed warning if the full health builder is not present in that helper runtime. - Future health hardening can consume the same helper output for stricter reporting, but must avoid turning local annotations into product truth. +## D64 - Work-Native Continuity Authority + +**Decision (2026-06-29):** `.work` becomes the canonical continuity surface for `gsdd next` and future work-native state, while `.planning` remains readable legacy lifecycle input during migration and for existing workflows that still own their write paths. The authority used by routing, preflight, verification, and auto-gate packets must be explicit in machine-readable output. Repo policy, not Workspine itself, decides whether `.work` is committed or local-only; in the Workspine framework repo, `.work` is local dogfood/runtime state and should not be tracked. + +**Context:** +- PR #116 merged the first `gsdd next` / `.work` continuity slice, but the framework repo still had split truth: `.work/milestone` described a locally implemented continuity milestone while `.planning` still described v2.0.0 parallel orchestration and P65/P66. +- Old v2.0.0 parallel orchestration is too distant to migrate wholesale. Its useful ideas are partial extraction candidates: write ownership and closeout truth gates move into work-native auto-gate safety now; the PR #113 registry material remains parked for later extraction or rescue. 
+- OpenSpec's change/archive model and LeanSpec's compact spec guidance both reinforce that long-lived state needs clear ownership and small, reviewable truth surfaces rather than scattered status prose.
+- The cited OpenAI and Anthropic agent-orchestration guidance favors manager-owned orchestration, explicit handoffs, guardrails, evaluator loops, and bounded tool authority rather than an unbounded agent loop.
+- MCP and agent-tooling security guidance reinforce that tool/resource outputs and local memory surfaces are untrusted inputs unless their provenance, privacy posture, and authority boundary are explicit.
+
+**Decision:**
+- Treat `.work` as the source of truth for `gsdd next` continuity routing, focus packets, work-native milestone state, decisions, questions, evidence pointers, dogfood findings, and bounded auto-gate state.
+- Keep `.planning` readable as legacy lifecycle input. Existing `gsdd-plan`, `gsdd-execute`, `gsdd-verify`, audit, complete-milestone, phase-status, and helper write paths continue to work against `.planning` until each surface is deliberately bridged or migrated.
+- Require routing and preflight output to report authority explicitly, using scoped values such as `work_milestone`, `planning`, or `blocked/conflict`; `.work` authority must not silently mask repo truth, PR truth, or unrelated `.planning` blockers.
+- Preserve repo/control-map truth as the highest authority for branch, PR, worktree, dirty-state, and delivery claims. `.work` can carry intent and continuity; it cannot convert local prose into integrated repo truth.
+- Define execute-until-gate as task-bounded automation, not session-bounded autonomy. Auto mode may run typed, reviewed, `type=auto` tasks and bounded verification/repair cycles only until a human gate, verification gap, repeated blocker, authority conflict, trust boundary, or scope expansion stops it.
+- Keep milestone completion user-owned. `gsdd next` may route to completion approval, but it must not mark a milestone complete autonomously.
+- Treat `.work` tracking as repo policy. Consumer projects may commit `.work` if that fits their privacy and collaboration model. The Workspine framework repo keeps `.work` local-only because it is dogfooding its own runtime state while developing the framework.
+- Reframe the old v2.0.0 parallel-orchestration milestone as legacy input rather than the active next milestone. Preserve write-set ownership and closeout truth gates in the new work-native authority milestone; park PR #113/P65 registry material as extract-later.
+
+**Leverage:**
+- Lost: the old clean "v2.0.0 parallel PR orchestration next" story and the convenience of treating `.planning` as the only lifecycle state root.
+- Kept: repo-native files, plain markdown workflow contracts, existing `.planning` compatibility, computed-first control-map authority, plan/execute/verify separation, and human-owned completion.
+- Gained: a clear continuity authority for `gsdd next`, explicit migration semantics, safer execute-until-gate foundations, repo-policy-driven tracking, and a typed authority boundary that prevents future auto mode from running on stale or contradictory state.
+
+**Evidence:**
+- `bin/lib/next.mjs`, `bin/lib/work-context.mjs`, `bin/lib/lifecycle-preflight.mjs`
+- `tests/gsdd.next.test.cjs` and `tests/phase.test.cjs`
+- `README.md` for the public `.work` tracking policy
+- PR #116, merged 2026-06-29 with merge commit `b91a138c42f2ab3ff7376317031208c7a716decd`: `https://github.com/PatrickSys/workspine/pull/116`
+- PR #113, still open as the parked registry extraction candidate: `https://github.com/PatrickSys/workspine/pull/113`
+- GSD comparison sources: `agents/_archive/gsd-roadmapper.md`, `agents/_archive/gsd-planner.md`, `agents/_archive/gsd-executor.md`, and `agents/_archive/gsd-verifier.md`. These preserve lifecycle rigor around requirements, plans, state, execution, and verification, but they do not define a separate work-native `next` authority root; GSDD preserves the rigor while moving agent-facing continuity into `.work`.
+- OpenSpec docs: `https://openspec.dev/`
+- LeanSpec docs: `https://www.lean-spec.dev/docs/guide/first-principles`
+- OpenAI Agents SDK orchestration docs: `https://developers.openai.com/api/docs/guides/agents/orchestration`
+- OpenAI guardrails and approvals docs: `https://developers.openai.com/api/docs/guides/agents/guardrails-approvals`
+- OpenAI agent evals docs: `https://developers.openai.com/api/docs/guides/agent-evals`
+- Anthropic agent engineering guidance: `https://www.anthropic.com/engineering/building-effective-agents`
+- Anthropic long-running harness guidance: `https://www.anthropic.com/engineering/harness-design-long-running-apps`
+- Model Context Protocol security guidance: `https://modelcontextprotocol.io/specification/2025-06-18/basic/security_best_practices`
+- GitHub Copilot repository instructions docs: `https://docs.github.com/en/copilot/concepts/prompting/response-customization`
+
+**Consequences:**
+- Future `gsdd next` and auto-gate work should start by reconciling `.work`, `.planning`, repo/control-map, and PR truth into one conservative next action or one explicit blocker.
+- Future milestones must not say "P65 shipped, start P66" unless repo, PR, `.work`, and legacy `.planning` truth agree.
+- Future auto mode must expose typed gates, loop guards, evidence requirements, stop reasons, and authority source in JSON. It must never rely on prose such as "continue autonomously" as execution permission.
+- Future framework work should not add tracked `.work` runtime state by default. Durable product changes belong in source/design/workflow/test files; `.work` dogfood state remains local unless explicitly promoted.
+
 ---
 
 ## Maintenance
diff --git a/distilled/EVIDENCE-INDEX.md b/distilled/EVIDENCE-INDEX.md
index 7839dc5..882ce02 100644
--- a/distilled/EVIDENCE-INDEX.md
+++ b/distilled/EVIDENCE-INDEX.md
@@ -515,6 +515,17 @@
 - OpenSpec comparison: OpenSpec change archive flow does not own long-running multi-worktree local-state reconciliation as a portable harness surface
 - Harness sources: https://www.anthropic.com/engineering/harness-design-long-running-apps, https://code.claude.com/docs/en/worktrees, https://developers.openai.com/codex/cloud, https://developers.openai.com/api/docs/guides/agents/orchestration, https://developers.openai.com/api/docs/guides/agents/guardrails-approvals, https://developers.openai.com/api/docs/guides/agent-evals, https://docs.github.com/en/copilot/concepts/agents/cloud-agent/about-cloud-agent, https://agent-browser.dev/sessions, https://developer.chrome.com/docs/devtools/agents
 
+## D64 — Work-Native Continuity Authority
+- `bin/lib/next.mjs`, `bin/lib/work-context.mjs`, `bin/lib/lifecycle-preflight.mjs`
+- `tests/gsdd.next.test.cjs`, `tests/phase.test.cjs`
+- `README.md` for the public `.work` tracking policy
+- PR #116 merged 2026-06-29, merge commit `b91a138c42f2ab3ff7376317031208c7a716decd`: https://github.com/PatrickSys/workspine/pull/116
+- PR #113 remains open as parked registry extraction input: https://github.com/PatrickSys/workspine/pull/113
+- GSD comparison sources: `agents/_archive/gsd-roadmapper.md`, `agents/_archive/gsd-planner.md`, `agents/_archive/gsd-executor.md`, `agents/_archive/gsd-verifier.md`; upstream GSD lifecycle rigor does not define a separate work-native `next` authority root, so GSDD keeps the lifecycle rigor while moving agent-facing continuity into `.work`
+- Spec framework sources: https://openspec.dev/, https://www.lean-spec.dev/docs/guide/first-principles
+- Orchestrator sources: https://developers.openai.com/api/docs/guides/agents/orchestration, https://www.anthropic.com/engineering/building-effective-agents
+- Industry guidance sources: https://developers.openai.com/api/docs/guides/agents/guardrails-approvals, https://developers.openai.com/api/docs/guides/agent-evals, https://www.anthropic.com/engineering/harness-design-long-running-apps, https://modelcontextprotocol.io/specification/2025-06-18/basic/security_best_practices, https://docs.github.com/en/copilot/concepts/prompting/response-customization
+
 ---
 
 ## Maintenance
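The D64 authority and execute-until-gate rules can be illustrated with a small JavaScript sketch. JavaScript is used because the cited evidence lives in `bin/lib/*.mjs`. This is a hypothetical model, not the code in `bin/lib/next.mjs`. The names `resolveAuthority` and `nextStopReason`, every input field (`claimsDelivered`, `prMerged`, `blockers`, `openVerificationGaps`, `blockerCounts`, `crossesTrustBoundary`, `expandsScope`), the snake_case stop-reason strings, and the fallback to `.planning` when `.work` has no milestone are all assumptions. Only the authority values (`work_milestone`, `planning`, `blocked/conflict`) and the list of stop conditions come from D64.

```javascript
// Hypothetical sketch of D64 routing rules. Function names and input fields
// are illustrative only; they are NOT the API of bin/lib/next.mjs.

// Authority values named in D64.
const AUTHORITY = Object.freeze({
  WORK: 'work_milestone',
  PLANNING: 'planning',
  CONFLICT: 'blocked/conflict',
});

// Reconcile .work, legacy .planning, and repo/PR truth into one authority,
// or into an explicit blocker that lists every reason.
function resolveAuthority({ work = {}, planning = {}, repo = {} } = {}) {
  const reasons = [];
  // Repo/PR truth outranks .work: local prose cannot claim integrated delivery.
  if (work.claimsDelivered && !repo.prMerged) {
    reasons.push('.work claims delivery that repo/PR truth does not confirm');
  }
  // .work authority must not silently mask unrelated .planning blockers.
  for (const blocker of planning.blockers ?? []) {
    reasons.push(`.planning blocker: ${blocker}`);
  }
  if (reasons.length > 0) return { authority: AUTHORITY.CONFLICT, reasons };
  if (work.milestone) return { authority: AUTHORITY.WORK, reasons };
  // Assumption: legacy .planning is used only when .work has no milestone.
  if (planning.milestone) return { authority: AUTHORITY.PLANNING, reasons };
  return { authority: AUTHORITY.CONFLICT, reasons: ['no continuity state found'] };
}

// Execute-until-gate: return the stop reason for the next task, or null when
// the task may run without a human. Stop-reason strings and the default
// maxRepeats loop guard are assumptions.
function nextStopReason(task, state, { maxRepeats = 2 } = {}) {
  // Anything other than a resolved authority stops the loop.
  if (state.authority !== AUTHORITY.WORK && state.authority !== AUTHORITY.PLANNING) {
    return 'authority_conflict';
  }
  // Milestone completion stays user-owned: route to approval, never auto-complete.
  if (task.kind === 'complete_milestone') return 'human_gate';
  // Only reviewed type=auto tasks may run unattended.
  if (task.type !== 'auto' || !task.reviewed) return 'human_gate';
  if (task.crossesTrustBoundary) return 'trust_boundary';
  if (task.expandsScope) return 'scope_expansion';
  if ((state.openVerificationGaps ?? 0) > 0) return 'verification_gap';
  if ((state.blockerCounts?.[task.id] ?? 0) >= maxRepeats) return 'repeated_blocker';
  return null;
}

// Example: .work owns the milestone, nothing blocks, one reviewed auto task.
const routed = resolveAuthority({ work: { milestone: 'work-native-authority' } });
console.log(routed.authority); // work_milestone
console.log(nextStopReason({ id: 't1', type: 'auto', reviewed: true }, routed)); // null
```

The resolver is deliberately conservative: any doubt becomes `blocked/conflict`. This mirrors the D64 consequence that reconciliation yields "one conservative next action or one explicit blocker." D64 also requires evidence requirements and loop guards in the JSON output, and this sketch omits both.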