diff --git a/CHANGELOG.md b/CHANGELOG.md index 2830f30..57f5ca9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,20 @@ Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and Types of changes: **Added**, **Changed**, **Deprecated**, **Removed**, **Fixed**, **Security**. +## [Unreleased] + +### Added + +- Recovery protocol (`reference/recovery.md`): formal process for closing gaps in specification, tests, or coverage after code already exists. Covers discovery triggers, gap classification (spec gap, mold gap, coverage erosion, contract gap), severity-based triage, the five-step recovery sequence (audit, spec patch, mold patch, recast, re-review), kanban integration, recurrence prevention, and health metrics. +- Gap assessment template (`harness/templates/gap-assessment-template.md`): structured template for documenting discovered gaps during recovery. +- Recovery acknowledgment in `MANIFESTO.md` Section VII (The Fracture Lines): recognizes that gaps are inevitable and defines recovery as the methodology applied in reverse. +- Recovery procedure in `harness/PLAYBOOK.md`: expanded Iteration and Feedback section with recovery sequence, exit criteria, human and agent responsibilities, and kanban treatment. +- Agent rules R11 (halt and report gaps) and R12 (recovery task behavior) in `harness/agent/AGENT_RULES.md`. +- Retroactive gap discovery failure mode in `reference/agent-operating-model.md`. +- Retroactive Gap Discovery section in `reference/workflow.md` feedback loops. +- Recovery terms added to `reference/glossary.md`: contract gap, coverage erosion, gap assessment, mold gap, recovery, spec gap. +- Recovery constraint and terminology added to `harness/.cursor/rules/codex-automata.mdc`. + ## [0.1.0] - 2026-05-11 ### Added diff --git a/MANIFESTO.md b/MANIFESTO.md index e60ba76..2a83668 100644 --- a/MANIFESTO.md +++ b/MANIFESTO.md @@ -174,6 +174,14 @@ There is also the question of scale. 
A solo developer building a weekend project The deepest limitation is the cold start problem. Writing good specs requires domain knowledge, architectural taste, and the hard-won intuition that comes from years of building systems that failed. You cannot spec what you do not understand. Junior engineers cannot be dropped into the Spec Writing station and expected to produce sharp molds. They must first learn what good looks like, which means, paradoxically, they may need to write bad code, debug it, and internalize the failure modes before they can specify well. The methodology does not eliminate the need for experience. It concentrates experience where it matters most. +There is one more fracture line, subtler than the others because it lives inside the methodology itself. The pipeline assumes that specifications and tests are written before code. In practice, even disciplined teams discover gaps after the fact. A production incident reveals a failure mode that no one specified. A coverage audit exposes a module with tests that were disabled months ago and never restored. A new engineer walks through the codebase and finds behavior that exists in code but in no specification anywhere. + +The instinct in these moments is to fix the immediate problem. Add a test for the failure mode. Re-enable the disabled tests. Move on. This instinct is dangerous because it inverts the pipeline. A test written to match existing code is not a mold. It is a tracing. It encodes whatever the code happens to do, correct or not, and calls it specified. The gap in the specification remains, invisible but load-bearing. + +The correct response is the same response the methodology prescribes for forward work, applied in reverse. When a gap is discovered, trace it back to its root. If the specification is missing, write it. If the specification exists but the mold does not, derive the tests. If the mold existed but eroded, restore it from the specification, not from the code. 
Then verify or recast the implementation against the repaired mold. Recovery follows the pipeline. It simply enters at a different point. + +This matters because recovery is not an exception. It is a permanent feature of real systems. No methodology eliminates gaps entirely. What a methodology can do is define how gaps are found, classified, and closed with the same discipline that governs forward work. A system that can only build forward is fragile. A system that can also recover is resilient. The mold must be inspected and maintained, not just built once and trusted forever. + None of these limitations invalidate the approach. They define its scope. Codex Automata is a methodology for engineering production systems in an era when implementation is cheap and specification is the binding constraint. Within that scope, it is precise. --- diff --git a/README.md b/README.md index a4e7bc2..fb2bfef 100644 --- a/README.md +++ b/README.md @@ -103,7 +103,7 @@ codex-automata/ | | |-- hooks.json Event-driven enforcement | |-- .github/ GitHub CI and templates | |-- agent/ Detailed agent operating rules -| |-- templates/ All project templates +| |-- templates/ All project templates (spec, test, ADR, contract, task, review, gap assessment) | |-- docs/ Empty project docs directory | |-- tests/ Empty project tests directory | |-- tasks/ Empty agent tasks directory @@ -116,6 +116,7 @@ codex-automata/ | |-- architecture.md Architecture patterns and guidance | |-- kanban.md Flow-based project management | |-- agent-operating-model.md How agents operate +| |-- recovery.md Recovery protocol for closing gaps | |-- glossary.md Terminology reference | |-- examples/ WORKED EXAMPLES (read, don't copy) @@ -143,6 +144,20 @@ Specification --> Tests --> Code If the casting is defective, fix the mold. If the mold is wrong, fix the specification. Do not debug the implementation directly. 
+## When Gaps Are Discovered + +Real projects discover gaps after code exists: a production incident exposes an unspecified failure mode, a review reveals missing tests, or a new team member finds a module with no contract tests. Codex Automata defines a formal recovery protocol for these situations. + +Recovery follows the same pipeline as forward work, applied retroactively: + +1. **Audit** the gap using `templates/gap-assessment-template.md`. +2. **Patch the spec** from domain knowledge (not from the existing code). +3. **Patch the mold** by deriving tests from the specification. +4. **Recast** the implementation if the new tests fail. +5. **Re-review** the complete recovery unit. + +Recovery tasks are first-class kanban work items, not invisible tech debt. See `reference/recovery.md` for the full protocol. + ## How Agents Operate Agents working in a Codex Automata project follow strict rules (enforced by `.cursor/rules/` and `AGENTS.md`): diff --git a/ROADMAP.md b/ROADMAP.md index 7f61aa6..59aabf0 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -5,6 +5,7 @@ This document sketches planned evolution of the Codex Automata methodology harne ## Current - **Version 0.1.0**: Baseline harness: specification doctrine, molds and casting metaphors across docs, bounded context guidance, agent task boundaries, interface contract discipline, Cursor integration (rules, skills, subagents, hooks), templates, examples, GitHub-oriented quality gate patterns. +- **Recovery protocol**: Formal process for closing gaps in specification, tests, or coverage after code exists. Includes gap classification taxonomy, severity-based triage, recovery sequence (audit, spec patch, mold patch, recast, re-review), kanban integration, gap assessment template, recurrence prevention, and health metrics. Wired into manifesto, playbook, agent rules, Cursor rules, and glossary. 
## v0.2.0 (goals) diff --git a/harness/.cursor/hooks.json b/harness/.cursor/hooks.json index d2a8054..9794f5a 100644 --- a/harness/.cursor/hooks.json +++ b/harness/.cursor/hooks.json @@ -4,7 +4,7 @@ "afterFileEdit": [ { "type": "prompt", - "prompt": "A source file was just edited. Verify: (1) Does a specification exist for the module this file belongs to? (2) Do tests exist that cover the behavior being changed? If either is missing, remind the user that Codex Automata requires specifications and tests before implementation. Do not block the edit, just note the gap if one exists. Here is the edit context: $ARGUMENTS", + "prompt": "A source file was just edited. Verify: (1) Does a specification exist for the module this file belongs to? (2) Do tests exist that cover the behavior being changed? If either is missing, remind the user that Codex Automata requires specifications and tests before implementation. If a gap exists, recommend creating a gap assessment using templates/gap-assessment-template.md to document it for triage and recovery. Do not block the edit, but clearly surface the gap and the recommended next step. Here is the edit context: $ARGUMENTS", "timeout": 15 } ] diff --git a/harness/.cursor/rules/codex-automata.mdc b/harness/.cursor/rules/codex-automata.mdc index 8ff0439..8229b34 100644 --- a/harness/.cursor/rules/codex-automata.mdc +++ b/harness/.cursor/rules/codex-automata.mdc @@ -16,15 +16,17 @@ This project follows the Codex Automata methodology: Documentation first, Tests 5. Prefer small, atomic commits traceable to specification sections. 6. Surface ambiguity instead of guessing. 7. Every task must map to a specification section and test case. +8. When you discover code without a corresponding specification or tests, halt and report the gap. Do not work around it. Recommend documenting it with `templates/gap-assessment-template.md` for triage and recovery. ## Workflow - Locate the specification before writing code. 
- Locate the test plan before implementing. -- If spec or tests are missing, stop and report before proceeding. +- If spec or tests are missing, stop and report before proceeding. Recommend a gap assessment for triage. +- During recovery tasks (closing gaps in existing code), derive specifications from domain knowledge and tests from specifications. Never derive specs or tests from the existing code. ## Terminology -Use these terms consistently: specification, mold, casting, bounded context, interface contract, quality gate, agent task, human review, flow. +Use these terms consistently: specification, mold, casting, bounded context, interface contract, quality gate, agent task, human review, flow, recovery, gap assessment, spec gap, mold gap, coverage erosion, contract gap. See `agent/AGENT_RULES.md` for the complete operating manual. diff --git a/harness/PLAYBOOK.md b/harness/PLAYBOOK.md index 8de9ccf..f0a9106 100644 --- a/harness/PLAYBOOK.md +++ b/harness/PLAYBOOK.md @@ -371,6 +371,75 @@ The pipeline is not strictly linear. Review can send work back to any earlier ph The key constraint is that backward movement always starts at the specification. If code is wrong, do not debug the implementation. Fix the spec, fix the tests, recast. +--- + +## Recovery: Closing Gaps After the Fact + +The forward pipeline assumes specs and tests exist before code. When you discover they do not, recovery applies the same pipeline retroactively. Recovery is not an exception or a side project. It is first-class work that flows through the same kanban stations as forward work. + +For the full recovery protocol, classification taxonomy, triage guidance, and metrics, see the [Recovery Protocol](https://github.com/0xhackerfren/Codex-Automata/blob/main/reference/recovery.md) in the Codex Automata repository. 
+ +### When Recovery Applies + +Recovery applies whenever you discover that code exists without the upstream artifacts the methodology requires: + +- A module has no specification, or the specification is incomplete. +- A specification has no tests, or tests are too weak to constrain the implementation. +- Tests were deleted, disabled, or made flaky without remediation. +- A module boundary has no contract tests despite a defined interface contract. + +These gaps are discovered through production incidents, review findings, coverage audits, team walkthroughs, dependency upgrades, security scans, or agent-detected gaps during routine tasks. + +### Recovery Sequence + +Recovery mirrors the forward pipeline but starts from an existing codebase. + +```text +Audit --> Spec Patch --> Mold Patch --> Recast (if needed) --> Re-review +``` + +**Step 1: Audit.** Document the gap using `templates/gap-assessment-template.md`. Identify the affected module, gap class, discovery trigger, severity, and current state. This is a human task; agents assist with evidence gathering. + +**Step 2: Spec Patch.** If the specification is missing or incomplete, write the missing sections following Phase 2 rules. Derive the specification from domain knowledge and stakeholder intent, not from the existing code. The code may be accidentally correct or silently wrong. + +**Step 3: Mold Patch.** Derive tests from the patched specification following Phase 3 rules. Tests must trace back to specification sections. For coverage erosion, compare against the original test intent via version control history before restoring. + +**Step 4: Recast (if needed).** If the existing implementation passes the new tests, no recast is needed. If it fails, recast following Phase 4 rules. Agents receive the specification, updated tests, and interface contracts. + +**Step 5: Re-review.** A human reviews the recovery as a unit: spec patch, mold patch, and any recast code. 
The review confirms spec accuracy, test traceability, implementation correctness, and that no new gaps were introduced. + +### Recovery Exit Criteria + +- [ ] The gap assessment document is complete. +- [ ] The specification is updated and covers the previously missing behavior. +- [ ] Tests exist for every specified behavior and trace to specification sections. +- [ ] All tests pass. +- [ ] A human has reviewed and approved the recovery unit. +- [ ] The recurrence prevention section of the gap assessment is filled in. + +### Human Responsibilities During Recovery + +- Triage discovered gaps by severity and schedule them on the board. +- Write or approve specification patches. Specification authority remains human-owned. +- Review the complete recovery unit before closing the card. +- Fill in the recurrence prevention section: what process gap allowed this debt to accumulate? + +### Agent Responsibilities During Recovery + +- When a gap is discovered during routine work, halt and report it using the gap assessment template. Do not silently work around gaps. +- During recovery tasks, follow the same forward rules (R1-R10) in the context of an existing codebase. +- Assist with evidence gathering: scan for related gaps, check version control history, surface specification sections that reference the affected behavior. +- Derive tests from the specification, not from the existing code. +- If the specification appears incorrect (code behavior contradicts it and the code is believed correct), surface the conflict for human resolution. Do not update the specification unilaterally. + +### Recovery on the Kanban Board + +Recovery cards use a distinct card type or tag ("Recovery" or "Gap Remediation") and flow through the same stations as forward work. They count against the same WIP limits. If a critical recovery card displaces forward work, that tradeoff is visible on the board. + +Batch related gaps within a single module into one recovery card. 
Create separate cards for separate modules to maintain bounded context independence. + +--- + ## Quick Reference | Phase | Primary Owner | Bottleneck? | Key Template | @@ -382,3 +451,4 @@ The key constraint is that backward movement always starts at the specification. | 4: Code Casting | Agent | No | `agent-task-template.md` | | 5: Review | Human | **Yes** | `human-review-template.md` | | 6: Deployment | Human + Agent | No | N/A | +| Recovery | Human + Agent | No | `gap-assessment-template.md` | diff --git a/harness/agent/AGENT_RULES.md b/harness/agent/AGENT_RULES.md index ddb0d4a..32abf15 100644 --- a/harness/agent/AGENT_RULES.md +++ b/harness/agent/AGENT_RULES.md @@ -30,6 +30,10 @@ R9. Every agent task must map to a specification section and at least one test c R10. Do not introduce external dependencies not specified in the architecture documents. +R11. When you discover code without a corresponding specification or tests, halt and report the gap. Do not silently work around it, do not write tests derived from the code, and do not treat unspecified behavior as intentional. Report the gap using the gap assessment template (`templates/gap-assessment-template.md`) so a human can triage and schedule recovery. + +R12. During recovery tasks, follow rules R1-R11 in the context of an existing codebase. Derive specifications from domain knowledge and stakeholder intent, not from the current implementation. Derive tests from the specification, not from the code. If the specification and the code conflict, surface the conflict for human resolution. + ## 3. Task Execution Protocol - Step 1: Read the agent task definition and locate the specification reference. 
diff --git a/harness/templates/gap-assessment-template.md b/harness/templates/gap-assessment-template.md
new file mode 100644
index 0000000..caf1ba6
--- /dev/null
+++ b/harness/templates/gap-assessment-template.md
@@ -0,0 +1,99 @@
+# Gap Assessment: [Module Name]
+
+
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| Module name | [ ] |
+| Assessed by | [ ] |
+| Date discovered | [YYYY-MM-DD] |
+| Severity | Critical / Significant / Moderate / Low [pick one] |
+| Status | Open / In Recovery / Closed [pick one] |
+
+## Gap Classification
+
+[Pick one. Definitions: spec gap = behavior in code but not in spec; mold gap = behavior in spec but no tests; coverage erosion = tests lost over time; contract gap = boundary lacks contract tests.]
+
+- [ ] **Spec gap:** behavior exists in code but is not documented in any specification.
+- [ ] **Mold gap:** behavior is documented in the specification but has no tests, or tests are too weak.
+- [ ] **Coverage erosion:** tests existed but were deleted, disabled, or made flaky without remediation.
+- [ ] **Contract gap:** a module boundary has no contract tests despite a defined interface contract.
+
+## Discovery Trigger
+
+[How was this gap found? Pick one or describe.]
+
+- [ ] Production incident
+- [ ] Review finding
+- [ ] Coverage audit
+- [ ] Team walkthrough
+- [ ] Dependency upgrade
+- [ ] Security scan
+- [ ] Agent-detected during routine task
+- [ ] Other: [ ]
+
+[If incident-related, link the incident report or postmortem.]
+
+## Current State
+
+[What exists today? Describe the specification, tests, and code as they currently stand.]
+
+- **Specification:** [complete / partial / missing; cite relevant spec document and sections if they exist]
+- **Tests:** [present / partial / missing / disabled; cite test files if they exist]
+- **Code:** [describe the behavior that exists without adequate upstream artifacts]
+- **Contract tests:** [present / missing; cite the interface contract document]
+
+## Required State
+
+[What should exist according to the methodology? Be specific.]
+
+- **Specification should cover:** [list the behaviors, edge cases, and failure modes that need to be specified]
+- **Tests should cover:** [list the test cases that need to exist, traced to specification sections]
+- **Implementation changes (if any):** [describe expected recast scope, or note "none expected" if the code is believed correct]
+
+## Recovery Plan
+
+[Specific tasks to close this gap. Each task should map to a recovery sequence step.]
+
+1. **Spec patch:** [ ]
+2. **Mold patch:** [ ]
+3. **Recast (if needed):** [ ]
+4. **Re-review:** [ ]
+
+**Estimated effort:** [ ]
+
+**Assigned to:** [ ]
+
+## Recurrence Prevention
+
+[What process gap allowed this debt to accumulate? What change prevents this class of gap from recurring?]
+
+- **Root cause:** [ ]
+- **Process change:** [ ]
+
+<!--
+- [...] --> add the check to the review and PR templates.
+- Contract tests were not required at the boundary --> update architecture and module boundary templates.
+- Spec was reviewed by someone unfamiliar with the domain --> establish domain-owner review requirements.
+- Tests were disabled to unblock a deadline --> require tracking issues with due dates for disabled tests.
+- Module predates the methodology --> schedule systematic audit of pre-methodology modules.
+- Emergency bypass of the forward pipeline --> ensure bypass policy includes mandatory recovery scheduling.
+-->
+
+## Resolution
+
+[Fill in when the gap is closed.]
+ +| Field | Value | +|-------|-------| +| Date closed | [YYYY-MM-DD] | +| Reviewed by | [ ] | +| Spec patch commit/PR | [ ] | +| Mold patch commit/PR | [ ] | +| Recast commit/PR | [N/A or link] | +| Review approval | [ ] | diff --git a/reference/agent-operating-model.md b/reference/agent-operating-model.md index da14de7..1d3dad6 100644 --- a/reference/agent-operating-model.md +++ b/reference/agent-operating-model.md @@ -110,6 +110,8 @@ Human review still judges specification fidelity, operational readiness, and cro **Security sensitive inference gaps.** Prefer conservative halts and human security review; never guess trust boundaries. +**Retroactive gap discovery.** During routine tasks, an agent encounters code that has no corresponding specification, no tests, disabled tests, or no contract tests at a module boundary. The agent halts the current task for the affected scope and files a structured gap report using the gap assessment template. The report includes the affected module, the class of gap (spec gap, mold gap, coverage erosion, or contract gap), how it was discovered, and what currently exists. The agent does not work around the gap, does not write tests derived from the code, and does not treat unspecified behavior as intentional. Humans triage the gap by severity and schedule recovery work on the kanban board. During recovery tasks, agents follow the forward pipeline rules (specification first, then molds, then casting) applied to the existing codebase: specifications are derived from domain knowledge, not from code; tests are derived from specifications, not from code; and conflicts between code and specification are surfaced for human resolution. See `recovery.md` in this directory for the full recovery protocol. + Blameless postmortems should link incidents to missing specification clauses, mold gaps, gate holes, training updates, or ADR follow-ups. 
-Companion references: `principles.md`, `workflow.md`, `kanban.md`, `architecture.md`, and `glossary.md` in this directory, and `PLAYBOOK.md` in the harness for detailed phase guidance. +Companion references: `principles.md`, `workflow.md`, `kanban.md`, `architecture.md`, `recovery.md`, and `glossary.md` in this directory, and `PLAYBOOK.md` in the harness for detailed phase guidance. diff --git a/reference/glossary.md b/reference/glossary.md index b62ac20..d5393c9 100644 --- a/reference/glossary.md +++ b/reference/glossary.md @@ -11,12 +11,21 @@ An independent partition of the system where a specific domain model applies; th **Casting** The implementation code produced by satisfying a mold derived from a specification; treated as a commodity artifact. Elaboration: defects often indicate an underspecified mold or specification drift rather than a need for ad hoc test bending. +**Contract Gap** +A gap class where a module boundary defined in the architecture has no contract tests despite a documented interface contract. Elaboration: the interface contract document may exist, but nothing mechanically verifies that both sides honor it; recovery writes contract tests from the interface contract and runs them against both sides. + +**Coverage Erosion** +A gap class where tests once existed but were deleted, disabled, marked as skipped, or allowed to become flaky without remediation. Elaboration: the mold has degraded over time; recovery restores or rewrites tests from the current specification, not from the original test code, since the specification may have evolved. + **Code Casting** The phase where agents write or refactor implementation until all relevant molds pass under the governing interface contracts. Elaboration: parallelizes across contexts when contracts and repository ownership boundaries are clear. **Flow** The continuous movement of work through the pipeline, managed with kanban mechanics and WIP limits. 
Elaboration: measured with cycle time, throughput, and WIP age rather than sprint velocity metaphors. +**Gap Assessment** +A structured document recording a discovered gap in specification, tests, or coverage for an existing module. Elaboration: captures the affected module, gap class, discovery trigger, severity, current and required state, recovery plan, and recurrence prevention; uses `templates/gap-assessment-template.md` in the harness. + **Human Review** The phase where humans verify that implementation matches specification, intent, and systemic risk expectations beyond what automated checks encode. Elaboration: may reject work and return it upstream to specification, molding, or casting with explicit rationale. @@ -26,9 +35,18 @@ A formal agreement between bounded contexts defining the API surface, data forma **Mold** The executable test suite and fixtures derived from a specification that define the exact shape implementation must take. Elaboration: rigid by design; changing a mold without a specification change is a process violation unless governed as an emergency fix with follow up specification alignment. +**Mold Gap** +A gap class where behavior is documented in the specification but has no tests, or the tests are too weak to constrain the implementation meaningfully. Elaboration: the specification says what should happen, but no mold enforces it; recovery derives tests from the existing specification following standard test molding rules. + **Quality Gate** An automated check in the CI/CD pipeline that enforces process discipline mechanically (build, test, policy, security, signing, compatibility, performance budgets as applicable). Elaboration: failures block progression by default; waivers require explicit human risk acceptance tied to records. +**Recovery** +The process of closing gaps in specification, tests, or coverage after code already exists. 
Elaboration: follows the same pipeline as forward work (specification first, then molds, then casting) but applied retroactively; recovery tasks are first-class kanban work items, not invisible background chores; see `recovery.md` in this directory for the full protocol. + +**Spec Gap** +A gap class where behavior exists in the codebase but is not documented in any specification. Elaboration: the most dangerous class because without a specification you cannot determine whether the current behavior is correct or accidental; recovery writes the specification first from domain knowledge, not from the code. + **Spec Writing** The phase where humans produce or revise specifications and contracts; typically the first human constrained station in the flow. Elaboration: WIP limits preserve depth and reduce ambiguous drafts that would pollute molding and casting. diff --git a/reference/recovery.md b/reference/recovery.md new file mode 100644 index 0000000..b5eeb44 --- /dev/null +++ b/reference/recovery.md @@ -0,0 +1,207 @@ +# Recovery Protocol + +This document defines how Codex Automata projects handle the discovery of gaps in specifications, tests, or coverage after code already exists. Recovery is not an exception to the methodology; it is the methodology applied retroactively. The same pipeline that governs forward work governs remediation: specification first, then molds, then casting. + +For phase-by-phase forward workflow, see `PLAYBOOK.md` in the harness. For the gap assessment template used during recovery, see `templates/gap-assessment-template.md` in the harness. + +## Why Recovery Needs a Protocol + +The forward pipeline assumes disciplined execution: every module gets a spec, every spec gets tests, every test precedes code. In practice, gaps accumulate. Production incidents expose unspecified failure modes. Code reviews reveal missing edge-case tests. A new team member walks through a module and finds no contract tests at all. 
A dependency upgrade invalidates assumptions that were never documented. A security scan surfaces behaviors that exist in code but nowhere in the specification. + +Without a structured recovery process, teams respond to these discoveries in one of two ways. They either ignore the gap and accept silent debt, or they slap a test onto the existing code and call it covered. Both responses violate the methodology. The first erodes the mold. The second builds a mold around a casting instead of a specification, encoding whatever the code happens to do rather than what it should do. + +Recovery requires the same rigor as forward work because the risks are identical. A gap in the mold is a gap in the mold whether it was missed on the first pass or discovered six months later. + +## Discovery Triggers + +Gaps surface through predictable channels. Teams should treat these as detection mechanisms, not surprises. + +**Production incidents.** An outage, data corruption event, or SLO breach reveals behavior that was never specified or tested. The incident postmortem traces the failure to a missing specification clause or an absent test. + +**Review findings.** A human reviewer identifies behaviors in the casting that have no corresponding specification section or test coverage. The review template's "Gaps" field captures these explicitly. + +**Coverage audits.** Periodic or automated measurement of test coverage (statement, branch, behavioral) reveals modules below project thresholds or with no coverage at all. + +**Team walkthroughs.** A new team member, a rotating reviewer, or a cross-team dependency audit exposes modules where institutional knowledge substitutes for written specification. + +**Dependency upgrades.** Updating a library, framework, or platform surfaces assumptions that were encoded in code but never documented in a specification. The upgrade breaks behavior that no test guards. 
+ +**Security scans.** Static analysis, penetration testing, or threat modeling identifies attack surfaces that exist in code but have no corresponding failure-mode specification or adversarial test. + +**Agent-detected gaps.** During routine tasks, agents encounter code without a corresponding specification or tests. Per agent operating rules, they halt and report rather than working around the gap. + +## Gap Classification + +Every discovered gap falls into one of four categories. Classification determines the recovery sequence. + +### Spec Gap + +Behavior exists in the codebase but is not documented in any specification. The code does something, but no one wrote down what it should do or why. This is the most dangerous class because without a spec, you cannot determine whether the current behavior is correct or accidental. + +**Recovery sequence:** Write the specification first (determine intended behavior), then derive tests, then verify or recast the implementation. + +### Mold Gap + +Behavior is documented in the specification but has no tests, or tests exist but are too weak to constrain the implementation meaningfully. The spec says what should happen, but no mold enforces it. + +**Recovery sequence:** Derive tests from the existing specification, verify they fail against known-bad inputs (or pass against the current implementation if it is believed correct), then review for alignment. + +### Coverage Erosion + +Tests once existed but were deleted, disabled, marked as skipped, or allowed to become flaky without remediation. The mold has degraded over time. + +**Recovery sequence:** Audit the erosion (version control history reveals when and why tests were removed or disabled), restore or rewrite the affected tests, and verify they align with the current specification. + +### Contract Gap + +A module boundary defined in the architecture has no contract tests. 
The interface contract document may exist, but nothing mechanically verifies that both sides honor it. + +**Recovery sequence:** Write contract tests from the interface contract document, run them against both sides of the boundary, and remediate any failures. + +## Triage + +Not all gaps carry equal risk. Triage determines the order in which recovery work enters the kanban board. + +### Severity Levels + +**Critical.** The gap is in a production-facing code path and has already caused or could plausibly cause an incident. Recovery enters the board immediately, ahead of forward work if WIP limits require a choice. + +**Significant.** The gap is in a production-facing code path but has not yet caused an incident. The behavior is exercised regularly. Recovery is scheduled within the current planning cycle. + +**Moderate.** The gap is in internal tooling, rarely exercised paths, or code with partial coverage. Recovery is scheduled but does not preempt forward work. + +**Low.** The gap is theoretical (the unspecified behavior is unlikely to be exercised) or the module is scheduled for replacement. Recovery is documented and tracked but may be deferred. + +### Triage Decisions + +Triage is a human responsibility. Agents surface gaps and provide evidence; humans decide priority. The triage decision is recorded in the gap assessment document so future reviewers understand why a gap was or was not addressed immediately. + +When multiple gaps are discovered simultaneously (common after an incident or a comprehensive audit), triage them as a batch. Group related gaps by module or bounded context to enable efficient recovery rather than scattering effort across the system. + +## The Recovery Sequence + +Recovery follows the forward pipeline but starts from an existing codebase. 
The sequence is: + +```text +Audit --> Spec Patch --> Mold Patch --> Recast (if needed) --> Re-review +``` + +### Step 1: Audit + +Document the gap using the gap assessment template (`templates/gap-assessment-template.md`). Identify: + +- Which module is affected. +- What class of gap it is (spec, mold, coverage erosion, contract). +- How it was discovered. +- What currently exists (partial spec? weak tests? nothing?). +- What should exist according to the methodology. + +The audit is a human task. Agents assist by scanning for related gaps in adjacent modules, checking version control history for when coverage was lost, and surfacing specification sections that reference the affected behavior. + +### Step 2: Spec Patch + +If the specification is missing or incomplete, write the missing sections. This follows the same rules as Phase 2 (Specification Writing) in the forward pipeline. The specification describes intended behavior, edge cases, and failure modes. + +If the specification exists and is correct, skip to Step 3. + +If the specification exists but is wrong (the code does something different, and the code is correct), update the specification to match intended behavior and document the rationale for the change. + +The critical constraint: do not derive the specification from the code. Derive the specification from domain knowledge, stakeholder intent, and architectural requirements. The code may be accidentally correct, or it may be wrong. The specification must reflect what the system should do, not what it happens to do. + +### Step 3: Mold Patch + +Derive tests from the patched specification. This follows the same rules as Phase 3 (Test Molding) in the forward pipeline. Tests must trace back to specification sections. Tests must be sharp enough to constrain implementation. + +For mold gaps, verify that new tests fail against known-bad inputs before relying on them. 
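One way to read the known-bad check is as a deliberate-break test: run the spec-derived tests against a variant of the implementation that violates the specification, and confirm that at least one test fails. The sketch below is illustrative only and assumes that reading. `clamp_quantity`, its broken variant, `run_mold`, and the spec clause are all hypothetical names, not part of the methodology or its templates.

```python
from typing import Callable, List

# Hypothetical spec clause: quantities are clamped to the range [0, 100].

def clamp_quantity(value: int) -> int:
    """Current implementation, believed correct."""
    return max(0, min(100, value))

def clamp_quantity_broken(value: int) -> int:
    """Known-bad variant (hypothetical): forgets the lower bound."""
    return min(100, value)

def run_mold(impl: Callable[[int], int]) -> List[str]:
    """Run the spec-derived checks against impl; return names of failed checks."""
    checks = {
        "upper_bound": impl(150) == 100,
        "lower_bound": impl(-5) == 0,
        "in_range_passthrough": impl(42) == 42,
    }
    return [name for name, ok in checks.items() if not ok]

# The mold should pass on the believed-correct code and fail on the broken variant.
failures_current = run_mold(clamp_quantity)
failures_broken = run_mold(clamp_quantity_broken)
```

If `failures_broken` came back empty, the tests would be too weak to constrain the implementation and the mold patch would not yet be complete.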
+ +For coverage erosion, compare restored tests against the original test intent (version control history) and the current specification. The specification is authoritative; if the original tests were wrong, write new ones rather than restoring incorrect tests. + +### Step 4: Recast (If Needed) + +If the existing implementation passes the new tests, recasting is unnecessary. The code was correct; it was just unverified. + +If the existing implementation fails the new tests, recast the affected code. This follows Phase 4 (Code Casting) rules. Agents receive the specification, the updated test suite, and the interface contracts. They modify the implementation until all tests pass. + +Do not patch the implementation to pass tests without reading the specification. The specification, not the test, is the source of truth. The test encodes the specification mechanically; the implementation satisfies both. + +### Step 5: Re-review + +A human reviews the recovery work as a unit: the spec patch, the mold patch, and any recast code. The review verifies: + +- The specification accurately reflects intended behavior. +- Tests trace to specification sections. +- The implementation passes all tests. +- No new gaps were introduced by the recovery work. +- The gap assessment document is complete, including the recurrence prevention section. + +## Kanban Integration + +Recovery tasks are first-class work items on the kanban board. They are not invisible background chores, side projects, or "tech debt we will get to later." + +### Card Type + +Use a distinct card type or tag (e.g., "Recovery" or "Gap Remediation") so recovery work is visible and measurable separately from forward work. This enables tracking of recovery volume as a system health metric. 
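As a rough illustration of why a distinct card type helps, the sketch below tags each card and computes the share of the board taken up by recovery work. The `Card` fields, the `"forward"` and `"recovery"` tag values, and the sample board are assumptions made for this example; real kanban tools expose their own data models.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Card:
    title: str
    kind: str      # "forward" or "recovery" (hypothetical tag values)
    station: str   # e.g. "Spec Writing", "Test Molding"

def recovery_share(cards: List[Card]) -> float:
    """Fraction of cards on the board tagged as recovery work."""
    if not cards:
        return 0.0
    recovery = sum(1 for c in cards if c.kind == "recovery")
    return recovery / len(cards)

# Hypothetical board snapshot.
board = [
    Card("Add export endpoint", "forward", "Code Casting"),
    Card("Billing module: spec gap", "recovery", "Spec Writing"),
    Card("Auth boundary: contract gap", "recovery", "Test Molding"),
    Card("Search filters", "forward", "Human Review"),
]
share = recovery_share(board)
```

Without the tag, the two recovery cards would be indistinguishable from forward work and this measurement would not be possible.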
+ +### Station Flow + +Recovery cards flow through the same stations as forward work: + +| Station | Recovery Activity | +|---------|-------------------| +| Spec Writing | Write or patch the missing specification sections | +| Test Molding | Derive or restore tests from the specification | +| Code Casting | Recast if the implementation fails new tests | +| Human Review | Review the complete recovery unit | + +### WIP Limits + +Recovery cards count against the same WIP limits as forward cards. If a critical gap displaces forward work, that displacement is visible on the board and subject to the same flow management. Recovery is not free; treating it as invisible is how gaps accumulate in the first place. + +### Batch Sizing + +When an audit discovers many gaps in a single module, batch them into a single recovery card scoped to that module rather than creating dozens of micro-cards. The recovery unit is the module, not the individual test. + +When gaps span multiple modules, create one recovery card per module to maintain bounded context independence. + +## Recurrence Prevention + +Every completed recovery includes a brief retrospective answering one question: what process gap allowed this debt to accumulate? + +Common answers and their remediation: + +**The review checklist did not require coverage verification.** Add the check to the human review template and the PR template. + +**Contract tests were not required at the boundary.** Update the architecture document and the module boundary template to mandate contract tests. + +**The spec was reviewed by someone unfamiliar with the domain.** Establish domain-owner review requirements for specifications in that bounded context. + +**Tests were disabled to unblock a deadline and never re-enabled.** Add a policy that disabled tests require a tracking issue with a due date and an owner. 
+ +**The module predates the methodology and was never retroactively specified.** Schedule a systematic audit of pre-methodology modules and prioritize them by production exposure. + +**Agent-generated code bypassed the forward pipeline due to an emergency.** Review the emergency bypass policy and ensure it includes mandatory recovery scheduling. + +The recurrence prevention section of the gap assessment is not optional. It is the mechanism by which recovery improves the system rather than simply patching the current gap. + +## Metrics + +Track recovery as a system health indicator, not a shame metric. + +**Gap discovery rate.** How many gaps are discovered per period? A rising rate after adopting the methodology may indicate improving detection rather than worsening quality. A sustained high rate after the system matures signals a process failure. + +**Recovery cycle time.** How long from gap discovery to closed recovery card? Long cycle times indicate that recovery work is being deprioritized or blocked. + +**Gap class distribution.** Which classes of gap (spec, mold, erosion, contract) appear most frequently? Persistent patterns in one class indicate a systemic weakness in the corresponding pipeline phase. + +**Recurrence rate.** Are the same classes of gap reappearing after recovery? If so, the recurrence prevention measures are not working. + +**Recovery-to-forward ratio.** What fraction of board capacity is consumed by recovery work? A high ratio sustained over time indicates that the forward pipeline is producing gaps faster than the team can close them. This is an architectural or process problem, not a staffing problem. + +## Companion Documents + +- `workflow.md` in this directory for end-to-end pipeline context and feedback loops. +- `kanban.md` in this directory for WIP limits, pull policies, and flow metrics. +- `agent-operating-model.md` in this directory for agent behavior during recovery tasks. 
+- `principles.md` in this directory for the foundational principles recovery enforces. +- `PLAYBOOK.md` in the harness for phase-by-phase execution guidance including the recovery procedure. +- `templates/gap-assessment-template.md` in the harness for the structured gap documentation template. diff --git a/reference/workflow.md b/reference/workflow.md index 1f8953e..5d4eb99 100644 --- a/reference/workflow.md +++ b/reference/workflow.md @@ -96,6 +96,30 @@ Deploy / Observe ----------------------> feedback to Spec / Contracts / Architec Avoid bypass loops where pressure forces gate weakening without specification amendment; that forfeits Mechanical Discipline emphasized in Codex Automata doctrine. +## Retroactive Gap Discovery + +The feedback loops above address gaps found during active review or production observation of recently shipped work. A separate class of discovery occurs when gaps are found in code that has already been accepted: modules with missing or incomplete specifications, behaviors with no test coverage, eroded molds from disabled or deleted tests, and module boundaries with no contract tests. + +These retroactive gaps follow a dedicated recovery protocol. The full protocol, including classification taxonomy, triage guidance, kanban integration, and recurrence prevention, is defined in `recovery.md` in this directory. The summary sequence is: + +```text +Audit --> Spec Patch --> Mold Patch --> Recast (if needed) --> Re-review +``` + +**Audit.** Document the gap using the gap assessment template. Classify it as a spec gap (behavior exists without specification), mold gap (specification exists without tests), coverage erosion (tests were lost over time), or contract gap (boundary lacks contract tests). + +**Spec Patch.** Write or correct the specification from domain knowledge, not from the existing code. The code may be accidentally correct or silently wrong. 
+ +**Mold Patch.** Derive tests from the patched specification following the same test molding rules as forward work. + +**Recast.** If the implementation fails the new tests, recast the affected code. If it passes, the code was correct but unverified. + +**Re-review.** A human reviews the complete recovery unit: spec patch, mold patch, and any recast. + +Recovery tasks enter the kanban board as first-class work items with a distinct card type. They flow through the same stations and count against the same WIP limits as forward work. Treating recovery as invisible background work is how gaps accumulate. + +Discovery triggers include production incidents, review findings, coverage audits, team walkthroughs, dependency upgrades, security scans, and agent-detected gaps during routine tasks. See `recovery.md` for the complete list and triage criteria. + ## Kanban Stations Alignment Operational tracking mirrors stations: @@ -114,4 +138,4 @@ Consult `PLAYBOOK.md` (in the harness) when expanding any station into granular ## Companion Documents -For pull policies, limits, bottleneck interpretation, Toyota Production System parallels, metric guidance, consult `kanban.md` in this directory. For vocabulary alignment across teams, consult `glossary.md` in this directory. +For the recovery protocol covering retroactive gap discovery, classification, triage, and remediation, consult `recovery.md` in this directory. For pull policies, limits, bottleneck interpretation, Toyota Production System parallels, and metric guidance, consult `kanban.md` in this directory. For vocabulary alignment across teams, consult `glossary.md` in this directory.