From ff92a53d96680c7b64cd40c31247de780ec46bd3 Mon Sep 17 00:00:00 2001
From: Buyi-wsgzg
Date: Thu, 14 May 2026 18:30:12 +0800
Subject: [PATCH 1/5] Align gen-plan AC contract with RLCR review

---
 commands/gen-plan.md                      | 63 +++++++++++++++---
 .../codex/full-alignment-review.md        | 11 ++--
 prompt-template/codex/regular-review.md   |  9 ++-
 prompt-template/plan/gen-plan-template.md | 27 ++++++++
 tests/test-gen-plan.sh                    | 65 +++++++++++++++++++
 5 files changed, 159 insertions(+), 16 deletions(-)

diff --git a/commands/gen-plan.md b/commands/gen-plan.md
index 3b97435e..43280665 100644
--- a/commands/gen-plan.md
+++ b/commands/gen-plan.md
@@ -266,6 +266,10 @@ After Claude candidate plan v1 is ready, run iterative challenge/refine rounds w
      "${CLAUDE_PLUGIN_ROOT}/scripts/ask-codex.sh" ""
      ```
    - Prompt MUST include current candidate plan, prior disagreements, and unresolved items
+   - Prompt MUST include the RLCR plan contract: `AC-*` items are current RLCR completion gates; deferred, future, out-of-scope, post-work, or successor-loop goals must be represented as `FUT-*` under `## Future Work / Out of Scope`, optionally with a current-loop handoff AC.
+   - Prompt MUST require Codex to inspect each AC for deferral semantics. If any AC claims the real work happens outside this RLCR loop, Codex MUST put it under `REQUIRED_CHANGES`, not `OPTIONAL_IMPROVEMENTS`.
+   - Prompt MUST require a hard keyword scan within each AC body. Treat these strings as blocking unless the AC follows the Handoff AC Pattern and the current-loop verification is complete without performing future work: `TODO`, `TBD`, `deferred`, `future`, `follow-up`, `subsequent`, `next phase`, `next iteration`, `next milestone`, `next loop`, `v2`, `v.next`, `Phase II`, `left for`, `to be implemented in`, `see FUT-`.
+   - Prompt MUST require AC/Task bidirectional coverage: every `AC-*` is targeted by at least one Task Breakdown row; every Task Breakdown row targets at least one current-scope `AC-*`; no task target may be empty, `-`, `FUT-*`, or `DEC-*`.
    - Require output format:
      - `AGREE:` points accepted as reasonable
      - `DISAGREE:` points considered unreasonable and why
@@ -384,6 +388,7 @@ Deeply think and generate the plan.md following these rules:
 ## Acceptance Criteria
 
 Following TDD philosophy, each criterion includes positive and negative tests for deterministic verification.
+`AC-*` items are current RLCR completion gates: they must describe work that this implementation loop must complete and verify. Do not encode deferred, future, out-of-scope, post-work, or successor-loop goals as `AC-*`.
 
 - AC-1:
   - Positive Tests (expected to PASS):
@@ -400,6 +405,21 @@ Following TDD philosophy, each criterion includes positive and negative tests fo
   - Negative Tests: <...>
 ...
 
+### Handoff AC Pattern
+
+Use this pattern only when the draft contains a legitimate future goal that must be preserved without making it part of the current RLCR completion gate.
+
+- AC-X: Handoff for is complete without performing the future work.
+  - Future Work Reference: FUT-Y
+  - Positive Tests (expected to PASS):
+    -
+    -
+    -
+  - Negative Tests (expected to FAIL):
+    -
+    -
+    -
+
 ## Path Boundaries
 
 Path boundaries define the acceptable range of implementation quality and choices.
@@ -450,11 +470,22 @@ Each task must include exactly one routing tag:
 - `coding`: implemented by Claude
 - `analyze`: executed via Codex (`/humanize:ask-codex`)
 
+Every `AC-*` must be covered by at least one task. Every task must target at least one `AC-*`. Do not target `FUT-*`, `DEC-*`, or `-` in the Target AC column.
+
 | Task ID | Description | Target AC | Tag (`coding`/`analyze`) | Depends On |
 |---------|-------------|-----------|----------------------------|------------|
 | task1 | <...> | AC-1 | coding | - |
 | task2 | <...> | AC-2 | analyze | task1 |
 
+## Future Work / Out of Scope
+
+Future, deferred, post-work, successor-loop, and out-of-scope items belong here, not under `## Acceptance Criteria`.
+
+- FUT-1:
+  - Source DEC: DEC-1
+  - Current-loop handoff: AC-X
+  - Promotion trigger:
+
 ## Claude-Codex Deliberation
 
 ### Agreements
@@ -510,23 +541,33 @@ When `alternative_plan_language` is empty, absent, set to `"English"`, or set to
 
 5. **AC Format**: All acceptance criteria must use AC-X or AC-X.Y format.
 
-6. **Clear Dependencies**: Show what depends on what, not when things happen.
+6. **Current-Scope AC Contract**: `AC-*` items are the current RLCR completion gate. Do NOT create deferred ACs. Any deferred, future, out-of-scope, post-work, successor-task, or successor-loop goal must be written as `FUT-*` under `## Future Work / Out of Scope`, optionally linked to a current-loop Handoff AC.
+
+7. **Deferred AC Keyword Guard**: Before finalizing, scan each AC body for deferral markers: `TODO`, `TBD`, `deferred`, `future`, `follow-up`, `subsequent`, `next phase`, `next iteration`, `next milestone`, `next loop`, `v2`, `v.next`, `Phase II`, `left for`, `to be implemented in`, and `see FUT-`. If any marker means the AC's real work is outside this loop, rewrite the item as a current-loop handoff AC plus a `FUT-*` item, or move it entirely to future work.
+
+8. **Handoff AC Pattern**: When preserving a future goal, write a current-loop AC only for the handoff state/artifact/documentation. The handoff AC may reference `FUT-*`, but its positive and negative tests must be fully verifiable in this loop and must not require completing the future work.
+
+9. **AC/Task Bidirectional Coverage**: Every `AC-*` must be covered by at least one Task Breakdown row. Every Task Breakdown row must target at least one current-scope `AC-*`. No row may use an empty target, `-`, `FUT-*`, or `DEC-*` as its Target AC.
+
+10. **DEC/FUT Linkage**: If a resolved decision defers work, the decision resolution must explicitly reference a `FUT-*` item. Each `FUT-*` item caused by a decision must include `Source DEC: DEC-N`. If there is a current-loop handoff, both the DEC and FUT entry should reference the handoff AC.
+
+11. **Clear Dependencies**: Show what depends on what, not when things happen.
 
-7. **TDD-Style Tests**: Each acceptance criterion MUST include both positive tests (expected to pass) and negative tests (expected to fail). This follows Test-Driven Development philosophy and enables deterministic verification.
+12. **TDD-Style Tests**: Each acceptance criterion MUST include both positive tests (expected to pass) and negative tests (expected to fail). This follows Test-Driven Development philosophy and enables deterministic verification.
 
-8. **Affirmative Path Boundaries**: Describe upper and lower bounds using affirmative language (what IS acceptable) rather than negative language (what is NOT acceptable).
+13. **Affirmative Path Boundaries**: Describe upper and lower bounds using affirmative language (what IS acceptable) rather than negative language (what is NOT acceptable).
 
-9. **Respect Deterministic Designs**: If the draft specifies a fixed approach with no choices, reflect this in the plan by narrowing the path boundaries to match the user's specification.
+14. **Respect Deterministic Designs**: If the draft specifies a fixed approach with no choices, reflect this in the plan by narrowing the path boundaries to match the user's specification.
 
-10. **Code Style Constraint**: The generated plan MUST include a section or note instructing that implementation code and comments should NOT contain plan-specific progress terminology such as "AC-", "Milestone", "Step", "Phase", or similar workflow markers. These terms belong in the plan document, not in the resulting codebase.
+15. **Code Style Constraint**: The generated plan MUST include a section or note instructing that implementation code and comments should NOT contain plan-specific progress terminology such as "AC-", "Milestone", "Step", "Phase", or similar workflow markers. These terms belong in the plan document, not in the resulting codebase.
 
-11. **Draft Completeness Requirement**: The generated plan MUST incorporate ALL information from the input draft document without omission. The draft represents the most valuable human input and must be fully preserved. Any clarifications obtained through Phase 6 should be added incrementally to the draft's original content, never replacing or losing any original requirements. The final plan must be a superset of the draft information plus all clarified details.
+16. **Draft Completeness Requirement**: The generated plan MUST incorporate ALL information from the input draft document without omission. The draft represents the most valuable human input and must be fully preserved. Any clarifications obtained through Phase 6 should be added incrementally to the draft's original content, never replacing or losing any original requirements. The final plan must be a superset of the draft information plus all clarified details.
 
-12. **Debate Traceability**: The plan MUST include Codex-first findings, Claude/Codex agreements, resolved disagreements, and unresolved decisions. Unresolved opposite opinions MUST be recorded in `## Pending User Decisions` for explicit user decision.
+17. **Debate Traceability**: The plan MUST include Codex-first findings, Claude/Codex agreements, resolved disagreements, and unresolved decisions. Unresolved opposite opinions MUST be recorded in `## Pending User Decisions` for explicit user decision.
 
-13. **Convergence Requirement**: The plan MUST record Claude/Codex agreements, resolved disagreements, and final convergence status in `## Claude-Codex Deliberation`. Stop only when convergence conditions are met or max rounds reached with explicit carry-over decisions.
+18. **Convergence Requirement**: The plan MUST record Claude/Codex agreements, resolved disagreements, and final convergence status in `## Claude-Codex Deliberation`. Stop only when convergence conditions are met or max rounds reached with explicit carry-over decisions.
 
-14. **Task Tag Requirement**: The plan MUST include `## Task Breakdown`, and every task MUST be tagged as either `coding` or `analyze` (no untagged tasks, no other tag values).
+19. **Task Tag Requirement**: The plan MUST include `## Task Breakdown`, and every task MUST be tagged as either `coding` or `analyze` (no untagged tasks, no other tag values).
 
 ---
 
@@ -549,6 +590,10 @@ After updating, **read the complete plan file** and verify:
 - The structured plan aligns with the original draft content
 - Claude/Codex disagreement handling is explicit and correctly reflected
 - No contradictions exist between different parts of the document
+- No `AC-*` contains deferred, future, out-of-scope, post-work, or successor-loop semantics except as a valid Handoff AC whose current-loop verification is complete without performing future work
+- Every `AC-*` is covered by at least one Task Breakdown row, and every Task Breakdown row targets at least one current-scope `AC-*`
+- Every decision that defers work links to a `FUT-*` entry, and every such `FUT-*` entry links back with `Source DEC: DEC-N`
+- Items under `## Future Work / Out of Scope` use `FUT-*`, not `AC-*`, and are not listed as current-scope Task Breakdown work
 
 If inconsistencies are found, fix them using the Edit tool.
 
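Rule 7 (Deferred AC Keyword Guard) describes a manual scan. The first pass of that scan can be automated with a short bash function; this is an illustration only and is not part of this patch series. `scan_deferred_ac_markers` is a hypothetical name. The sketch assumes the plan uses a top-level `## Acceptance Criteria` heading, matches the rule-7 markers case-sensitively as fixed strings, and prints each hit with its line number. A hit is only a candidate: a valid Handoff AC may legitimately mention future work, so each hit still needs the semantic judgment the rule describes.

```shell
# Hypothetical helper, not part of gen-plan: print "<line>: <text>" for every
# line under "## Acceptance Criteria" that contains a rule-7 deferral marker.
scan_deferred_ac_markers() {
    # awk keeps only lines inside the AC section ("### " subsections stay inside it);
    # grep -F then matches the markers as fixed, case-sensitive strings.
    awk '/^## Acceptance Criteria[ \t]*$/ { in_ac = 1; next }
         /^## / && in_ac { in_ac = 0 }
         in_ac { print NR ": " $0 }' "$1" |
        grep -F \
            -e 'TODO' -e 'TBD' -e 'deferred' -e 'future' -e 'follow-up' \
            -e 'subsequent' -e 'next phase' -e 'next iteration' \
            -e 'next milestone' -e 'next loop' -e 'v2' -e 'v.next' \
            -e 'Phase II' -e 'left for' -e 'to be implemented in' -e 'see FUT-'
    # The exit status is that of grep: 0 if any marker was found, 1 if the section is clean.
}

plan=$(mktemp)
cat > "$plan" <<'EOF'
## Acceptance Criteria
- AC-1: Parser rejects empty input
- AC-2: Caching layer, deferred to the next phase
## Future Work / Out of Scope
- FUT-1: Caching layer in a future loop
EOF
scan_deferred_ac_markers "$plan"   # prints: 3: - AC-2: Caching layer, deferred to the next phase
rm -f "$plan"
```

Matching with `grep -F` keeps the dot in `v.next` literal. A regex match would also flag unrelated strings such as `vXnext`.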
diff --git a/prompt-template/codex/full-alignment-review.md b/prompt-template/codex/full-alignment-review.md
index 4367810e..0ec14bd1 100644
--- a/prompt-template/codex/full-alignment-review.md
+++ b/prompt-template/codex/full-alignment-review.md
@@ -8,6 +8,9 @@ This is a **mandatory checkpoint** (at configurable intervals). You must conduct
 
 @{{PLAN_FILE}}
 You MUST read this plan file first to understand the full scope of work before conducting your review.
+Only items under `## Acceptance Criteria` and current-scope Task Breakdown rows are completion gates.
+Items under `## Future Work` / `## Out of Scope`, including `FUT-*` items, are informational and MUST NOT block the COMPLETE verdict.
+If a current-scope AC or current-scope task is deferred, treat it as incomplete.
 
 ---
 ## Claude's Work Summary
@@ -103,12 +106,12 @@ The project's `.humanize/rlcr/{{LOOP_TIMESTAMP}}/` directory contains the histor
 
 ## Part 6: Output Requirements
 
-- If issues found OR any AC is NOT MET (including deferred ACs), write your findings to @{{REVIEW_RESULT_FILE}}
+- If issues found OR any current-scope AC is NOT MET (including deferred current-scope ACs), write your findings to @{{REVIEW_RESULT_FILE}}
   - Include specific action items for Claude to address, classified into:
     - Mainline Gaps
     - Blocking Side Issues
     - Queued Side Issues
 - **If development is stagnating** (see Part 4), write "STOP" as the last line
-- **CRITICAL**: Only write "COMPLETE" as the last line if ALL ACs from the original plan are FULLY MET with no deferrals
-  - DEFERRED items are considered INCOMPLETE - do NOT output COMPLETE if any AC is deferred
-  - The ONLY condition for COMPLETE is: all original plan tasks are done, all ACs are met, no deferrals allowed
+- **CRITICAL**: Only write "COMPLETE" as the last line if ALL current-scope ACs from the original plan are FULLY MET with no deferrals
+  - DEFERRED current-scope items are considered INCOMPLETE - do NOT output COMPLETE if any current-scope AC is deferred
+  - The ONLY condition for COMPLETE is: all current-scope original plan tasks are done, all current-scope ACs are met, no current-scope deferrals allowed
diff --git a/prompt-template/codex/regular-review.md b/prompt-template/codex/regular-review.md
index 4d4a8680..d7ae7510 100644
--- a/prompt-template/codex/regular-review.md
+++ b/prompt-template/codex/regular-review.md
@@ -7,6 +7,9 @@
 
 You MUST read this plan file first to understand the full scope of work before conducting your review.
 This plan contains the complete requirements and implementation details that Claude should be following.
+Only items under `## Acceptance Criteria` and current-scope Task Breakdown rows are completion gates.
+Items under `## Future Work` / `## Out of Scope`, including `FUT-*` items, are informational and MUST NOT block the COMPLETE verdict.
+If a current-scope AC or current-scope task is deferred, treat it as incomplete.
 
 Based on the original plan and @{{PROMPT_FILE}}, Claude claims to have completed the work. Please conduct a thorough critical review to verify this.
 
@@ -23,7 +26,7 @@ Below is Claude's summary of the work completed:
 
 - Your task is to conduct a deep critical review, focusing on finding implementation issues and identifying gaps between "plan-design" and actual implementation.
 - Relevant top-level guidance documents, phased implementation plans, and other important documentation and implementation references are located under @{{DOCS_PATH}}.
-- If Claude planned to defer any tasks to future phases in its summary, DO NOT follow its lead. Instead, you should force Claude to complete ALL tasks as planned.
+- If Claude planned to defer any current-scope tasks to future phases in its summary, DO NOT follow its lead. Instead, you should force Claude to complete ALL current-scope tasks as planned.
   - Such deferred tasks are considered incomplete work and should be flagged in your review comments, requiring Claude to address them.
  - If Claude planned to defer any tasks, please explore the codebase in-depth and draft a detailed implementation plan. This plan should be included in your review comments for Claude to follow.
  - Your review should be meticulous and skeptical. Look for any discrepancies, missing features, incomplete implementations.
@@ -69,8 +72,8 @@ If Claude mostly worked on queued side issues and failed to advance the mainline
 - In short, your review comments can include: problems/findings/blockers; claims that don't match reality; implementation plans for deferred work (to be implemented now); implementation plans for unfinished work; goal alignment issues.
 - Your output should be structured so Claude can tell which items are mainline gaps, blocking side issues, and queued side issues.
 - If after your investigation the actual situation does not match what Claude claims to have completed, or there is pending work to be done, output your review comments to @{{REVIEW_RESULT_FILE}}.
-- **CRITICAL**: Only output "COMPLETE" as the last line if ALL tasks from the original plan are FULLY completed with no deferrals
+- **CRITICAL**: Only output "COMPLETE" as the last line if ALL current-scope tasks from the original plan are FULLY completed with no deferrals
   - DEFERRED items are considered INCOMPLETE - do NOT output COMPLETE if any task is deferred
   - UNFINISHED items are considered INCOMPLETE - do NOT output COMPLETE if any task is pending
-  - The ONLY condition for COMPLETE is: all original plan tasks are done, all ACs are met, no deferrals or pending work allowed
+  - The ONLY condition for COMPLETE is: all current-scope original plan tasks are done, all current-scope ACs are met, no current-scope deferrals or pending work allowed
   - The word COMPLETE on the last line will stop Claude.
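Both review templates now state that only `## Acceptance Criteria` items and current-scope Task Breakdown rows gate the COMPLETE verdict, and that `FUT-*` items are informational. Codex performs the review itself from these prompts, so the bash sketch below is only a model of the rule, and it checks ACs only, for brevity. `list_completion_gates`, `review_verdict`, and the `ID STATUS` input format are hypothetical. A gate AC that is missing, unmet, or deferred yields INCOMPLETE, while a deferred `FUT-*` entry has no effect.

```shell
# Hypothetical helper, not part of the review templates: print the AC IDs that
# gate COMPLETE, i.e. "- AC-N:" or "- AC-N.M:" items under "## Acceptance Criteria".
list_completion_gates() {
    awk '/^## Acceptance Criteria[ \t]*$/ { in_ac = 1; next }
         /^## / && in_ac { in_ac = 0 }
         in_ac && /^[ \t]*- AC-[0-9]+(\.[0-9]+)?:/ {
             id = $0
             sub(/^[ \t]*- /, "", id)   # drop the list marker
             sub(/:.*/, "", id)         # drop everything after the ID
             print id
         }' "$1"
}

# Hypothetical verdict model: STATUS_FILE holds "ID STATUS" lines (met, unmet, or
# deferred). Only gate ACs are looked up, so FUT-* lines never affect the result.
review_verdict() {
    local plan_file="$1" status_file="$2" id status
    while read -r id; do
        status=$(awk -v id="$id" '$1 == id { print $2; exit }' "$status_file")
        if [[ "$status" != "met" ]]; then
            # Missing, unmet, and deferred current-scope ACs all block COMPLETE.
            echo "INCOMPLETE: $id is ${status:-unreported}"
            return 1
        fi
    done < <(list_completion_gates "$plan_file")
    echo "COMPLETE"
}

plan=$(mktemp); status=$(mktemp)
cat > "$plan" <<'EOF'
## Acceptance Criteria
- AC-1: CLI validates input
- AC-2: Plan file is written
## Future Work / Out of Scope
- FUT-1: Remote sync
EOF
printf '%s\n' 'AC-1 met' 'AC-2 met' 'FUT-1 deferred' > "$status"
review_verdict "$plan" "$status"   # prints COMPLETE; the deferred FUT-1 is not a gate
rm -f "$plan" "$status"
```

The gate list comes from the plan rather than from the status report. As a result, an AC that the report omits is treated as unreported and still blocks COMPLETE.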
diff --git a/prompt-template/plan/gen-plan-template.md b/prompt-template/plan/gen-plan-template.md
index ebdd2d98..08958772 100644
--- a/prompt-template/plan/gen-plan-template.md
+++ b/prompt-template/plan/gen-plan-template.md
@@ -6,6 +6,7 @@
 ## Acceptance Criteria
 
 Following TDD philosophy, each criterion includes positive and negative tests for deterministic verification.
+`AC-*` items are current RLCR completion gates: they must describe work that this implementation loop must complete and verify. Do not encode deferred, future, out-of-scope, post-work, or successor-loop goals as `AC-*`.
 
 - AC-1:
   - Positive Tests (expected to PASS):
@@ -22,6 +23,21 @@ Following TDD philosophy, each criterion includes positive and negative tests fo
   - Negative Tests: <...>
 ...
 
+### Handoff AC Pattern
+
+Use this pattern only when the draft contains a legitimate future goal that must be preserved without making it part of the current RLCR completion gate.
+
+- AC-X: Handoff for is complete without performing the future work.
+  - Future Work Reference: FUT-Y
+  - Positive Tests (expected to PASS):
+    -
+    -
+    -
+  - Negative Tests (expected to FAIL):
+    -
+    -
+    -
+
 ## Path Boundaries
 
 Path boundaries define the acceptable range of implementation quality and choices.
@@ -72,11 +88,22 @@ Each task must include exactly one routing tag:
 - `coding`: implemented by Claude
 - `analyze`: executed via Codex (`/humanize:ask-codex`)
 
+Every `AC-*` must be covered by at least one task. Every task must target at least one `AC-*`. Do not target `FUT-*`, `DEC-*`, or `-` in the Target AC column.
+
 | Task ID | Description | Target AC | Tag (`coding`/`analyze`) | Depends On |
 |---------|-------------|-----------|----------------------------|------------|
 | task1 | <...> | AC-1 | coding | - |
 | task2 | <...> | AC-2 | analyze | task1 |
 
+## Future Work / Out of Scope
+
+Future, deferred, post-work, successor-loop, and out-of-scope items belong here, not under `## Acceptance Criteria`.
+
+- FUT-1:
+  - Source DEC: DEC-1
+  - Current-loop handoff: AC-X
+  - Promotion trigger:
+
 ## Claude-Codex Deliberation
 
 ### Agreements
diff --git a/tests/test-gen-plan.sh b/tests/test-gen-plan.sh
index b5bcab07..c287a011 100755
--- a/tests/test-gen-plan.sh
+++ b/tests/test-gen-plan.sh
@@ -121,6 +121,8 @@ fi
 echo ""
 echo "PT-5b: Claude/Codex deliberation workflow validation"
 PLAN_TEMPLATE="$PROJECT_ROOT/prompt-template/plan/gen-plan-template.md"
+REGULAR_REVIEW_TEMPLATE="$PROJECT_ROOT/prompt-template/codex/regular-review.md"
+FULL_ALIGNMENT_TEMPLATE="$PROJECT_ROOT/prompt-template/codex/full-alignment-review.md"
 
 if [[ -f "$GEN_PLAN_CMD" ]] && grep -q "scripts/ask-codex.sh" "$GEN_PLAN_CMD"; then
     pass "gen-plan command allows ask-codex script"
@@ -234,6 +236,69 @@ else
     fail "plan template includes coding/analyze task tag column" "tag column in task table" "missing"
 fi
 
+if [[ -f "$PLAN_TEMPLATE" ]] && grep -q "### Handoff AC Pattern" "$PLAN_TEMPLATE"; then
+    pass "plan template includes handoff AC pattern"
+else
+    fail "plan template includes handoff AC pattern" "Handoff AC Pattern section" "missing"
+fi
+
+if [[ -f "$PLAN_TEMPLATE" ]] && grep -q "## Future Work / Out of Scope" "$PLAN_TEMPLATE" && grep -q "FUT-1" "$PLAN_TEMPLATE"; then
+    pass "plan template includes FUT future-work section"
+else
+    fail "plan template includes FUT future-work section" "Future Work / Out of Scope with FUT-* example" "missing"
+fi
+
+if [[ -f "$PLAN_TEMPLATE" ]] && grep -q "current RLCR completion gates" "$PLAN_TEMPLATE"; then
+    pass "plan template defines ACs as current RLCR completion gates"
+else
+    fail "plan template defines ACs as current RLCR completion gates" "current RLCR completion gates contract" "missing"
+fi
+
+if [[ -f "$GEN_PLAN_CMD" ]] \
+    && grep -q "Deferred AC Keyword Guard" "$GEN_PLAN_CMD" \
+    && grep -q "next phase" "$GEN_PLAN_CMD" \
+    && grep -q "to be implemented in" "$GEN_PLAN_CMD"; then
+    pass "gen-plan generation rules include deferred AC keyword guard"
+else
+    fail "gen-plan generation rules include deferred AC keyword guard" "Deferred AC Keyword Guard with keyword list" "missing"
+fi
+
+if [[ -f "$GEN_PLAN_CMD" ]] && grep -q "AC/Task Bidirectional Coverage" "$GEN_PLAN_CMD"; then
+    pass "gen-plan generation rules include AC/task bidirectional coverage"
+else
+    fail "gen-plan generation rules include AC/task bidirectional coverage" "AC/Task Bidirectional Coverage rule" "missing"
+fi
+
+if [[ -f "$GEN_PLAN_CMD" ]] && grep -q "DEC/FUT Linkage" "$GEN_PLAN_CMD"; then
+    pass "gen-plan generation rules include DEC/FUT linkage"
+else
+    fail "gen-plan generation rules include DEC/FUT linkage" "DEC/FUT Linkage rule" "missing"
+fi
+
+if [[ -f "$GEN_PLAN_CMD" ]] \
+    && grep -q "REQUIRED_CHANGES" "$GEN_PLAN_CMD" \
+    && grep -q "real work happens outside this RLCR loop" "$GEN_PLAN_CMD"; then
+    pass "gen-plan codex review requires semantic deferred-AC detection"
+else
+    fail "gen-plan codex review requires semantic deferred-AC detection" "REQUIRED_CHANGES for ACs whose real work happens outside the loop" "missing"
+fi
+
+if [[ -f "$REGULAR_REVIEW_TEMPLATE" ]] \
+    && grep -q "FUT-\\*" "$REGULAR_REVIEW_TEMPLATE" \
+    && grep -q "MUST NOT block the COMPLETE verdict" "$REGULAR_REVIEW_TEMPLATE"; then
+    pass "regular RLCR review template excludes FUT items from COMPLETE gate"
+else
+    fail "regular RLCR review template excludes FUT items from COMPLETE gate" "FUT-* items MUST NOT block COMPLETE" "missing"
+fi
+
+if [[ -f "$FULL_ALIGNMENT_TEMPLATE" ]] \
+    && grep -q "FUT-\\*" "$FULL_ALIGNMENT_TEMPLATE" \
+    && grep -q "MUST NOT block the COMPLETE verdict" "$FULL_ALIGNMENT_TEMPLATE"; then
+    pass "full-alignment RLCR review template excludes FUT items from COMPLETE gate"
+else
+    fail "full-alignment RLCR review template excludes FUT items from COMPLETE gate" "FUT-* items MUST NOT block COMPLETE" "missing"
+fi
+
 if [[ -f "$GEN_PLAN_CMD" ]] && grep -q "### Step 1.5: Consolidate Pending User Decisions" "$GEN_PLAN_CMD"; then
     pass "gen-plan command includes consolidate pending user decisions step"
 else

From 27adb00aff0afbfb2a37968922486e193f0cf2b4 Mon Sep 17 00:00:00 2001
From: Buyi-wsgzg
Date: Wed, 20 May 2026 10:02:51 +0800
Subject: [PATCH 2/5] Keep handoff pattern out of plan AC template

---
 commands/gen-plan.md                      | 20 +++-----------------
 prompt-template/plan/gen-plan-template.md | 17 +----------------
 tests/test-gen-plan.sh                    | 20 +++++++++++++++++---
 3 files changed, 21 insertions(+), 36 deletions(-)

diff --git a/commands/gen-plan.md b/commands/gen-plan.md
index 43280665..7fed9a65 100644
--- a/commands/gen-plan.md
+++ b/commands/gen-plan.md
@@ -388,7 +388,7 @@ Deeply think and generate the plan.md following these rules:
 ## Acceptance Criteria
 
 Following TDD philosophy, each criterion includes positive and negative tests for deterministic verification.
-`AC-*` items are current RLCR completion gates: they must describe work that this implementation loop must complete and verify. Do not encode deferred, future, out-of-scope, post-work, or successor-loop goals as `AC-*`.
+The `AC-*` items are current RLCR completion gates for this implementation loop.
 
 - AC-1:
   - Positive Tests (expected to PASS):
@@ -405,21 +405,6 @@ Following TDD philosophy, each criterion includes positive and negative tests fo
   - Negative Tests: <...>
 ...
 
-### Handoff AC Pattern
-
-Use this pattern only when the draft contains a legitimate future goal that must be preserved without making it part of the current RLCR completion gate.
-
-- AC-X: Handoff for is complete without performing the future work.
-  - Future Work Reference: FUT-Y
-  - Positive Tests (expected to PASS):
-    -
-    -
-    -
-  - Negative Tests (expected to FAIL):
-    -
-    -
-    -
-
 ## Path Boundaries
 
 Path boundaries define the acceptable range of implementation quality and choices.
@@ -545,7 +530,7 @@ When `alternative_plan_language` is empty, absent, set to `"English"`, or set to
 
 7. **Deferred AC Keyword Guard**: Before finalizing, scan each AC body for deferral markers: `TODO`, `TBD`, `deferred`, `future`, `follow-up`, `subsequent`, `next phase`, `next iteration`, `next milestone`, `next loop`, `v2`, `v.next`, `Phase II`, `left for`, `to be implemented in`, and `see FUT-`. If any marker means the AC's real work is outside this loop, rewrite the item as a current-loop handoff AC plus a `FUT-*` item, or move it entirely to future work.
 
-8. **Handoff AC Pattern**: When preserving a future goal, write a current-loop AC only for the handoff state/artifact/documentation. The handoff AC may reference `FUT-*`, but its positive and negative tests must be fully verifiable in this loop and must not require completing the future work.
+8. **Handoff AC Pattern**: When preserving a future goal, write a current-loop AC only for the handoff state/artifact/documentation. The handoff AC may reference `FUT-*`, but its positive and negative tests must be fully verifiable in this loop and must not require completing the future work. This pattern is generation guidance only; do not leave a `Handoff AC Pattern` template/example section in the final plan.
 
 9. **AC/Task Bidirectional Coverage**: Every `AC-*` must be covered by at least one Task Breakdown row. Every Task Breakdown row must target at least one current-scope `AC-*`. No row may use an empty target, `-`, `FUT-*`, or `DEC-*` as its Target AC.
 
@@ -590,6 +575,7 @@ After updating, **read the complete plan file** and verify:
 - The structured plan aligns with the original draft content
 - Claude/Codex disagreement handling is explicit and correctly reflected
 - No contradictions exist between different parts of the document
+- No instructional `Handoff AC Pattern` template/example section remains in the final plan
 - No `AC-*` contains deferred, future, out-of-scope, post-work, or successor-loop semantics except as a valid Handoff AC whose current-loop verification is complete without performing future work
 - Every `AC-*` is covered by at least one Task Breakdown row, and every Task Breakdown row targets at least one current-scope `AC-*`
 - Every decision that defers work links to a `FUT-*` entry, and every such `FUT-*` entry links back with `Source DEC: DEC-N`
diff --git a/prompt-template/plan/gen-plan-template.md b/prompt-template/plan/gen-plan-template.md
index 08958772..e28ec25c 100644
--- a/prompt-template/plan/gen-plan-template.md
+++ b/prompt-template/plan/gen-plan-template.md
@@ -6,7 +6,7 @@
 ## Acceptance Criteria
 
 Following TDD philosophy, each criterion includes positive and negative tests for deterministic verification.
-`AC-*` items are current RLCR completion gates: they must describe work that this implementation loop must complete and verify. Do not encode deferred, future, out-of-scope, post-work, or successor-loop goals as `AC-*`.
+The `AC-*` items are current RLCR completion gates for this implementation loop.
 
 - AC-1:
   - Positive Tests (expected to PASS):
@@ -23,21 +23,6 @@ Following TDD philosophy, each criterion includes positive and negative tests fo
   - Negative Tests: <...>
 ...
 
-### Handoff AC Pattern
-
-Use this pattern only when the draft contains a legitimate future goal that must be preserved without making it part of the current RLCR completion gate.
-
-- AC-X: Handoff for is complete without performing the future work.
-  - Future Work Reference: FUT-Y
-  - Positive Tests (expected to PASS):
-    -
-    -
-    -
-  - Negative Tests (expected to FAIL):
-    -
-    -
-    -
-
 ## Path Boundaries
 
 Path boundaries define the acceptable range of implementation quality and choices.
diff --git a/tests/test-gen-plan.sh b/tests/test-gen-plan.sh
index c287a011..79e80ee3 100755
--- a/tests/test-gen-plan.sh
+++ b/tests/test-gen-plan.sh
@@ -236,10 +236,24 @@ else
     fail "plan template includes coding/analyze task tag column" "tag column in task table" "missing"
 fi
 
-if [[ -f "$PLAN_TEMPLATE" ]] && grep -q "### Handoff AC Pattern" "$PLAN_TEMPLATE"; then
-    pass "plan template includes handoff AC pattern"
+if [[ -f "$PLAN_TEMPLATE" ]] && ! grep -q "Handoff AC Pattern" "$PLAN_TEMPLATE"; then
+    pass "plan template excludes handoff AC pattern from copied output"
 else
-    fail "plan template includes handoff AC pattern" "Handoff AC Pattern section" "missing"
+    fail "plan template excludes handoff AC pattern from copied output" "no Handoff AC Pattern template/example section" "section still present"
+fi
+
+AC_SECTION=$(awk '/^## Acceptance Criteria[[:space:]]*$/{in_ac=1; next} /^## / && in_ac{in_ac=0} in_ac' "$PLAN_TEMPLATE")
+if [[ -f "$PLAN_TEMPLATE" ]] \
+    && ! grep -Eq "deferred|future|follow-up|subsequent|next phase|next iteration|next milestone|next loop|v2|v\\.next|Phase II|left for|to be implemented in|FUT-" <<< "$AC_SECTION"; then
+    pass "plan template acceptance criteria section omits deferred/future markers"
+else
+    fail "plan template acceptance criteria section omits deferred/future markers" "no deferred/future markers under Acceptance Criteria" "markers present"
+fi
+
+if [[ -f "$GEN_PLAN_CMD" ]] && grep -q "Handoff AC Pattern" "$GEN_PLAN_CMD" && grep -q "generation guidance only" "$GEN_PLAN_CMD"; then
+    pass "gen-plan command keeps handoff pattern as generation guidance"
+else
+    fail "gen-plan command keeps handoff pattern as generation guidance" "Handoff AC Pattern generation guidance" "missing"
 fi
 
 if [[ -f "$PLAN_TEMPLATE" ]] && grep -q "## Future Work / Out of Scope" "$PLAN_TEMPLATE" && grep -q "FUT-1" "$PLAN_TEMPLATE"; then

From 38391e4c2cf47a73eda272ac148189670247f7ca Mon Sep 17 00:00:00 2001
From: Buyi-wsgzg
Date: Wed, 20 May 2026 11:03:43 +0800
Subject: [PATCH 3/5] Scope deferral escalation to current tasks

---
 prompt-template/codex/regular-review.md | 13 +++++++------
 tests/test-gen-plan.sh                  | 13 +++++++++++++
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/prompt-template/codex/regular-review.md b/prompt-template/codex/regular-review.md
index d7ae7510..0fb4459e 100644
--- a/prompt-template/codex/regular-review.md
+++ b/prompt-template/codex/regular-review.md
@@ -28,9 +28,10 @@ Below is Claude's summary of the work completed:
 - Relevant top-level guidance documents, phased implementation plans, and other important documentation and implementation references are located under @{{DOCS_PATH}}.
 - If Claude planned to defer any current-scope tasks to future phases in its summary, DO NOT follow its lead. Instead, you should force Claude to complete ALL current-scope tasks as planned.
   - Such deferred tasks are considered incomplete work and should be flagged in your review comments, requiring Claude to address them.
- Such deferred tasks are considered incomplete work and should be flagged in your review comments, requiring Claude to address them. - - If Claude planned to defer any tasks, please explore the codebase in-depth and draft a detailed implementation plan. This plan should be included in your review comments for Claude to follow. + - If Claude planned to defer any current-scope tasks, please explore the codebase in-depth and draft a detailed implementation plan. This plan should be included in your review comments for Claude to follow. + - Do NOT draft implementation plans solely for `FUT-*`, `## Future Work`, or `## Out of Scope` deferrals unless they block a current-scope AC or current-scope task. - Your review should be meticulous and skeptical. Look for any discrepancies, missing features, incomplete implementations. -- If Claude does not plan to defer any tasks, but honestly admits that some tasks are still pending (not yet completed), you should also include those pending tasks in your review. +- If Claude does not plan to defer any current-scope tasks, but honestly admits that some current-scope tasks are still pending (not yet completed), you should also include those pending tasks in your review. - Your review should elaborate on those unfinished tasks, explore the codebase, and draft an implementation plan. - A good engineering implementation plan should be **singular, directive, and definitive**, rather than discussing multiple possible implementation options. - The implementation plan should be **unambiguous**, internally consistent, and coherent from beginning to end, so that **Claude can execute the work accurately and without error**. 
@@ -69,11 +70,11 @@ If Claude mostly worked on queued side issues and failed to advance the mainline ## Part 5: Output Requirements -- In short, your review comments can include: problems/findings/blockers; claims that don't match reality; implementation plans for deferred work (to be implemented now); implementation plans for unfinished work; goal alignment issues. +- In short, your review comments can include: problems/findings/blockers; claims that don't match reality; implementation plans for deferred current-scope work (to be implemented now); implementation plans for unfinished current-scope work; goal alignment issues. - Your output should be structured so Claude can tell which items are mainline gaps, blocking side issues, and queued side issues. -- If after your investigation the actual situation does not match what Claude claims to have completed, or there is pending work to be done, output your review comments to @{{REVIEW_RESULT_FILE}}. +- If after your investigation the actual situation does not match what Claude claims to have completed, or there is pending current-scope work to be done, output your review comments to @{{REVIEW_RESULT_FILE}}. - **CRITICAL**: Only output "COMPLETE" as the last line if ALL current-scope tasks from the original plan are FULLY completed with no deferrals - - DEFERRED items are considered INCOMPLETE - do NOT output COMPLETE if any task is deferred - - UNFINISHED items are considered INCOMPLETE - do NOT output COMPLETE if any task is pending + - DEFERRED current-scope items are considered INCOMPLETE - do NOT output COMPLETE if any current-scope task is deferred + - UNFINISHED current-scope items are considered INCOMPLETE - do NOT output COMPLETE if any current-scope task is pending - The ONLY condition for COMPLETE is: all current-scope original plan tasks are done, all current-scope ACs are met, no current-scope deferrals or pending work allowed - The word COMPLETE on the last line will stop Claude. 
diff --git a/tests/test-gen-plan.sh b/tests/test-gen-plan.sh index 79e80ee3..0ae79810 100755 --- a/tests/test-gen-plan.sh +++ b/tests/test-gen-plan.sh @@ -305,6 +305,19 @@ else fail "regular RLCR review template excludes FUT items from COMPLETE gate" "FUT-* items MUST NOT block COMPLETE" "missing" fi +if [[ -f "$REGULAR_REVIEW_TEMPLATE" ]] \ + && grep -q "defer any current-scope tasks" "$REGULAR_REVIEW_TEMPLATE" \ + && grep -q 'Do NOT draft implementation plans solely for `FUT-\*`' "$REGULAR_REVIEW_TEMPLATE" \ + && grep -q "unfinished current-scope work" "$REGULAR_REVIEW_TEMPLATE" \ + && grep -q "pending current-scope work" "$REGULAR_REVIEW_TEMPLATE" \ + && ! grep -q "defer any tasks" "$REGULAR_REVIEW_TEMPLATE" \ + && ! grep -q "if any task is deferred" "$REGULAR_REVIEW_TEMPLATE" \ + && ! grep -q "if any task is pending" "$REGULAR_REVIEW_TEMPLATE"; then + pass "regular RLCR review template scopes deferral escalation to current-scope tasks" +else + fail "regular RLCR review template scopes deferral escalation to current-scope tasks" "current-scope-only deferral escalation" "unscoped task deferral language present" +fi + if [[ -f "$FULL_ALIGNMENT_TEMPLATE" ]] \ && grep -q "FUT-\\*" "$FULL_ALIGNMENT_TEMPLATE" \ && grep -q "MUST NOT block the COMPLETE verdict" "$FULL_ALIGNMENT_TEMPLATE"; then From 53349f16a951da03f8920f4bed26fce7cd6e7917 Mon Sep 17 00:00:00 2001 From: Buyi-wsgzg Date: Wed, 20 May 2026 11:25:02 +0800 Subject: [PATCH 4/5] Close deferred convergence escape hatch --- commands/gen-plan.md | 18 ++++++++++++------ tests/test-gen-plan.sh | 37 ++++++++++++++++++++++++++++++++----- 2 files changed, 44 insertions(+), 11 deletions(-) diff --git a/commands/gen-plan.md b/commands/gen-plan.md index 7fed9a65..795b1e4c 100644 --- a/commands/gen-plan.md +++ b/commands/gen-plan.md @@ -268,6 +268,7 @@ After Claude candidate plan v1 is ready, run iterative challenge/refine rounds w - Prompt MUST include current candidate plan, prior disagreements, and unresolved items - 
Prompt MUST include the RLCR plan contract: `AC-*` items are current RLCR completion gates; deferred, future, out-of-scope, post-work, or successor-loop goals must be represented as `FUT-*` under `## Future Work / Out of Scope`, optionally with a current-loop handoff AC. - Prompt MUST require Codex to inspect each AC for deferral semantics. If any AC claims the real work happens outside this RLCR loop, Codex MUST put it under `REQUIRED_CHANGES`, not `OPTIONAL_IMPROVEMENTS`. + - Prompt MUST include the Handoff AC Pattern definition inline: when preserving a future goal, a current-loop AC may cover only the handoff state/artifact/documentation. The handoff AC may reference `FUT-*`, but its positive and negative tests must be fully verifiable in this loop without completing the future work. The final plan must not leave a `Handoff AC Pattern` template/example section behind. - Prompt MUST require a hard keyword scan within each AC body. Treat these strings as blocking unless the AC follows the Handoff AC Pattern and the current-loop verification is complete without performing future work: `TODO`, `TBD`, `deferred`, `future`, `follow-up`, `subsequent`, `next phase`, `next iteration`, `next milestone`, `next loop`, `v2`, `v.next`, `Phase II`, `left for`, `to be implemented in`, `see FUT-`. - Prompt MUST require AC/Task bidirectional coverage: every `AC-*` is targeted by at least one Task Breakdown row; every Task Breakdown row targets at least one current-scope `AC-*`; no task target may be empty, `-`, `FUT-*`, or `DEC-*`. - Require output format: @@ -284,8 +285,9 @@ After Claude candidate plan v1 is ready, run iterative challenge/refine rounds w - Topic - Claude position - Second Codex position - - Resolution status (`resolved`, `needs_user_decision`, `deferred`) + - Resolution status (`resolved` or `needs_user_decision`) - Round-to-round delta + - Do NOT use `deferred` as a convergence status. 
If the selected resolution defers work, it is `resolved` only after the candidate plan records a `DEC-*` decision with a non-`PENDING` `Decision Status`, links that decision to a `FUT-*` item under `## Future Work / Out of Scope`, and ensures the deferred work is not represented as a current-scope AC/task. If that DEC/FUT linkage is missing or the decision needs human input, mark the topic `needs_user_decision`. ### Loop Termination Rules @@ -297,8 +299,9 @@ Repeat convergence rounds until one of the following is true: If max rounds are reached with unresolved opposite opinions, carry them to user decision phase explicitly. Set convergence state explicitly: -- `PLAN_CONVERGENCE_STATUS=converged` when convergence conditions are met +- `PLAN_CONVERGENCE_STATUS=converged` when convergence conditions are met, no `needs_user_decision` topic remains, and every resolution that defers work already has a resolved `DEC-*` plus linked `FUT-*` entry - `PLAN_CONVERGENCE_STATUS=partially_converged` otherwise +- Any unlinked deferred-work resolution MUST force `PLAN_CONVERGENCE_STATUS=partially_converged` and `HUMAN_REVIEW_REQUIRED=true` --- @@ -313,7 +316,7 @@ Decide if manual review can be skipped: - Else if `AUTO_START_RLCR_IF_CONVERGED=true` **and** `PLAN_CONVERGENCE_STATUS=converged`, set `HUMAN_REVIEW_REQUIRED=false` - Otherwise set `HUMAN_REVIEW_REQUIRED=true` -If `HUMAN_REVIEW_REQUIRED=false`, skip Step 2-4 and continue directly to Phase 7. +Do not skip Step 1.5. If `HUMAN_REVIEW_REQUIRED=false`, run Step 1.5 first, then skip Step 2-4 and continue directly to Phase 7. ### Step 1.5: Consolidate Pending User Decisions (runs unconditionally) @@ -321,11 +324,13 @@ Before proceeding (regardless of `HUMAN_REVIEW_REQUIRED`), consolidate all user- 1. Extract `QUESTIONS_FOR_USER` items from Codex Analysis v1 (Phase 3) 2. Extract items with status `needs_user_decision` from the final convergence matrix (Phase 5) — use the last round's state, not intermediate rounds -3. 
Deduplicate: if the same topic appears in both sources, merge into one entry -4. For each collected item, check if it was substantively resolved during Phase 4-5 plan refinement (i.e., Claude addressed it and second Codex agreed in a subsequent round). Remove only items with clear evidence of resolution. -5. Write all remaining unresolved items into the plan's `## Pending User Decisions` section. Use `DEC-N` identifiers. Set `Decision Status` to `PENDING`. +3. Extract any convergence topic whose selected resolution defers work but lacks a resolved `DEC-*` plus linked `FUT-*` entry. Add it as a `PENDING` decision so it blocks auto-start instead of silently escaping the completion gate. +4. Deduplicate: if the same topic appears in multiple sources, merge into one entry +5. For each collected item, check if it was substantively resolved during Phase 4-5 plan refinement (i.e., Claude addressed it and second Codex agreed in a subsequent round). Remove only items with clear evidence of resolution and, for deferred-work resolutions, complete resolved `DEC-*`/`FUT-*` linkage. +6. Write all remaining unresolved items into the plan's `## Pending User Decisions` section. Use `DEC-N` identifiers. Set `Decision Status` to `PENDING`. 
- For Claude-vs-Codex disagreements: fill `Claude Position`, `Codex Position`, and `Tradeoff Summary` - For open questions (no opposing positions): set `Claude Position` to Claude's tentative answer (if any), `Codex Position` to `N/A - open question`, and `Tradeoff Summary` to the question's context + - For deferred-work resolutions that are already decided, do not leave them as `PENDING`; instead record the resolved decision, reference its `FUT-*` entry, and ensure the `FUT-*` entry includes `Source DEC: DEC-N` This ensures: - When `HUMAN_REVIEW_REQUIRED=true`: items are visible for Steps 2-4 user resolution @@ -629,6 +634,7 @@ If all of the following are true: - `PLAN_CONVERGENCE_STATUS=converged` - `GEN_PLAN_MODE=discussion` - There are no pending decisions with status `PENDING` +- Every convergence topic whose resolution defers work has a resolved `DEC-*` plus linked `FUT-*` entry; no deferred-work resolution exists only in the convergence matrix Then start work immediately by running: diff --git a/tests/test-gen-plan.sh b/tests/test-gen-plan.sh index 0ae79810..40dd5ae9 100755 --- a/tests/test-gen-plan.sh +++ b/tests/test-gen-plan.sh @@ -242,12 +242,18 @@ else fail "plan template excludes handoff AC pattern from copied output" "no Handoff AC Pattern template/example section" "section still present" fi -AC_SECTION=$(awk '/^## Acceptance Criteria[[:space:]]*$/{in_ac=1; next} /^## / && in_ac{in_ac=0} in_ac' "$PLAN_TEMPLATE") -if [[ -f "$PLAN_TEMPLATE" ]] \ - && ! grep -Eq "deferred|future|follow-up|subsequent|next phase|next iteration|next milestone|next loop|v2|v\\.next|Phase II|left for|to be implemented in|FUT-" <<< "$AC_SECTION"; then - pass "plan template acceptance criteria section omits deferred/future markers" +if [[ -r "$PLAN_TEMPLATE" ]]; then + if AC_SECTION=$(awk '/^## Acceptance Criteria[[:space:]]*$/{in_ac=1; next} /^## / && in_ac{in_ac=0} in_ac' "$PLAN_TEMPLATE"); then + if ! 
grep -Eq "deferred|future|follow-up|subsequent|next phase|next iteration|next milestone|next loop|v2|v\\.next|Phase II|left for|to be implemented in|FUT-" <<< "$AC_SECTION"; then + pass "plan template acceptance criteria section omits deferred/future markers" + else + fail "plan template acceptance criteria section omits deferred/future markers" "no deferred/future markers under Acceptance Criteria" "markers present" + fi + else + fail "plan template acceptance criteria section omits deferred/future markers" "Acceptance Criteria section can be extracted" "awk extraction failed" + fi else - fail "plan template acceptance criteria section omits deferred/future markers" "no deferred/future markers under Acceptance Criteria" "markers present" + fail "plan template acceptance criteria section omits deferred/future markers" "readable plan template" "missing or unreadable: $PLAN_TEMPLATE" fi if [[ -f "$GEN_PLAN_CMD" ]] && grep -q "Handoff AC Pattern" "$GEN_PLAN_CMD" && grep -q "generation guidance only" "$GEN_PLAN_CMD"; then @@ -256,6 +262,15 @@ else fail "gen-plan command keeps handoff pattern as generation guidance" "Handoff AC Pattern generation guidance" "missing" fi +if [[ -f "$GEN_PLAN_CMD" ]] \ + && grep -qF "Prompt MUST include the Handoff AC Pattern definition inline" "$GEN_PLAN_CMD" \ + && grep -qF "current-loop AC may cover only the handoff" "$GEN_PLAN_CMD" \ + && grep -qF "without completing the future work" "$GEN_PLAN_CMD"; then + pass "gen-plan second Codex prompt defines handoff AC pattern inline" +else + fail "gen-plan second Codex prompt defines handoff AC pattern inline" "inline Handoff AC Pattern definition in Phase 5 prompt requirements" "missing" +fi + if [[ -f "$PLAN_TEMPLATE" ]] && grep -q "## Future Work / Out of Scope" "$PLAN_TEMPLATE" && grep -q "FUT-1" "$PLAN_TEMPLATE"; then pass "plan template includes FUT future-work section" else @@ -289,6 +304,18 @@ else fail "gen-plan generation rules include DEC/FUT linkage" "DEC/FUT Linkage rule" 
"missing" fi +if [[ -f "$GEN_PLAN_CMD" ]] \ + && grep -qF 'Resolution status (`resolved` or `needs_user_decision`)' "$GEN_PLAN_CMD" \ + && grep -qF 'Do NOT use `deferred` as a convergence status' "$GEN_PLAN_CMD" \ + && grep -qF 'resolved `DEC-*` plus linked `FUT-*`' "$GEN_PLAN_CMD" \ + && grep -qF 'unlinked deferred-work resolution MUST force `PLAN_CONVERGENCE_STATUS=partially_converged`' "$GEN_PLAN_CMD" \ + && grep -qF 'no deferred-work resolution exists only in the convergence matrix' "$GEN_PLAN_CMD" \ + && ! grep -qF 'Resolution status (`resolved`, `needs_user_decision`, `deferred`)' "$GEN_PLAN_CMD"; then + pass "gen-plan convergence matrix prevents deferred status escape hatch" +else + fail "gen-plan convergence matrix prevents deferred status escape hatch" "no deferred status and DEC/FUT linkage required before convergence/auto-start" "missing or stale convergence status rule" +fi + if [[ -f "$GEN_PLAN_CMD" ]] \ && grep -q "REQUIRED_CHANGES" "$GEN_PLAN_CMD" \ && grep -q "real work happens outside this RLCR loop" "$GEN_PLAN_CMD"; then From 4419813e045b2d19c4a9792c97684bda86dd00a8 Mon Sep 17 00:00:00 2001 From: Buyi-wsgzg Date: Wed, 20 May 2026 18:03:33 +0800 Subject: [PATCH 5/5] Make deferred AC guard semantic --- commands/gen-plan.md | 6 +++--- tests/test-gen-plan.sh | 21 ++++++++++++++------- 2 files changed, 17 insertions(+), 10 deletions(-) diff --git a/commands/gen-plan.md b/commands/gen-plan.md index 795b1e4c..11c3b84b 100644 --- a/commands/gen-plan.md +++ b/commands/gen-plan.md @@ -269,7 +269,7 @@ After Claude candidate plan v1 is ready, run iterative challenge/refine rounds w - Prompt MUST include the RLCR plan contract: `AC-*` items are current RLCR completion gates; deferred, future, out-of-scope, post-work, or successor-loop goals must be represented as `FUT-*` under `## Future Work / Out of Scope`, optionally with a current-loop handoff AC. - Prompt MUST require Codex to inspect each AC for deferral semantics. 
If any AC claims the real work happens outside this RLCR loop, Codex MUST put it under `REQUIRED_CHANGES`, not `OPTIONAL_IMPROVEMENTS`. - Prompt MUST include the Handoff AC Pattern definition inline: when preserving a future goal, a current-loop AC may cover only the handoff state/artifact/documentation. The handoff AC may reference `FUT-*`, but its positive and negative tests must be fully verifiable in this loop without completing the future work. The final plan must not leave a `Handoff AC Pattern` template/example section behind. - - Prompt MUST require a hard keyword scan within each AC body. Treat these strings as blocking unless the AC follows the Handoff AC Pattern and the current-loop verification is complete without performing future work: `TODO`, `TBD`, `deferred`, `future`, `follow-up`, `subsequent`, `next phase`, `next iteration`, `next milestone`, `next loop`, `v2`, `v.next`, `Phase II`, `left for`, `to be implemented in`, `see FUT-`. + - Prompt MUST require semantic deferred-AC detection within each AC body. Use these strings as review hints, not automatic blockers: `TODO`, `TBD`, `deferred`, `future`, `follow-up`, `subsequent`, `next phase`, `next iteration`, `next milestone`, `next loop`, `v2`, `v.next`, `Phase II`, `left for`, `to be implemented in`, `see FUT-`. Codex MUST put the issue under `REQUIRED_CHANGES` only when the AC's meaning makes the real work happen outside this RLCR loop and the AC is not a valid Handoff AC. Do not block current-scope domain wording solely because it contains a marker term, such as an AC about validating future dates as input. - Prompt MUST require AC/Task bidirectional coverage: every `AC-*` is targeted by at least one Task Breakdown row; every Task Breakdown row targets at least one current-scope `AC-*`; no task target may be empty, `-`, `FUT-*`, or `DEC-*`. 
- Require output format: - `AGREE:` points accepted as reasonable @@ -533,7 +533,7 @@ When `alternative_plan_language` is empty, absent, set to `"English"`, or set to 6. **Current-Scope AC Contract**: `AC-*` items are the current RLCR completion gate. Do NOT create deferred ACs. Any deferred, future, out-of-scope, post-work, successor-task, or successor-loop goal must be written as `FUT-*` under `## Future Work / Out of Scope`, optionally linked to a current-loop Handoff AC. -7. **Deferred AC Keyword Guard**: Before finalizing, scan each AC body for deferral markers: `TODO`, `TBD`, `deferred`, `future`, `follow-up`, `subsequent`, `next phase`, `next iteration`, `next milestone`, `next loop`, `v2`, `v.next`, `Phase II`, `left for`, `to be implemented in`, and `see FUT-`. If any marker means the AC's real work is outside this loop, rewrite the item as a current-loop handoff AC plus a `FUT-*` item, or move it entirely to future work. +7. **Deferred AC Semantic Guard**: Before finalizing, inspect each AC body for deferral semantics. Deferral marker terms such as `TODO`, `TBD`, `deferred`, `future`, `follow-up`, `subsequent`, `next phase`, `next iteration`, `next milestone`, `next loop`, `v2`, `v.next`, `Phase II`, `left for`, `to be implemented in`, and `see FUT-` are review hints, not automatic failures. If the AC's meaning makes the real work happen outside this loop, rewrite the item as a current-loop handoff AC plus a `FUT-*` item, or move it entirely to future work. If a marker term is ordinary current-scope domain language, such as validating future dates as input, keep the AC if its tests are fully current-loop verifiable. 8. **Handoff AC Pattern**: When preserving a future goal, write a current-loop AC only for the handoff state/artifact/documentation. The handoff AC may reference `FUT-*`, but its positive and negative tests must be fully verifiable in this loop and must not require completing the future work. 
This pattern is generation guidance only; do not leave a `Handoff AC Pattern` template/example section in the final plan. @@ -581,7 +581,7 @@ After updating, **read the complete plan file** and verify: - Claude/Codex disagreement handling is explicit and correctly reflected - No contradictions exist between different parts of the document - No instructional `Handoff AC Pattern` template/example section remains in the final plan -- No `AC-*` contains deferred, future, out-of-scope, post-work, or successor-loop semantics except as a valid Handoff AC whose current-loop verification is complete without performing future work +- No `AC-*` uses deferred, future, out-of-scope, post-work, or successor-loop semantics except as a valid Handoff AC whose current-loop verification is complete without performing future work - Every `AC-*` is covered by at least one Task Breakdown row, and every Task Breakdown row targets at least one current-scope `AC-*` - Every decision that defers work links to a `FUT-*` entry, and every such `FUT-*` entry links back with `Source DEC: DEC-N` - Items under `## Future Work / Out of Scope` use `FUT-*`, not `AC-*`, and are not listed as current-scope Task Breakdown work diff --git a/tests/test-gen-plan.sh b/tests/test-gen-plan.sh index 40dd5ae9..973ccd04 100755 --- a/tests/test-gen-plan.sh +++ b/tests/test-gen-plan.sh @@ -244,7 +244,9 @@ fi if [[ -r "$PLAN_TEMPLATE" ]]; then if AC_SECTION=$(awk '/^## Acceptance Criteria[[:space:]]*$/{in_ac=1; next} /^## / && in_ac{in_ac=0} in_ac' "$PLAN_TEMPLATE"); then - if ! grep -Eq "deferred|future|follow-up|subsequent|next phase|next iteration|next milestone|next loop|v2|v\\.next|Phase II|left for|to be implemented in|FUT-" <<< "$AC_SECTION"; then + if ! grep -q '[^[:space:]]' <<< "$AC_SECTION"; then + fail "plan template acceptance criteria section omits deferred/future markers" "non-empty Acceptance Criteria section" "section missing or empty" + elif ! 
grep -Eq "deferred|future|follow-up|subsequent|next phase|next iteration|next milestone|next loop|v2|v\\.next|Phase II|left for|to be implemented in|FUT-" <<< "$AC_SECTION"; then pass "plan template acceptance criteria section omits deferred/future markers" else fail "plan template acceptance criteria section omits deferred/future markers" "no deferred/future markers under Acceptance Criteria" "markers present" @@ -284,12 +286,13 @@ else fi if [[ -f "$GEN_PLAN_CMD" ]] \ - && grep -q "Deferred AC Keyword Guard" "$GEN_PLAN_CMD" \ + && grep -q "Deferred AC Semantic Guard" "$GEN_PLAN_CMD" \ && grep -q "next phase" "$GEN_PLAN_CMD" \ - && grep -q "to be implemented in" "$GEN_PLAN_CMD"; then - pass "gen-plan generation rules include deferred AC keyword guard" + && grep -q "to be implemented in" "$GEN_PLAN_CMD" \ + && grep -q "review hints, not automatic failures" "$GEN_PLAN_CMD"; then + pass "gen-plan generation rules include semantic deferred AC guard" else - fail "gen-plan generation rules include deferred AC keyword guard" "Deferred AC Keyword Guard with keyword list" "missing" + fail "gen-plan generation rules include semantic deferred AC guard" "Deferred AC Semantic Guard with non-automatic marker handling" "missing" fi if [[ -f "$GEN_PLAN_CMD" ]] && grep -q "AC/Task Bidirectional Coverage" "$GEN_PLAN_CMD"; then @@ -318,10 +321,14 @@ fi if [[ -f "$GEN_PLAN_CMD" ]] \ && grep -q "REQUIRED_CHANGES" "$GEN_PLAN_CMD" \ - && grep -q "real work happens outside this RLCR loop" "$GEN_PLAN_CMD"; then + && grep -q "real work happen outside this RLCR loop" "$GEN_PLAN_CMD" \ + && grep -q "review hints, not automatic blockers" "$GEN_PLAN_CMD" \ + && grep -q "validating future dates as input" "$GEN_PLAN_CMD" \ + && ! grep -q "hard keyword scan" "$GEN_PLAN_CMD" \ + && ! 
grep -q "Treat these strings as blocking" "$GEN_PLAN_CMD"; then pass "gen-plan codex review requires semantic deferred-AC detection" else - fail "gen-plan codex review requires semantic deferred-AC detection" "REQUIRED_CHANGES for ACs whose real work happens outside the loop" "missing" + fail "gen-plan codex review requires semantic deferred-AC detection" "REQUIRED_CHANGES only for semantic deferrals, not keyword hits" "missing or keyword-blocking language present" fi if [[ -f "$REGULAR_REVIEW_TEMPLATE" ]] \