diff --git a/commands/gen-plan.md b/commands/gen-plan.md index 3b97435e..11c3b84b 100644 --- a/commands/gen-plan.md +++ b/commands/gen-plan.md @@ -266,6 +266,11 @@ After Claude candidate plan v1 is ready, run iterative challenge/refine rounds w "${CLAUDE_PLUGIN_ROOT}/scripts/ask-codex.sh" "" ``` - Prompt MUST include current candidate plan, prior disagreements, and unresolved items + - Prompt MUST include the RLCR plan contract: `AC-*` items are current RLCR completion gates; deferred, future, out-of-scope, post-work, or successor-loop goals must be represented as `FUT-*` under `## Future Work / Out of Scope`, optionally with a current-loop handoff AC. + - Prompt MUST require Codex to inspect each AC for deferral semantics. If any AC claims the real work happens outside this RLCR loop, Codex MUST put it under `REQUIRED_CHANGES`, not `OPTIONAL_IMPROVEMENTS`. + - Prompt MUST include the Handoff AC Pattern definition inline: when preserving a future goal, a current-loop AC may cover only the handoff state/artifact/documentation. The handoff AC may reference `FUT-*`, but its positive and negative tests must be fully verifiable in this loop without completing the future work. The final plan must not leave a `Handoff AC Pattern` template/example section behind. + - Prompt MUST require semantic deferred-AC detection within each AC body. Use these strings as review hints, not automatic blockers: `TODO`, `TBD`, `deferred`, `future`, `follow-up`, `subsequent`, `next phase`, `next iteration`, `next milestone`, `next loop`, `v2`, `v.next`, `Phase II`, `left for`, `to be implemented in`, `see FUT-`. Codex MUST put the issue under `REQUIRED_CHANGES` only when the AC's meaning makes the real work happen outside this RLCR loop and the AC is not a valid Handoff AC. Do not block current-scope domain wording solely because it contains a marker term, such as an AC about validating future dates as input. 
+ - Prompt MUST require AC/Task bidirectional coverage: every `AC-*` is targeted by at least one Task Breakdown row; every Task Breakdown row targets at least one current-scope `AC-*`; no task target may be empty, `-`, `FUT-*`, or `DEC-*`. - Require output format: - `AGREE:` points accepted as reasonable - `DISAGREE:` points considered unreasonable and why @@ -280,8 +285,9 @@ After Claude candidate plan v1 is ready, run iterative challenge/refine rounds w - Topic - Claude position - Second Codex position - - Resolution status (`resolved`, `needs_user_decision`, `deferred`) + - Resolution status (`resolved` or `needs_user_decision`) - Round-to-round delta + - Do NOT use `deferred` as a convergence status. If the selected resolution defers work, it is `resolved` only after the candidate plan records a `DEC-*` decision with a non-`PENDING` `Decision Status`, links that decision to a `FUT-*` item under `## Future Work / Out of Scope`, and ensures the deferred work is not represented as a current-scope AC/task. If that DEC/FUT linkage is missing or the decision needs human input, mark the topic `needs_user_decision`. ### Loop Termination Rules @@ -293,8 +299,9 @@ Repeat convergence rounds until one of the following is true: If max rounds are reached with unresolved opposite opinions, carry them to user decision phase explicitly. 
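A minimal bash sketch of the resolution-status rule for convergence topics, assuming the relevant facts about each topic (whether Claude and the second Codex agree, whether the selected resolution defers work, the linked `DEC-*` status, and whether a `FUT-*` entry carries a matching `Source DEC:` line) have already been extracted. The function names `topic_status` and `plan_convergence_status`, the argument order, and the `RESOLVED` status value are hypothetical, and the sketch ignores the separate convergence conditions; it only shows that a deferral without DEC/FUT linkage can never count as `resolved`.

```bash
#!/usr/bin/env bash
# Sketch only: classify one convergence-matrix topic as "resolved" or "needs_user_decision".
# Arguments (hypothetical encoding):
#   $1 agreed       "yes" if Claude and the second Codex agree on the resolution
#   $2 defers_work  "yes" if the selected resolution defers work
#   $3 dec_status   Decision Status of the linked DEC-* entry (empty if none)
#   $4 fut_linked   "yes" if a FUT-* entry carries "Source DEC:" for that DEC-*
topic_status() {
    local agreed=$1 defers_work=$2 dec_status=$3 fut_linked=$4
    if [[ $agreed != yes ]]; then
        echo needs_user_decision
    elif [[ $defers_work != yes ]]; then
        echo resolved
    elif [[ -n $dec_status && $dec_status != PENDING && $fut_linked == yes ]]; then
        # A deferral counts as resolved only with a decided DEC-* linked to a FUT-* entry.
        echo resolved
    else
        echo needs_user_decision
    fi
}

# Overall status: any needs_user_decision topic forces partially_converged.
plan_convergence_status() {
    local s
    for s in "$@"; do
        if [[ $s == needs_user_decision ]]; then
            echo partially_converged
            return
        fi
    done
    echo converged
}
```

For example, `topic_status yes yes RESOLVED no` prints `needs_user_decision` because the `FUT-*` back-link is missing, which is the unlinked deferred-work case that must also set `HUMAN_REVIEW_REQUIRED=true`.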
Set convergence state explicitly: -- `PLAN_CONVERGENCE_STATUS=converged` when convergence conditions are met +- `PLAN_CONVERGENCE_STATUS=converged` when convergence conditions are met, no `needs_user_decision` topic remains, and every resolution that defers work already has a resolved `DEC-*` plus linked `FUT-*` entry - `PLAN_CONVERGENCE_STATUS=partially_converged` otherwise +- Any unlinked deferred-work resolution MUST force `PLAN_CONVERGENCE_STATUS=partially_converged` and `HUMAN_REVIEW_REQUIRED=true` --- @@ -309,7 +316,7 @@ Decide if manual review can be skipped: - Else if `AUTO_START_RLCR_IF_CONVERGED=true` **and** `PLAN_CONVERGENCE_STATUS=converged`, set `HUMAN_REVIEW_REQUIRED=false` - Otherwise set `HUMAN_REVIEW_REQUIRED=true` -If `HUMAN_REVIEW_REQUIRED=false`, skip Step 2-4 and continue directly to Phase 7. +Do not skip Step 1.5. If `HUMAN_REVIEW_REQUIRED=false`, run Step 1.5 first, then skip Steps 2-4 and continue directly to Phase 7. ### Step 1.5: Consolidate Pending User Decisions (runs unconditionally) @@ -317,11 +324,13 @@ Before proceeding (regardless of `HUMAN_REVIEW_REQUIRED`), consolidate all user- 1. Extract `QUESTIONS_FOR_USER` items from Codex Analysis v1 (Phase 3) 2. Extract items with status `needs_user_decision` from the final convergence matrix (Phase 5) — use the last round's state, not intermediate rounds -3. Deduplicate: if the same topic appears in both sources, merge into one entry -4. For each collected item, check if it was substantively resolved during Phase 4-5 plan refinement (i.e., Claude addressed it and second Codex agreed in a subsequent round). Remove only items with clear evidence of resolution. -5. Write all remaining unresolved items into the plan's `## Pending User Decisions` section. Use `DEC-N` identifiers. Set `Decision Status` to `PENDING`. +3. Extract any convergence topic whose selected resolution defers work but lacks a resolved `DEC-*` plus linked `FUT-*` entry.
Add it as a `PENDING` decision so it blocks auto-start instead of silently escaping the completion gate. +4. Deduplicate: if the same topic appears in multiple sources, merge into one entry +5. For each collected item, check if it was substantively resolved during Phase 4-5 plan refinement (i.e., Claude addressed it and second Codex agreed in a subsequent round). Remove only items with clear evidence of resolution and, for deferred-work resolutions, complete resolved `DEC-*`/`FUT-*` linkage. +6. Write all remaining unresolved items into the plan's `## Pending User Decisions` section. Use `DEC-N` identifiers. Set `Decision Status` to `PENDING`. - For Claude-vs-Codex disagreements: fill `Claude Position`, `Codex Position`, and `Tradeoff Summary` - For open questions (no opposing positions): set `Claude Position` to Claude's tentative answer (if any), `Codex Position` to `N/A - open question`, and `Tradeoff Summary` to the question's context + - For deferred-work resolutions that are already decided, do not leave them as `PENDING`; instead record the resolved decision, reference its `FUT-*` entry, and ensure the `FUT-*` entry includes `Source DEC: DEC-N` This ensures: - When `HUMAN_REVIEW_REQUIRED=true`: items are visible for Steps 2-4 user resolution @@ -384,6 +393,7 @@ Deeply think and generate the plan.md following these rules: ## Acceptance Criteria Following TDD philosophy, each criterion includes positive and negative tests for deterministic verification. +The `AC-*` items are current RLCR completion gates for this implementation loop. - AC-1: - Positive Tests (expected to PASS): @@ -450,11 +460,22 @@ Each task must include exactly one routing tag: - `coding`: implemented by Claude - `analyze`: executed via Codex (`/humanize:ask-codex`) +Every `AC-*` must be covered by at least one task. Every task must target at least one `AC-*`. Do not target `FUT-*`, `DEC-*`, or `-` in the Target AC column. 
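The AC/Task bidirectional coverage rule is mechanical enough to check with a script. The sketch below is hypothetical and assumes ACs appear as `- AC-N:` (or `- AC-N.M:`) bullets under `## Acceptance Criteria`, and tasks as pipe-table rows under `## Task Breakdown` whose third column holds comma- or space-separated Target AC values. It reports every AC that no task targets, every task without a current-scope AC, and every empty, `-`, `FUT-*`, or `DEC-*` target.

```bash
#!/usr/bin/env bash
# Sketch: check AC/Task bidirectional coverage in a generated plan.md (hypothetical helper).

# Strip leading and trailing whitespace.
trim() {
    local s=$1
    s=${s#"${s%%[![:space:]]*}"}
    s=${s%"${s##*[![:space:]]}"}
    printf '%s' "$s"
}

# Usage: check_ac_task_coverage PLAN_FILE
# Prints one line per violation and returns 1 if any violation was found.
check_ac_task_coverage() {
    local plan=$1 rc=0 ac row_id target t has_ac
    local -a acs
    local -A covered=()

    # Collect AC ids declared in the Acceptance Criteria section.
    mapfile -t acs < <(
        awk '/^## /{in_ac=($0 ~ /^## Acceptance Criteria/)} in_ac' "$plan" |
            grep -oE '^[[:space:]]*- AC-[0-9]+(\.[0-9]+)?:' |
            grep -oE 'AC-[0-9]+(\.[0-9]+)?'
    )

    # Walk the Task Breakdown table: | Task ID | Description | Target AC | ...
    while IFS='|' read -r _ row_id _ target _; do
        row_id=$(trim "$row_id")
        target=$(trim "$target")
        # Skip the header and separator rows.
        [[ -z $row_id || $row_id == "Task ID" || $row_id =~ ^-+$ ]] && continue
        if [[ -z $target || $target == "-" ]]; then
            echo "$row_id: empty or '-' Target AC"
            rc=1
            continue
        fi
        has_ac=0
        for t in ${target//,/ }; do
            case $t in
                AC-*) has_ac=1; covered[$t]=1 ;;
                FUT-* | DEC-* | -) echo "$row_id: invalid target $t"; rc=1 ;;
            esac
        done
        ((has_ac)) || { echo "$row_id: no current-scope AC target"; rc=1; }
    done < <(awk '/^## /{in_tb=($0 ~ /^## Task Breakdown/)} in_tb && /^[|]/' "$plan")

    # Every declared AC must be targeted by at least one task.
    for ac in "${acs[@]}"; do
        [[ -n ${covered[$ac]:-} ]] || { echo "$ac: not targeted by any task"; rc=1; }
    done
    return "$rc"
}
```

Run against a plan whose second task targets only `FUT-1`, it reports the invalid target, the task with no current-scope AC, and any AC left uncovered as a result.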
+ | Task ID | Description | Target AC | Tag (`coding`/`analyze`) | Depends On | |---------|-------------|-----------|----------------------------|------------| | task1 | <...> | AC-1 | coding | - | | task2 | <...> | AC-2 | analyze | task1 | +## Future Work / Out of Scope + +Future, deferred, post-work, successor-loop, and out-of-scope items belong here, not under `## Acceptance Criteria`. + +- FUT-1: + - Source DEC: DEC-1 + - Current-loop handoff: AC-X + - Promotion trigger: + ## Claude-Codex Deliberation ### Agreements @@ -510,23 +531,33 @@ When `alternative_plan_language` is empty, absent, set to `"English"`, or set to 5. **AC Format**: All acceptance criteria must use AC-X or AC-X.Y format. -6. **Clear Dependencies**: Show what depends on what, not when things happen. +6. **Current-Scope AC Contract**: `AC-*` items are the current RLCR completion gate. Do NOT create deferred ACs. Any deferred, future, out-of-scope, post-work, successor-task, or successor-loop goal must be written as `FUT-*` under `## Future Work / Out of Scope`, optionally linked to a current-loop Handoff AC. + +7. **Deferred AC Semantic Guard**: Before finalizing, inspect each AC body for deferral semantics. Deferral marker terms such as `TODO`, `TBD`, `deferred`, `future`, `follow-up`, `subsequent`, `next phase`, `next iteration`, `next milestone`, `next loop`, `v2`, `v.next`, `Phase II`, `left for`, `to be implemented in`, and `see FUT-` are review hints, not automatic failures. If the AC's meaning makes the real work happen outside this loop, rewrite the item as a current-loop handoff AC plus a `FUT-*` item, or move it entirely to future work. If a marker term is ordinary current-scope domain language, such as validating future dates as input, keep the AC if its tests are fully current-loop verifiable. + +8. **Handoff AC Pattern**: When preserving a future goal, write a current-loop AC only for the handoff state/artifact/documentation. 
The handoff AC may reference `FUT-*`, but its positive and negative tests must be fully verifiable in this loop and must not require completing the future work. This pattern is generation guidance only; do not leave a `Handoff AC Pattern` template/example section in the final plan. + +9. **AC/Task Bidirectional Coverage**: Every `AC-*` must be covered by at least one Task Breakdown row. Every Task Breakdown row must target at least one current-scope `AC-*`. No row may use an empty target, `-`, `FUT-*`, or `DEC-*` as its Target AC. + +10. **DEC/FUT Linkage**: If a resolved decision defers work, the decision resolution must explicitly reference a `FUT-*` item. Each `FUT-*` item caused by a decision must include `Source DEC: DEC-N`. If there is a current-loop handoff, both the DEC and FUT entry should reference the handoff AC. + +11. **Clear Dependencies**: Show what depends on what, not when things happen. -7. **TDD-Style Tests**: Each acceptance criterion MUST include both positive tests (expected to pass) and negative tests (expected to fail). This follows Test-Driven Development philosophy and enables deterministic verification. +12. **TDD-Style Tests**: Each acceptance criterion MUST include both positive tests (expected to pass) and negative tests (expected to fail). This follows Test-Driven Development philosophy and enables deterministic verification. -8. **Affirmative Path Boundaries**: Describe upper and lower bounds using affirmative language (what IS acceptable) rather than negative language (what is NOT acceptable). +13. **Affirmative Path Boundaries**: Describe upper and lower bounds using affirmative language (what IS acceptable) rather than negative language (what is NOT acceptable). -9. **Respect Deterministic Designs**: If the draft specifies a fixed approach with no choices, reflect this in the plan by narrowing the path boundaries to match the user's specification. +14. 
**Respect Deterministic Designs**: If the draft specifies a fixed approach with no choices, reflect this in the plan by narrowing the path boundaries to match the user's specification. -10. **Code Style Constraint**: The generated plan MUST include a section or note instructing that implementation code and comments should NOT contain plan-specific progress terminology such as "AC-", "Milestone", "Step", "Phase", or similar workflow markers. These terms belong in the plan document, not in the resulting codebase. +15. **Code Style Constraint**: The generated plan MUST include a section or note instructing that implementation code and comments should NOT contain plan-specific progress terminology such as "AC-", "Milestone", "Step", "Phase", or similar workflow markers. These terms belong in the plan document, not in the resulting codebase. -11. **Draft Completeness Requirement**: The generated plan MUST incorporate ALL information from the input draft document without omission. The draft represents the most valuable human input and must be fully preserved. Any clarifications obtained through Phase 6 should be added incrementally to the draft's original content, never replacing or losing any original requirements. The final plan must be a superset of the draft information plus all clarified details. +16. **Draft Completeness Requirement**: The generated plan MUST incorporate ALL information from the input draft document without omission. The draft represents the most valuable human input and must be fully preserved. Any clarifications obtained through Phase 6 should be added incrementally to the draft's original content, never replacing or losing any original requirements. The final plan must be a superset of the draft information plus all clarified details. -12. **Debate Traceability**: The plan MUST include Codex-first findings, Claude/Codex agreements, resolved disagreements, and unresolved decisions. 
Unresolved opposite opinions MUST be recorded in `## Pending User Decisions` for explicit user decision. +17. **Debate Traceability**: The plan MUST include Codex-first findings, Claude/Codex agreements, resolved disagreements, and unresolved decisions. Unresolved opposite opinions MUST be recorded in `## Pending User Decisions` for explicit user decision. -13. **Convergence Requirement**: The plan MUST record Claude/Codex agreements, resolved disagreements, and final convergence status in `## Claude-Codex Deliberation`. Stop only when convergence conditions are met or max rounds reached with explicit carry-over decisions. +18. **Convergence Requirement**: The plan MUST record Claude/Codex agreements, resolved disagreements, and final convergence status in `## Claude-Codex Deliberation`. Stop only when convergence conditions are met or max rounds reached with explicit carry-over decisions. -14. **Task Tag Requirement**: The plan MUST include `## Task Breakdown`, and every task MUST be tagged as either `coding` or `analyze` (no untagged tasks, no other tag values). +19. **Task Tag Requirement**: The plan MUST include `## Task Breakdown`, and every task MUST be tagged as either `coding` or `analyze` (no untagged tasks, no other tag values). 
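Because the Deferred AC Semantic Guard treats marker terms as review hints rather than automatic failures, a helper can surface them without deciding anything. The hypothetical `deferral_hints` function below prints each marker found in an AC body (the `- AC-N:` line plus its nested test bullets) using case-insensitive substring matching, and it always returns 0. Whether a hit is a real deferral or ordinary current-scope wording, such as validating future dates as input, is left to the semantic review.

```bash
#!/usr/bin/env bash
# Sketch: list deferral marker hits per AC as review hints (never a failure).
# Assumes AC bullets start with "- AC-N:" under "## Acceptance Criteria".
deferral_hints() {
    awk '
        BEGIN {
            n = split("TODO|TBD|deferred|future|follow-up|subsequent|next phase|next iteration|next milestone|next loop|v2|v.next|Phase II|left for|to be implemented in|see FUT-", markers, "|")
        }
        # Track whether we are inside the Acceptance Criteria section.
        /^## / { in_ac = ($0 ~ /^## Acceptance Criteria/); next }
        !in_ac { next }
        # Remember which AC the following lines belong to.
        match($0, /^- AC-[0-9]+(\.[0-9]+)?:/) { ac = substr($0, 3, RLENGTH - 3) }
        ac != "" {
            line = tolower($0)
            for (i = 1; i <= n; i++)
                if (index(line, tolower(markers[i])) > 0)
                    printf "%s: review hint \"%s\"\n", ac, markers[i]
        }
    ' "$1"
    return 0  # hints only: the reviewer decides whether the AC really defers work
}
```

On an AC such as "Reject future dates as input" the helper still reports `future`, which is exactly why the rule keeps the final call semantic rather than keyword-based.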
--- @@ -549,6 +580,11 @@ After updating, **read the complete plan file** and verify: - The structured plan aligns with the original draft content - Claude/Codex disagreement handling is explicit and correctly reflected - No contradictions exist between different parts of the document +- No instructional `Handoff AC Pattern` template/example section remains in the final plan +- No `AC-*` uses deferred, future, out-of-scope, post-work, or successor-loop semantics except as a valid Handoff AC whose current-loop verification is complete without performing future work +- Every `AC-*` is covered by at least one Task Breakdown row, and every Task Breakdown row targets at least one current-scope `AC-*` +- Every decision that defers work links to a `FUT-*` entry, and every such `FUT-*` entry links back with `Source DEC: DEC-N` +- Items under `## Future Work / Out of Scope` use `FUT-*`, not `AC-*`, and are not listed as current-scope Task Breakdown work If inconsistencies are found, fix them using the Edit tool. @@ -598,6 +634,7 @@ If all of the following are true: - `PLAN_CONVERGENCE_STATUS=converged` - `GEN_PLAN_MODE=discussion` - There are no pending decisions with status `PENDING` +- Every convergence topic whose resolution defers work has a resolved `DEC-*` plus linked `FUT-*` entry; no deferred-work resolution exists only in the convergence matrix Then start work immediately by running: diff --git a/prompt-template/codex/full-alignment-review.md b/prompt-template/codex/full-alignment-review.md index 4367810e..0ec14bd1 100644 --- a/prompt-template/codex/full-alignment-review.md +++ b/prompt-template/codex/full-alignment-review.md @@ -8,6 +8,9 @@ This is a **mandatory checkpoint** (at configurable intervals). You must conduct @{{PLAN_FILE}} You MUST read this plan file first to understand the full scope of work before conducting your review. +Only items under `## Acceptance Criteria` and current-scope Task Breakdown rows are completion gates. 
+Items under `## Future Work` / `## Out of Scope`, including `FUT-*` items, are informational and MUST NOT block the COMPLETE verdict. +If a current-scope AC or current-scope task is deferred, treat it as incomplete. --- ## Claude's Work Summary @@ -103,12 +106,12 @@ The project's `.humanize/rlcr/{{LOOP_TIMESTAMP}}/` directory contains the histor ## Part 6: Output Requirements -- If issues found OR any AC is NOT MET (including deferred ACs), write your findings to @{{REVIEW_RESULT_FILE}} +- If issues found OR any current-scope AC is NOT MET (including deferred current-scope ACs), write your findings to @{{REVIEW_RESULT_FILE}} - Include specific action items for Claude to address, classified into: - Mainline Gaps - Blocking Side Issues - Queued Side Issues - **If development is stagnating** (see Part 4), write "STOP" as the last line -- **CRITICAL**: Only write "COMPLETE" as the last line if ALL ACs from the original plan are FULLY MET with no deferrals - - DEFERRED items are considered INCOMPLETE - do NOT output COMPLETE if any AC is deferred - - The ONLY condition for COMPLETE is: all original plan tasks are done, all ACs are met, no deferrals allowed +- **CRITICAL**: Only write "COMPLETE" as the last line if ALL current-scope ACs from the original plan are FULLY MET with no deferrals + - DEFERRED current-scope items are considered INCOMPLETE - do NOT output COMPLETE if any current-scope AC is deferred + - The ONLY condition for COMPLETE is: all current-scope original plan tasks are done, all current-scope ACs are met, no current-scope deferrals allowed diff --git a/prompt-template/codex/regular-review.md b/prompt-template/codex/regular-review.md index 4d4a8680..0fb4459e 100644 --- a/prompt-template/codex/regular-review.md +++ b/prompt-template/codex/regular-review.md @@ -7,6 +7,9 @@ You MUST read this plan file first to understand the full scope of work before conducting your review. 
This plan contains the complete requirements and implementation details that Claude should be following. +Only items under `## Acceptance Criteria` and current-scope Task Breakdown rows are completion gates. +Items under `## Future Work` / `## Out of Scope`, including `FUT-*` items, are informational and MUST NOT block the COMPLETE verdict. +If a current-scope AC or current-scope task is deferred, treat it as incomplete. Based on the original plan and @{{PROMPT_FILE}}, Claude claims to have completed the work. Please conduct a thorough critical review to verify this. @@ -23,11 +26,12 @@ Below is Claude's summary of the work completed: - Your task is to conduct a deep critical review, focusing on finding implementation issues and identifying gaps between "plan-design" and actual implementation. - Relevant top-level guidance documents, phased implementation plans, and other important documentation and implementation references are located under @{{DOCS_PATH}}. -- If Claude planned to defer any tasks to future phases in its summary, DO NOT follow its lead. Instead, you should force Claude to complete ALL tasks as planned. +- If Claude planned to defer any current-scope tasks to future phases in its summary, DO NOT follow its lead. Instead, you should force Claude to complete ALL current-scope tasks as planned. - Such deferred tasks are considered incomplete work and should be flagged in your review comments, requiring Claude to address them. - - If Claude planned to defer any tasks, please explore the codebase in-depth and draft a detailed implementation plan. This plan should be included in your review comments for Claude to follow. + - If Claude planned to defer any current-scope tasks, please explore the codebase in-depth and draft a detailed implementation plan. This plan should be included in your review comments for Claude to follow. 
+ - Do NOT draft implementation plans solely for `FUT-*`, `## Future Work`, or `## Out of Scope` deferrals unless they block a current-scope AC or current-scope task. - Your review should be meticulous and skeptical. Look for any discrepancies, missing features, incomplete implementations. -- If Claude does not plan to defer any tasks, but honestly admits that some tasks are still pending (not yet completed), you should also include those pending tasks in your review. +- If Claude does not plan to defer any current-scope tasks, but honestly admits that some current-scope tasks are still pending (not yet completed), you should also include those pending tasks in your review. - Your review should elaborate on those unfinished tasks, explore the codebase, and draft an implementation plan. - A good engineering implementation plan should be **singular, directive, and definitive**, rather than discussing multiple possible implementation options. - The implementation plan should be **unambiguous**, internally consistent, and coherent from beginning to end, so that **Claude can execute the work accurately and without error**. @@ -66,11 +70,11 @@ If Claude mostly worked on queued side issues and failed to advance the mainline ## Part 5: Output Requirements -- In short, your review comments can include: problems/findings/blockers; claims that don't match reality; implementation plans for deferred work (to be implemented now); implementation plans for unfinished work; goal alignment issues. +- In short, your review comments can include: problems/findings/blockers; claims that don't match reality; implementation plans for deferred current-scope work (to be implemented now); implementation plans for unfinished current-scope work; goal alignment issues. - Your output should be structured so Claude can tell which items are mainline gaps, blocking side issues, and queued side issues. 
-- If after your investigation the actual situation does not match what Claude claims to have completed, or there is pending work to be done, output your review comments to @{{REVIEW_RESULT_FILE}}. -- **CRITICAL**: Only output "COMPLETE" as the last line if ALL tasks from the original plan are FULLY completed with no deferrals - - DEFERRED items are considered INCOMPLETE - do NOT output COMPLETE if any task is deferred - - UNFINISHED items are considered INCOMPLETE - do NOT output COMPLETE if any task is pending - - The ONLY condition for COMPLETE is: all original plan tasks are done, all ACs are met, no deferrals or pending work allowed +- If after your investigation the actual situation does not match what Claude claims to have completed, or there is pending current-scope work to be done, output your review comments to @{{REVIEW_RESULT_FILE}}. +- **CRITICAL**: Only output "COMPLETE" as the last line if ALL current-scope tasks from the original plan are FULLY completed with no deferrals + - DEFERRED current-scope items are considered INCOMPLETE - do NOT output COMPLETE if any current-scope task is deferred + - UNFINISHED current-scope items are considered INCOMPLETE - do NOT output COMPLETE if any current-scope task is pending + - The ONLY condition for COMPLETE is: all current-scope original plan tasks are done, all current-scope ACs are met, no current-scope deferrals or pending work allowed - The word COMPLETE on the last line will stop Claude. diff --git a/prompt-template/plan/gen-plan-template.md b/prompt-template/plan/gen-plan-template.md index ebdd2d98..e28ec25c 100644 --- a/prompt-template/plan/gen-plan-template.md +++ b/prompt-template/plan/gen-plan-template.md @@ -6,6 +6,7 @@ ## Acceptance Criteria Following TDD philosophy, each criterion includes positive and negative tests for deterministic verification. +The `AC-*` items are current RLCR completion gates for this implementation loop. 
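The scoping rule that the review templates apply to the COMPLETE verdict can be sketched as a filter over per-item statuses. The input format (one `ID STATUS` pair per line), the `MET`/`DONE` status words, and matching task ids by the `task` prefix used in the template's example rows are all assumptions; the sketch only illustrates that `AC-*` and Task Breakdown items gate the verdict, while `FUT-*` and `DEC-*` lines are read and ignored.

```bash
#!/usr/bin/env bash
# Sketch: decide whether a review may end with "COMPLETE" (hypothetical input format).
# Reads "ID STATUS" lines on stdin, e.g. "AC-1 MET", "task2 DONE", "FUT-1 DEFERRED".
review_verdict() {
    local id status blocked=0
    while read -r id status; do
        case $id in
            AC-*) [[ $status == MET ]] || blocked=1 ;;   # current-scope AC must be met
            task*) [[ $status == DONE ]] || blocked=1 ;; # current-scope task must be done
            *) ;;                                        # FUT-*, DEC-*: informational only
        esac
    done
    if ((blocked)); then
        echo INCOMPLETE
    else
        echo COMPLETE
    fi
}
```

A deferred current-scope AC (`AC-2 DEFERRED`) still yields `INCOMPLETE`, while a deferred `FUT-1` does not change the verdict.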
- AC-1: - Positive Tests (expected to PASS): @@ -72,11 +73,22 @@ Each task must include exactly one routing tag: - `coding`: implemented by Claude - `analyze`: executed via Codex (`/humanize:ask-codex`) +Every `AC-*` must be covered by at least one task. Every task must target at least one `AC-*`. Do not target `FUT-*`, `DEC-*`, or `-` in the Target AC column. + | Task ID | Description | Target AC | Tag (`coding`/`analyze`) | Depends On | |---------|-------------|-----------|----------------------------|------------| | task1 | <...> | AC-1 | coding | - | | task2 | <...> | AC-2 | analyze | task1 | +## Future Work / Out of Scope + +Future, deferred, post-work, successor-loop, and out-of-scope items belong here, not under `## Acceptance Criteria`. + +- FUT-1: + - Source DEC: DEC-1 + - Current-loop handoff: AC-X + - Promotion trigger: + ## Claude-Codex Deliberation ### Agreements diff --git a/tests/test-gen-plan.sh b/tests/test-gen-plan.sh index b5bcab07..973ccd04 100755 --- a/tests/test-gen-plan.sh +++ b/tests/test-gen-plan.sh @@ -121,6 +121,8 @@ fi echo "" echo "PT-5b: Claude/Codex deliberation workflow validation" PLAN_TEMPLATE="$PROJECT_ROOT/prompt-template/plan/gen-plan-template.md" +REGULAR_REVIEW_TEMPLATE="$PROJECT_ROOT/prompt-template/codex/regular-review.md" +FULL_ALIGNMENT_TEMPLATE="$PROJECT_ROOT/prompt-template/codex/full-alignment-review.md" if [[ -f "$GEN_PLAN_CMD" ]] && grep -q "scripts/ask-codex.sh" "$GEN_PLAN_CMD"; then pass "gen-plan command allows ask-codex script" @@ -234,6 +236,130 @@ else fail "plan template includes coding/analyze task tag column" "tag column in task table" "missing" fi +if [[ -f "$PLAN_TEMPLATE" ]] && ! 
grep -q "Handoff AC Pattern" "$PLAN_TEMPLATE"; then + pass "plan template excludes handoff AC pattern from copied output" +else + fail "plan template excludes handoff AC pattern from copied output" "no Handoff AC Pattern template/example section" "section still present" +fi + +if [[ -r "$PLAN_TEMPLATE" ]]; then + if AC_SECTION=$(awk '/^## Acceptance Criteria[[:space:]]*$/{in_ac=1; next} /^## / && in_ac{in_ac=0} in_ac' "$PLAN_TEMPLATE"); then + if ! grep -q '[^[:space:]]' <<< "$AC_SECTION"; then + fail "plan template acceptance criteria section omits deferred/future markers" "non-empty Acceptance Criteria section" "section missing or empty" + elif ! grep -Eq "deferred|future|follow-up|subsequent|next phase|next iteration|next milestone|next loop|v2|v\\.next|Phase II|left for|to be implemented in|FUT-" <<< "$AC_SECTION"; then + pass "plan template acceptance criteria section omits deferred/future markers" + else + fail "plan template acceptance criteria section omits deferred/future markers" "no deferred/future markers under Acceptance Criteria" "markers present" + fi + else + fail "plan template acceptance criteria section omits deferred/future markers" "Acceptance Criteria section can be extracted" "awk extraction failed" + fi +else + fail "plan template acceptance criteria section omits deferred/future markers" "readable plan template" "missing or unreadable: $PLAN_TEMPLATE" +fi + +if [[ -f "$GEN_PLAN_CMD" ]] && grep -q "Handoff AC Pattern" "$GEN_PLAN_CMD" && grep -q "generation guidance only" "$GEN_PLAN_CMD"; then + pass "gen-plan command keeps handoff pattern as generation guidance" +else + fail "gen-plan command keeps handoff pattern as generation guidance" "Handoff AC Pattern generation guidance" "missing" +fi + +if [[ -f "$GEN_PLAN_CMD" ]] \ + && grep -qF "Prompt MUST include the Handoff AC Pattern definition inline" "$GEN_PLAN_CMD" \ + && grep -qF "current-loop AC may cover only the handoff" "$GEN_PLAN_CMD" \ + && grep -qF "without completing the future 
work" "$GEN_PLAN_CMD"; then + pass "gen-plan second Codex prompt defines handoff AC pattern inline" +else + fail "gen-plan second Codex prompt defines handoff AC pattern inline" "inline Handoff AC Pattern definition in Phase 5 prompt requirements" "missing" +fi + +if [[ -f "$PLAN_TEMPLATE" ]] && grep -q "## Future Work / Out of Scope" "$PLAN_TEMPLATE" && grep -q "FUT-1" "$PLAN_TEMPLATE"; then + pass "plan template includes FUT future-work section" +else + fail "plan template includes FUT future-work section" "Future Work / Out of Scope with FUT-* example" "missing" +fi + +if [[ -f "$PLAN_TEMPLATE" ]] && grep -q "current RLCR completion gates" "$PLAN_TEMPLATE"; then + pass "plan template defines ACs as current RLCR completion gates" +else + fail "plan template defines ACs as current RLCR completion gates" "current RLCR completion gates contract" "missing" +fi + +if [[ -f "$GEN_PLAN_CMD" ]] \ + && grep -q "Deferred AC Semantic Guard" "$GEN_PLAN_CMD" \ + && grep -q "next phase" "$GEN_PLAN_CMD" \ + && grep -q "to be implemented in" "$GEN_PLAN_CMD" \ + && grep -q "review hints, not automatic failures" "$GEN_PLAN_CMD"; then + pass "gen-plan generation rules include semantic deferred AC guard" +else + fail "gen-plan generation rules include semantic deferred AC guard" "Deferred AC Semantic Guard with non-automatic marker handling" "missing" +fi + +if [[ -f "$GEN_PLAN_CMD" ]] && grep -q "AC/Task Bidirectional Coverage" "$GEN_PLAN_CMD"; then + pass "gen-plan generation rules include AC/task bidirectional coverage" +else + fail "gen-plan generation rules include AC/task bidirectional coverage" "AC/Task Bidirectional Coverage rule" "missing" +fi + +if [[ -f "$GEN_PLAN_CMD" ]] && grep -q "DEC/FUT Linkage" "$GEN_PLAN_CMD"; then + pass "gen-plan generation rules include DEC/FUT linkage" +else + fail "gen-plan generation rules include DEC/FUT linkage" "DEC/FUT Linkage rule" "missing" +fi + +if [[ -f "$GEN_PLAN_CMD" ]] \ + && grep -qF 'Resolution status (`resolved` or 
`needs_user_decision`)' "$GEN_PLAN_CMD" \ + && grep -qF 'Do NOT use `deferred` as a convergence status' "$GEN_PLAN_CMD" \ + && grep -qF 'resolved `DEC-*` plus linked `FUT-*`' "$GEN_PLAN_CMD" \ + && grep -qF 'unlinked deferred-work resolution MUST force `PLAN_CONVERGENCE_STATUS=partially_converged`' "$GEN_PLAN_CMD" \ + && grep -qF 'no deferred-work resolution exists only in the convergence matrix' "$GEN_PLAN_CMD" \ + && ! grep -qF 'Resolution status (`resolved`, `needs_user_decision`, `deferred`)' "$GEN_PLAN_CMD"; then + pass "gen-plan convergence matrix prevents deferred status escape hatch" +else + fail "gen-plan convergence matrix prevents deferred status escape hatch" "no deferred status and DEC/FUT linkage required before convergence/auto-start" "missing or stale convergence status rule" +fi + +if [[ -f "$GEN_PLAN_CMD" ]] \ + && grep -q "REQUIRED_CHANGES" "$GEN_PLAN_CMD" \ + && grep -q "real work happen outside this RLCR loop" "$GEN_PLAN_CMD" \ + && grep -q "review hints, not automatic blockers" "$GEN_PLAN_CMD" \ + && grep -q "validating future dates as input" "$GEN_PLAN_CMD" \ + && ! grep -q "hard keyword scan" "$GEN_PLAN_CMD" \ + && ! 
grep -q "Treat these strings as blocking" "$GEN_PLAN_CMD"; then + pass "gen-plan codex review requires semantic deferred-AC detection" +else + fail "gen-plan codex review requires semantic deferred-AC detection" "REQUIRED_CHANGES only for semantic deferrals, not keyword hits" "missing or keyword-blocking language present" +fi + +if [[ -f "$REGULAR_REVIEW_TEMPLATE" ]] \ + && grep -q "FUT-\\*" "$REGULAR_REVIEW_TEMPLATE" \ + && grep -q "MUST NOT block the COMPLETE verdict" "$REGULAR_REVIEW_TEMPLATE"; then + pass "regular RLCR review template excludes FUT items from COMPLETE gate" +else + fail "regular RLCR review template excludes FUT items from COMPLETE gate" "FUT-* items MUST NOT block COMPLETE" "missing" +fi + +if [[ -f "$REGULAR_REVIEW_TEMPLATE" ]] \ + && grep -q "defer any current-scope tasks" "$REGULAR_REVIEW_TEMPLATE" \ + && grep -q 'Do NOT draft implementation plans solely for `FUT-\*`' "$REGULAR_REVIEW_TEMPLATE" \ + && grep -q "unfinished current-scope work" "$REGULAR_REVIEW_TEMPLATE" \ + && grep -q "pending current-scope work" "$REGULAR_REVIEW_TEMPLATE" \ + && ! grep -q "defer any tasks" "$REGULAR_REVIEW_TEMPLATE" \ + && ! grep -q "if any task is deferred" "$REGULAR_REVIEW_TEMPLATE" \ + && ! 
grep -q "if any task is pending" "$REGULAR_REVIEW_TEMPLATE"; then + pass "regular RLCR review template scopes deferral escalation to current-scope tasks" +else + fail "regular RLCR review template scopes deferral escalation to current-scope tasks" "current-scope-only deferral escalation" "unscoped task deferral language present" +fi + +if [[ -f "$FULL_ALIGNMENT_TEMPLATE" ]] \ + && grep -q "FUT-\\*" "$FULL_ALIGNMENT_TEMPLATE" \ + && grep -q "MUST NOT block the COMPLETE verdict" "$FULL_ALIGNMENT_TEMPLATE"; then + pass "full-alignment RLCR review template excludes FUT items from COMPLETE gate" +else + fail "full-alignment RLCR review template excludes FUT items from COMPLETE gate" "FUT-* items MUST NOT block COMPLETE" "missing" +fi + if [[ -f "$GEN_PLAN_CMD" ]] && grep -q "### Step 1.5: Consolidate Pending User Decisions" "$GEN_PLAN_CMD"; then pass "gen-plan command includes consolidate pending user decisions step" else