
Address MAP framework bundle: 8 framework gaps surfaced in a downstream run #142

Open
azalio wants to merge 10 commits into main from fix-map-framework-bundle

@azalio azalio commented May 22, 2026

Summary

Per-issue notes

Test plan

  • uv run pytest -q — 1437 passed, 4 skipped (was 1398 on main; +44 new test cases this PR).
  • make lint — ruff + mypy clean.
  • make sync-templates — all dev/template pairs in sync.
  • Validate end-to-end on neuro-vlad branch new-road: copy the updated .claude/skills/map-check/SKILL.md (or run mapify init against the new release) and re-run /map-check to confirm Step 2 no longer crashes.
  • Smoke-test the new CLI subcommands manually: peek_current_step, mark_subtask_complete, save_research, load_research, validate_mutation_boundary.

Deferred (not in this PR)

  • ~10 pre-existing Pyright diagnostics in map_step_runner.py (errors from calling .pop(), [] or int() on dict.get() results typed as object) and 6 in test_map_step_runner.py surfaced during this work. They are unrelated to the eight fixes and would balloon the diff; they are tracked for a separate cleanup PR.

🤖 Generated with Claude Code

Eight inter-related framework issues surfaced during a downstream run. All
fixes ship together with regression tests and template sync.

map-check/SKILL.md (#7-bundle bug)
  Step 2 indexed `pending_steps["ST-001"]` but the canonical schema makes
  pending_steps a flat list[str] of workflow phase ids — jq crashed with
  `Cannot index array with string`. Rewrote Step 2 around workflow_status
  + flat-array iteration.
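A minimal Python sketch of the corrected Step 2 logic, assuming a step_state-like dict with the two fields named above. `summarize_pending`, the `IN_PROGRESS` value and the sample phase ids are hypothetical; the real fix lives in the skill's jq, not in Python.

```python
from typing import Any, Dict, List


def summarize_pending(state: Dict[str, Any]) -> List[str]:
    """Return the remaining phase ids from a step_state-like dict (sketch).

    pending_steps is a flat list[str] of phase ids, so it is iterated,
    never indexed by a subtask id such as "ST-001" (the jq crash).
    """
    if state.get("workflow_status") == "WORKFLOW_COMPLETE":
        return []  # finished run: report nothing pending
    pending = state.get("pending_steps", [])
    if not isinstance(pending, list):
        raise TypeError("pending_steps must be a flat list of phase ids")
    return [str(step) for step in pending]


state = {"workflow_status": "IN_PROGRESS", "pending_steps": ["2.3", "2.4"]}
print(summarize_pending(state))  # ['2.3', '2.4']
# state["pending_steps"]["ST-001"] would raise TypeError, the Python
# analogue of jq's "Cannot index array with string".
```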

get_next_step short-circuit (#7)
  Added early-return on workflow_status=='WORKFLOW_COMPLETE' so a stale
  repopulation of pending_steps after a finished run no longer surfaces
  a phantom RESEARCH step.

build_context_block CLI surfacing (#6)
  map_step_runner.py already exposed the CLI subcommand; skill docs still
  pushed `python -c "import sys; sys.path.insert..."`. Replaced with the
  canonical CLI invocation + bash recipe.

save_research / load_research API (#12)
  New subtask-scoped artifact API in map_step_runner.py (function +
  CLI subcommands) with strict sanitization. Storage lands at
  .map/<branch>/research/<subtask_id>__<kind>.md, partitioned by kind
  (actor / monitor / decomposer). map-efficient RESEARCH phase rewired
  to use it.
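The storage layout above can be sketched as a path builder. This assumes an allow-list sanitizer (the `_SAFE_SEGMENT` regex is hypothetical; the PR only says sanitization is strict) and borrows the `feature/x` → `feature-x` branch mapping that a later commit's regression test asserts.

```python
import re
from pathlib import Path

# Hypothetical allow-list: starts alphanumeric, then letters, digits, . _ -
_SAFE_SEGMENT = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")
KINDS = ("actor", "monitor", "decomposer")


def research_path(root: Path, branch: str, subtask_id: str, kind: str) -> Path:
    """Build .map/<branch>/research/<subtask_id>__<kind>.md (sketch)."""
    safe_branch = branch.replace("/", "-")  # feature/x -> feature-x
    if kind not in KINDS:
        raise ValueError(f"unknown research kind: {kind!r}")
    for label, value in (("branch", safe_branch), ("subtask_id", subtask_id)):
        if not _SAFE_SEGMENT.match(value) or ".." in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return root / ".map" / safe_branch / "research" / f"{subtask_id}__{kind}.md"


print(research_path(Path("/repo"), "feature/x", "ST-001", "actor"))
# /repo/.map/feature-x/research/ST-001__actor.md
```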

peek_current_step (#2)
  Read-only recovery escape hatch for "Step mismatch: expected Y, got X"
  after validate_step double-advance. Returns the same shape as
  get_next_step but never saves the state.
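A sketch of the read-only contract, assuming the state loader and the step walker can be passed in as callables. `walker` is a hypothetical stand-in for get_next_step that mutates its input, which is why the peek works on a deep copy and never saves.

```python
import copy
from typing import Any, Callable, Dict

State = Dict[str, Any]


def peek_current_step(load_state: Callable[[], State],
                      next_step: Callable[[State], State]) -> State:
    """Return what the step walker would hand back, without persisting (sketch)."""
    scratch = copy.deepcopy(load_state())  # the walker may mutate its input
    return next_step(scratch)  # deliberately no save_state(...) here


stored = {"pending_steps": ["2.3", "2.4"]}


def walker(state: State) -> State:
    # Hypothetical stand-in for get_next_step: pops the head of pending_steps.
    return {"step_id": state["pending_steps"].pop(0)}


print(peek_current_step(lambda: stored, walker))  # {'step_id': '2.3'}
print(stored)  # {'pending_steps': ['2.3', '2.4']}, unchanged
```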

mark_subtask_complete (#3)
  CLI subcommand on the orchestrator to short-circuit already-done /
  no-op subtasks without the research→actor→monitor cycle. Records a
  synthetic subtask_result with status='no-op' for audit, advances the
  cursor, and closes the workflow atomically when it was the last
  subtask. Skill prompt updated with the new path.
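An in-memory sketch of the short-circuit, using the `subtask_sequence`, `subtask_index` and `subtask_results` field names that appear elsewhere in this PR. The `summary` key and the function signature are assumptions, and the real CLI also persists the state in a single write.

```python
from typing import Any, Dict

State = Dict[str, Any]


def mark_subtask_complete(state: State, subtask_id: str, summary: str) -> State:
    """Short-circuit the current subtask as a no-op (in-memory sketch)."""
    sequence = state["subtask_sequence"]
    index = state["subtask_index"]
    if index >= len(sequence) or sequence[index] != subtask_id:
        raise ValueError(f"{subtask_id} is not the current subtask")
    # Synthetic result so the audit trail shows the skip.
    state.setdefault("subtask_results", {})[subtask_id] = {
        "status": "no-op",
        "summary": summary,
    }
    state["subtask_index"] = index + 1  # advance the cursor
    if state["subtask_index"] == len(sequence):
        state["workflow_status"] = "WORKFLOW_COMPLETE"  # last subtask closes the run
    return state


state = {"subtask_sequence": ["ST-001", "ST-002"], "subtask_index": 1}
mark_subtask_complete(state, "ST-002", "rename landed in a prior PR")
print(state["workflow_status"])  # WORKFLOW_COMPLETE
```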

validate_mutation_boundary (#11)
  New CLI in map_step_runner.py compares the actual git diff vs
  blueprint.subtasks[id].affected_files. Warn-only by default (appends to
  .map/<branch>/scope-violations.log); MAP_STRICT_SCOPE=1 escalates to
  hard reject. .map/ and .codex/ paths are excluded from the actual
  surface — they are framework infrastructure, not subtask scope.
  Monitor agent prompt now runs it during the verification sequence.
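The classification step can be sketched as a pure function over two file lists. In the real CLI the `actual` list comes from `git diff --name-only <base_ref>` plus working-tree status; here it is passed in, `classify_boundary` is hypothetical, and appending to scope-violations.log is omitted.

```python
import os
from typing import Any, Dict, Iterable, Optional

EXCLUDED_PREFIXES = (".map/", ".codex/")  # framework infrastructure, not subtask scope


def classify_boundary(expected: Iterable[str], actual: Iterable[str],
                      strict: Optional[bool] = None) -> Dict[str, Any]:
    """Compare changed files with declared affected_files (sketch)."""
    if strict is None:
        strict = os.environ.get("MAP_STRICT_SCOPE") == "1"
    expected_set = set(expected)
    surface = sorted(p for p in set(actual) if not p.startswith(EXCLUDED_PREFIXES))
    unexpected = [p for p in surface if p not in expected_set]
    if not unexpected:
        status = "clean"
    else:
        status = "violation" if strict else "warning"
    return {"status": status, "expected": sorted(expected_set),
            "actual": surface, "unexpected": unexpected, "strict": strict}


report = classify_boundary(["src/a.py"], ["src/a.py", "src/b.py", ".map/main/step_state.json"], strict=False)
print(report["status"], report["unexpected"])  # warning ['src/b.py']
```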

Wave-planner over-serialization guidance (#4)
  Audit identified the root cause as decomposer-side false dependencies
  (linear deps collapse the wave planner to single-subtask waves).
  Added "Minimize Dependencies for Parallelism" section to
  task-decomposer.md + new checklist items requiring each edge be
  load-bearing and affected_files always populated.
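To see why linear dependencies serialize everything, here is a generic level-by-level topological grouping (Kahn's algorithm). It is not the framework's DependencyGraph, only an illustration: a chain of three subtasks yields three single-subtask waves, while three independent subtasks share one wave.

```python
from typing import Dict, List, Set


def compute_waves(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Group subtasks into waves; a subtask runs once all its deps are done."""
    remaining: Dict[str, Set[str]] = {n: set(p) for n, p in deps.items()}
    done: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        ready = sorted(n for n, parents in remaining.items() if parents <= done)
        if not ready:
            raise ValueError("cycle or unknown dependency")
        waves.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return waves


linear = {"ST-001": [], "ST-002": ["ST-001"], "ST-003": ["ST-002"]}
independent = {"ST-001": [], "ST-002": [], "ST-003": []}
print(compute_waves(linear))       # [['ST-001'], ['ST-002'], ['ST-003']]
print(compute_waves(independent))  # [['ST-001', 'ST-002', 'ST-003']]
```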

context-meter (#13)
  Already implemented in .claude/hooks/context-meter.py — closed as
  resolved with documentation in TaskUpdate.

Side fix: end-of-turn.sh hook used `py_compile`, which writes
__pycache__/*.pyc next to source even with -B (emitting bytecode is
its entire job). Replaced with `ast.parse` so editing any .py module
under src/mapify_cli/templates/ no longer trips the template-hygiene
gate. Same change in the Monitor agent's syntax-check recommendation
(monitor.md + monitor.toml).
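A Python sketch of the same check; `check_syntax` is a hypothetical helper (the hook itself is a shell script that invokes Python). `ast.parse` only builds a tree in memory, so it catches syntax errors without emitting bytecode.

```python
import ast
from pathlib import Path
from typing import Optional


def check_syntax(path: Path) -> Optional[str]:
    """Return "file:line: message" if path fails to parse, else None.

    ast.parse builds the tree in memory only; unlike py_compile it never
    writes a .pyc, so no __pycache__/ appears next to the source.
    """
    try:
        ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
    except SyntaxError as exc:
        return f"{path}:{exc.lineno}: {exc.msg}"
    return None
```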

Code hygiene cleanups along the way:
  - Removed unused `state: StepState` param from _write_retry_quarantine.
  - Added `# pyright: ignore` on the dynamic DependencyGraph / SubtaskNode
    imports (an importlib spec fallback that Pyright cannot follow).
  - Three pre-existing unused tmp_path fixture params in
    test_map_orchestrator.py got the documented `del` suppression.

Tests: +44 new test cases. Full suite 1437 passed / 4 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 22, 2026 13:56

Copilot AI left a comment

Pull request overview

Bundles several MAP framework hardening fixes surfaced by downstream usage: adds new orchestration/step-runner surfaces, wires research artifacts into /map-efficient, strengthens mutation-boundary verification, and updates hook/agent guidance to avoid template hygiene regressions.

Changes:

  • Added orchestrator recovery + workflow helpers (peek_current_step, mark_subtask_complete) and fixed get_next_step completion short-circuit.
  • Added step-runner research artifact API (save_research / load_research) and a git-diff-based mutation boundary validator with warn/strict modes.
  • Updated skills/agents/hooks and added regression tests (including replacing py_compile with ast.parse to prevent __pycache__ pollution).

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 16 comments.

| File | Description |
|---|---|
| tests/test_skills.py | Adds regression tests ensuring updated decomposer/skills guidance stays present and schema-correct. |
| tests/test_map_step_runner.py | Adds tests for mutation-boundary validation and save/load research CLI behavior. |
| tests/test_map_orchestrator.py | Adds regression tests for new orchestrator helpers and completion short-circuit behavior. |
| tests/hooks/test_end_of_turn.py | Adds regression tests ensuring syntax checks don’t create `__pycache__` while still catching syntax errors. |
| src/mapify_cli/templates/skills/map-efficient/SKILL.md | Documents no-op short-circuit, research artifact wiring, and build_context_block CLI usage. |
| src/mapify_cli/templates/skills/map-check/SKILL.md | Fixes jq usage to treat pending_steps as a flat array and rely on workflow_status. |
| src/mapify_cli/templates/map/scripts/map_step_runner.py | Implements save_research/load_research + validate_mutation_boundary and exposes new CLI subcommands. |
| src/mapify_cli/templates/map/scripts/map_orchestrator.py | Adds peek_current_step, mark_subtask_complete, and WORKFLOW_COMPLETE short-circuit in get_next_step. |
| src/mapify_cli/templates/hooks/end-of-turn.sh | Replaces py_compile with ast.parse to avoid writing bytecode into templates. |
| src/mapify_cli/templates/codex/agents/monitor.toml | Updates Python build-gate guidance to use ast.parse instead of py_compile. |
| src/mapify_cli/templates/agents/task-decomposer.md | Adds mandatory guidance to minimize false dependency edges and require populated affected_files. |
| src/mapify_cli/templates/agents/monitor.md | Adds mutation-boundary verification step and updates Python build-gate guidance. |
| .map/scripts/map_step_runner.py | Mirrors template step-runner updates for runtime use. |
| .map/scripts/map_orchestrator.py | Mirrors template orchestrator updates for runtime use. |
| .codex/agents/monitor.toml | Mirrors template Codex monitor guidance update. |
| .claude/skills/map-efficient/SKILL.md | Mirrors template /map-efficient skill updates. |
| .claude/skills/map-check/SKILL.md | Mirrors template /map-check skill updates. |
| .claude/hooks/end-of-turn.sh | Mirrors template hook update using ast.parse to prevent `__pycache__`. |
| .claude/agents/task-decomposer.md | Mirrors template task-decomposer guidance additions. |
| .claude/agents/monitor.md | Mirrors template monitor guidance additions. |

Comment on lines +5074 to +5078

```python
diff_result = subprocess.run(
    ["git", "diff", "--name-only", base_ref],
    cwd=project_dir,
    capture_output=True,
    text=True,
```

Comment on lines +5761 to +5766

```python
# Exit code: 0 unless MAP_STRICT_SCOPE=1 AND status=="violation".
base_ref_arg = sys.argv[4] if len(sys.argv) >= 5 else None
report = validate_mutation_boundary(sys.argv[2], sys.argv[3], base_ref_arg)
print(json.dumps(report, indent=2))
if report.get("status") == "violation" and report.get("strict"):
    sys.exit(1)
```

Comment on lines +5026 to +5030

```
Return shape::
    {
        "status": "clean" | "warning" | "violation",
        "subtask_id": str,
        "base_ref": str,
```

```python
        load_research(branch_arg, subtask_arg, kind=kind_arg)
    )
except ValueError as exc:
    print(json.dumps({"status": "error", "message": str(exc)}))
```

Some subtasks are already-done historically (rename/refactor landed in a prior PR), or are docs-only and don't need the full research→actor→monitor cycle. Skip them up-front to save tokens:

```bash
…
```

Comment on lines +5028 to +5034

```
"status": "clean" | "warning" | "violation",
"subtask_id": str,
"base_ref": str,
"expected": [str],    # declared affected_files
"actual": [str],      # files actually changed
"unexpected": [str],  # actual but not expected (scope leak)
"strict": bool,
```
Comment thread .claude/agents/monitor.md
Comment on lines +56 to +60

```
5. **Verify mutation boundary (MANDATORY):** Run
   `python3 .map/scripts/map_step_runner.py validate_mutation_boundary <branch> <subtask_id>`
   to compare the actual git diff against the subtask's declared `affected_files`.
   - `status="clean"` → continue.
   - `status="warning"` → record the `unexpected` files in your verdict; do
```

Call `research-agent` for the current subtask, then persist its concise findings via the canonical `save_research` API so Actor and Monitor consume them from the same path. Validate the phase with the orchestrator.

```bash
# After research-agent returns findings in $RESEARCH_FINDINGS:
…
```
azalio and others added 9 commits May 22, 2026 17:14
Copilot flagged 16 comments (8 unique × dev/template copies). All fixed in
the same PR.

Functional bugs
  - validate_mutation_boundary now checks return codes from `git status` and
    `git diff`. `git status` non-zero ⇒ hard error (cannot silently report
    "clean" outside a git repo). Caller-supplied invalid `base_ref` ⇒ hard
    error. Auto-resolved base_ref that doesn't exist (fresh repo, no commits
    yet) ⇒ fall through to porcelain-only and report against uncommitted
    state, not error.
  - CLI for validate_mutation_boundary now exits 1 on status="error" so
    Monitor's mandatory gate cannot silently pass via missing blueprint /
    unknown subtask / git failure.
  - load_research CLI now writes its error JSON to STDERR (stdout stays
    empty) so command substitution `FOO=$(... load_research ...)` is not
    corrupted by error payloads.
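The fixed error contract can be sketched as three small functions. All names are hypothetical: `diff_mode` encodes the git/base_ref rules listed above, `boundary_exit_code` mirrors the reviewed CLI tail plus the new error branch, and `report_research_error` shows the stderr routing.

```python
import json
import sys
from typing import Any, Dict, Optional, TextIO


def diff_mode(git_status_ok: bool, base_ref_supplied: bool, base_ref_exists: bool) -> str:
    """How to build the 'actual' file list (sketch of the rules above)."""
    if not git_status_ok:
        return "error"           # e.g. not a git repo: never report "clean"
    if base_ref_exists:
        return "diff"            # git diff against base_ref plus porcelain status
    if base_ref_supplied:
        return "error"           # caller asked for a ref that does not exist
    return "porcelain-only"      # auto-resolved ref missing, e.g. no commits yet


def boundary_exit_code(report: Dict[str, Any], out: Optional[TextIO] = None) -> int:
    print(json.dumps(report, indent=2), file=out if out is not None else sys.stdout)
    if report.get("status") == "error":
        return 1                 # the mandatory gate must not pass silently
    if report.get("status") == "violation" and report.get("strict"):
        return 1
    return 0


def report_research_error(message: str, err: Optional[TextIO] = None) -> None:
    # stderr keeps stdout empty, so FOO=$(... load_research ...) stays clean.
    payload = json.dumps({"status": "error", "message": message})
    print(payload, file=err if err is not None else sys.stderr)
```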

Documentation
  - validate_mutation_boundary docstring now lists the "error" return shape
    so callers don't assume only clean/warning/violation.
  - Monitor agent prompt now spells out the status="error" branch
    (`valid: false` with returned message).

Skill snippet bugs
  - Both `mark_subtask_complete` and `save_research` snippets in
    map-efficient/SKILL.md now define `SUBTASK_ID=$(jq -r
    '.current_subtask_id' …)` before use. Previously the snippets relied on
    a variable set only in a later phase, producing an empty / wrong value
    on the no-op path and on RESEARCH.

Tests / cosmetic
  - test_branch_is_sanitized actually passes `feature/x` now and asserts
    the result lands under `feature-x/`, not the literal subpath. The prior
    version's docstring lied about what it verified.
  - "each dependencies edge" → "each dependency edge" typo.

Regression tests added
  - test_error_when_not_a_git_repo
  - test_cli_exits_non_zero_on_error_status
  - TestLoadResearchCliErrorChannel.test_invalid_subtask_id_writes_to_stderr_keeps_stdout_empty

Full suite: 1440 passed (+3) / 4 skipped. ruff + mypy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A downstream invocation of /map-efficient against a repo that had a
complete task_plan_<branch>.md ready for resume refused with "needs a
task description in $TASK_ARGS": the model skipped Step 0 and treated
an empty $ARGUMENTS as a stop condition.

Step 0 has always supported resume, but the contract was implicit. Made
it explicit:

  - "MANDATORY: Empty $TASK_ARGS is NOT a stop condition." Spelled out
    the 3-of-3 contract: exit only when args are empty AND there is no
    step_state.json AND no task_plan_<branch>.md.
  - Step 0 now checks step_state.json BEFORE plan resume (in-flight work
    wins over a stale plan-only resume that would recreate state from
    INIT_STATE and lose subtask_results).
  - On the empty-everything path, the skill exits with a clear "provide
    a task description OR run /map-plan first" message instead of
    silently doing nothing.
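The empty-args branch of the 3-of-3 contract reduces to a small decision function. This sketch takes the two file checks as booleans; the function name and return labels are hypothetical.

```python
def empty_args_action(state_exists: bool, plan_exists: bool) -> str:
    """What Step 0 does when $TASK_ARGS is empty (sketch of the 3-of-3 rule)."""
    if state_exists:   # step_state.json present: in-flight work wins
        return "resume-from-state"
    if plan_exists:    # task_plan_<branch>.md present: rebuild state from the plan
        return "resume-from-plan"
    return "exit: provide a task description OR run /map-plan first"
```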

Regression tests: TestMapEfficientEmptyArgsResumeGuard (3 cases × 2
copies = 6 tests). Suite 1446 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same PR as the original eight fixes. These came from a second downstream
review of issues #1-#12 that had not landed in the first batch.

#4 (real bug, found via code read at lines 736-742):
  validate_step at the inter-subtask boundary (pending_steps emptied but
  more subtasks remain) was setting current_step_id="COMPLETE" and
  returning next_step="COMPLETE". get_next_step then re-advanced and
  handed back RESEARCH for the next subtask, so the workflow recovered
  — but the validate_step response had already lied, making COMPLETE
  indistinguishable from a true terminal state. Now emits the explicit
  "ADVANCE_SUBTASK" sentinel (with matching current_step_id/phase) and
  reserves COMPLETE for the actual terminal case.
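A sketch of the boundary logic, assuming the walker knows the remaining phases, the subtask cursor and the subtask count. The sentinel strings match the ones used above; the function is hypothetical.

```python
from typing import List


def step_after_validate(pending_steps: List[str], subtask_index: int,
                        subtask_count: int) -> str:
    """Next step reported by validate_step (sketch)."""
    if pending_steps:
        return pending_steps[0]
    if subtask_index + 1 < subtask_count:
        return "ADVANCE_SUBTASK"  # boundary: more subtasks remain
    return "COMPLETE"             # reserved for the true terminal state
```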

#11 validate_step idempotency:
  Re-running validate_step X when X is already in completed_steps is now
  a no-op success ({valid: True, idempotent: True}) instead of "Step
  mismatch: expected Y, got X". Combined with the new peek_current_step,
  callers can safely retry without recovery dances.
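An in-memory sketch of the idempotent check, assuming `pending_steps` and `completed_steps` lists in the state. The returned dicts mirror the shapes quoted above, but the function itself is hypothetical.

```python
from typing import Any, Dict

State = Dict[str, Any]


def validate_step(state: State, step_id: str) -> Dict[str, Any]:
    """Idempotent step validation (in-memory sketch)."""
    if step_id in state["completed_steps"]:
        return {"valid": True, "idempotent": True}  # safe retry, no state change
    pending = state["pending_steps"]
    expected = pending[0] if pending else None
    if step_id != expected:
        return {"valid": False,
                "message": f"Step mismatch: expected {expected}, got {step_id}"}
    state["completed_steps"].append(pending.pop(0))
    return {"valid": True}
```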

#5 RESEARCH enforced (not prompt-text):
  validate_step("2.2") now verifies that .map/<branch>/research/
  <current_subtask>__*.md exists. If not, rejects with valid=false and
  the exact save_research command to run. "MANDATORY RESEARCH" is now
  actual behaviour, not just docs.
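The enforcement amounts to a glob over the research directory. A sketch, assuming the branch segment is already sanitized; the function name and message wording are hypothetical.

```python
from pathlib import Path
from typing import Any, Dict


def research_gate(root: Path, branch: str, subtask_id: str) -> Dict[str, Any]:
    """Reject phase 2.2 unless a research artifact exists (sketch)."""
    research_dir = root / ".map" / branch / "research"
    if research_dir.is_dir() and any(research_dir.glob(f"{subtask_id}__*.md")):
        return {"valid": True}
    return {"valid": False,
            "message": f"No research artifact for {subtask_id}; "
                       "run save_research before validating 2.2"}
```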

#3 resume_from_plan auto-set_waves:
  When blueprint.json is present, resume_from_plan now invokes set_waves
  itself and reports the outcome in waves_computed: "success" / "error"
  / "skipped". /map-efficient skill no longer needs to dispatch
  set_waves manually after every resume.

#7 get_subtask CLI:
  python3 .map/scripts/map_step_runner.py get_subtask <ID> [--branch X]
  Hides the {flat, blueprint-wrapped} schema dichotomy so callers stop
  needing ad-hoc jq with two fallbacks.
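The PR does not spell out the two schemas, so this sketch assumes a flat `{"subtasks": ...}` document and a `{"blueprint": {...}}` wrapper, and accepts subtasks either as a list of `{"id": ...}` objects or as a dict keyed by id. `find_subtask` is hypothetical.

```python
from typing import Any, Dict


def find_subtask(doc: Dict[str, Any], subtask_id: str) -> Dict[str, Any]:
    """Look up a subtask in either assumed blueprint shape (sketch)."""
    container = doc.get("blueprint", doc)  # unwrap {"blueprint": {...}} if present
    subtasks = container.get("subtasks", [])
    if isinstance(subtasks, dict):          # keyed by id
        if subtask_id in subtasks:
            return subtasks[subtask_id]
    else:                                   # list of {"id": ...}
        for sub in subtasks:
            if sub.get("id") == subtask_id:
                return sub
    raise KeyError(subtask_id)
```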

#10 pytest-timeout in test deps:
  CLAUDE.md examples reference `pytest --timeout=60`, but the
  pytest-timeout package was missing; it is now in requirements-test.txt
  and the pyproject.toml test/dev extras.

#2 wave-API integration (partial — full pivot deferred):
  Added documentation guidance in map-efficient/SKILL.md explaining when
  the sequential walker (get_next_step) vs the wave loop
  (get_wave_step / validate_wave_step / advance_wave) applies, and
  noted that resume_from_plan now auto-populates execution_waves. The
  deeper unification — making get_next_step itself walk by
  execution_waves rather than subtask_sequence — touches multiple
  invariant tests and is tracked as a separate follow-up plan.

Test integration fix:
  tests/integration/test_e2e_artifact_contracts.py walked subtask
  phases without writing research artifacts; updated to plant
  .map/<branch>/research/ST-NNN__actor.md per subtask now that RESEARCH
  enforcement is real.

+11 new test cases: TestValidateStepIdempotency,
TestValidateStepInterSubtaskBoundary, TestValidateStepResearchEnforcement,
TestResumeFromPlanAutoSetWaves, TestGetSubtaskCli. Suite: 1456 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hment)

#1: workflow-context-injector now stamps the [MAP] reminder with the hook's
wall-clock UTC time AND the age of step_state.json (now - mtime), e.g.
  [MAP] @ 14:23:01.234Z (state +0.5s) 2.3 ACTOR | ...
If the hook is reading stale state (the symptom: "[MAP] still says ACTOR
after I validate_step'd to MONITOR"), the "state +Xs" delta makes it
obvious — a fresh validate_step would push mtime to "now" so the next
hook firing should report a small delta. Future repros can compare deltas
across consecutive reminders to confirm whether it's a hook cache or a
genuinely stale state file.
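The stamp format quoted above can be reproduced as follows. In this sketch the caller supplies `now` and the state file's mtime; in the hook these would presumably come from `datetime.now(timezone.utc)` and `os.path.getmtime`. `map_stamp` is hypothetical.

```python
from datetime import datetime, timezone


def map_stamp(now: datetime, state_mtime: float, step: str, phase: str) -> str:
    """Format '[MAP] @ HH:MM:SS.mmmZ (state +Xs) <step> <phase>' (sketch)."""
    clock = now.astimezone(timezone.utc).strftime("%H:%M:%S.%f")[:-3] + "Z"
    age = now.timestamp() - state_mtime  # seconds since step_state.json was written
    return f"[MAP] @ {clock} (state +{age:.1f}s) {step} {phase}"


now = datetime(2026, 5, 23, 14, 23, 1, 234000, tzinfo=timezone.utc)
print(map_stamp(now, now.timestamp() - 0.5, "2.3", "ACTOR"))
# [MAP] @ 14:23:01.234Z (state +0.5s) 2.3 ACTOR
```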

#6: build_context_block now emits the subtask's `description` field
(the long-form prose what/why from blueprint) and `risk_level`. Validated
against the real neuro-vlad blueprint — ST-001's 400-char description
flows into the context block instead of forcing Actor to re-open
blueprint.json. Length went 21 → 22 lines but the per-line density grew
substantially. Description is truncated to 480 chars to stay within
the context budget.

+2 new test cases (TestBuildContextBlockIncludesDescription). Suite: 1458.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Eight more fixes surfaced in a fresh /map-efficient run on neuro-vlad
new-road after the earlier batches landed.

#1 record_subtask_result CLI:
  Skill text already said "record files changed in step_state.json" but
  no public command existed; callers reached into Python or hoped
  validate_step did it implicitly. Added
  `python3 .map/scripts/map_orchestrator.py record_subtask_result <ID>
  <status> --files a.py,b.py --summary "..." --commit-sha SHA`.

#2 ADVANCE_SUBTASK documented:
  The "ADVANCE_SUBTASK" sentinel introduced in the previous batch (#4)
  had no description in map-efficient/SKILL.md. Added an explicit
  "Phase: ADVANCE_SUBTASK (synthetic boundary)" section so callers
  know it's a free transition (call get_next_step again) and not a
  phase to execute.

#3 Wave banner truthfulness:
  workflow-context-injector now reports "[waves computed, sequential
  walker active]" when execution_waves is populated but
  current_wave_index is still 0 (sequential walker has not been
  swapped for the wave loop). Previously the banner claimed
  "mode batch:parallel" even when nothing parallel was happening.

#5 Monitor verdict contract:
  Added a "Verdict consistency contract (MANDATORY)" block to
  monitor.md: MEDIUM+ severity issues force valid:false, and any
  `recommendation in {"revise","block","needs_investigation"}` forbids
  `valid:true`. Closes the loophole where Monitor returned
  valid:true with recommendation:revise and the skill silently advanced.
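The contract can be checked mechanically. A sketch, assuming a verdict dict with `valid`, `recommendation` and an `issues` list whose entries carry a `severity` string; the four-level scale and the function name are assumptions, only the MEDIUM+ threshold and the blocking recommendations come from the contract.

```python
from typing import Any, Dict

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}  # assumed scale
BLOCKING_RECOMMENDATIONS = {"revise", "block", "needs_investigation"}


def verdict_is_consistent(verdict: Dict[str, Any]) -> bool:
    """True if a Monitor verdict obeys the consistency contract (sketch)."""
    if not verdict.get("valid"):
        return True  # rejecting is always allowed
    if verdict.get("recommendation") in BLOCKING_RECOMMENDATIONS:
        return False  # valid:true + revise/block/needs_investigation is the loophole
    return all(
        SEVERITY_RANK.get(str(issue.get("severity", "")).lower(), 0) < SEVERITY_RANK["medium"]
        for issue in verdict.get("issues", [])
    )
```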

#6 build_context_block truncation marker:
  Added a compact "# [TRUNCATED] see .map/<branch>/token_budget.json"
  marker inside the budgeted text when clipping happened, replacing
  the prior silent loss. Token-budget aware: the marker REPLACES the
  existing "# Context Budget:..." footer so net token cost is zero
  (the contract assertion stays <= configured budget).
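A sketch of the footer swap, using characters as a stand-in for tokens (the real budget is token-based). `finalize_block` and the footer text in the test are hypothetical; the marker string is the one quoted above.

```python
from typing import List

TRUNCATION_MARKER = "# [TRUNCATED] see .map/{branch}/token_budget.json"


def finalize_block(lines: List[str], footer: str, budget: int, branch: str) -> str:
    """Append the footer, or swap it for the marker when clipping (sketch).

    Characters stand in for tokens here; the real budget is token-based.
    """
    body = "\n".join(lines)
    if len(body) + 1 + len(footer) <= budget:
        return body + "\n" + footer
    marker = TRUNCATION_MARKER.format(branch=branch)
    room = max(budget - len(marker) - 1, 0)  # the marker replaces the footer
    return body[:room] + "\n" + marker
```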

#7 save_research attempt versioning:
  `save_research(..., attempt=N)` (and CLI flag `--attempt N`) now
  preserves a numbered snapshot at `<id>__<kind>.attempt-<N>.md`
  BEFORE overwriting the canonical file. Useful for clean-retry
  diffing after Monitor rejection.
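A sketch of the snapshot-then-overwrite order, assuming the numbered snapshot holds the previous canonical content. `save_research_sketch` is hypothetical and skips the sanitization the real API performs.

```python
from pathlib import Path
from typing import Optional


def save_research_sketch(research_dir: Path, subtask_id: str, kind: str,
                         content: str, attempt: Optional[int] = None) -> Path:
    """Write <id>__<kind>.md, snapshotting the previous copy first (sketch)."""
    research_dir.mkdir(parents=True, exist_ok=True)
    canonical = research_dir / f"{subtask_id}__{kind}.md"
    if attempt is not None and canonical.exists():
        snapshot = research_dir / f"{subtask_id}__{kind}.attempt-{attempt}.md"
        snapshot.write_text(canonical.read_text(encoding="utf-8"), encoding="utf-8")
    canonical.write_text(content, encoding="utf-8")  # overwrite only after the snapshot
    return canonical
```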

#9 mark_subtask_complete hint:
  get_next_step's RESEARCH (2.2) instruction now mentions both the
  save_research command (positive path) AND the mark_subtask_complete
  no-op short-circuit (escape hatch). Previously the operator had to
  recall the latter from efficient-reference.md.

#11 finalize_plan CLI:
  `python3 .map/scripts/map_orchestrator.py finalize_plan` bumps
  artifact_manifest.stages.plan to "complete" when blueprint +
  task_plan are present. Closes the stage-stuck-partial trap reported
  on neuro-vlad new-road's manifest.

#12 validate_step("2.4") auto mutation-boundary:
  MONITOR gate now runs validate_mutation_boundary internally for the
  current subtask. Warn-only by default; MAP_STRICT_SCOPE=1 escalates
  to a hard reject. Best-effort: missing blueprint or git failure is
  silently skipped so the gate stays usable in unit-test contexts.

Skipped from the 12-issue list:
  #4 hook lag — repro now possible with timestamps from previous PR
    commit; awaiting fresh logs to diagnose root cause.
  #8 type-ignore misapplication — agent-quality, not framework.
  #10 per-subtask token accounting — needs new transcript-parsing
    infrastructure; tracked for separate plan.

+7 test classes added. Suite: 1462 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the "no cheap way to know how many tokens spent in current
subtask" gap from the latest framework triage.

  python3 .map/scripts/map_step_runner.py subtask_token_usage <branch> \
    [subtask_id] [--since-ts <ISO>]

Behaviour:
  * Resolves Claude Code's per-session log dir via the canonical
    ~/.claude/projects/<cwd-with-dashes>/ convention; falls back to
    cwd-matching across project dirs when the canonical path isn't there.
  * Picks the newest *.jsonl by mtime as the active session transcript.
  * Anchors the window at step_state.json mtime (the orchestrator
    rewrites that file on every advance, so it's a clean per-subtask
    transition signal). Override with --since-ts for arbitrary windows.
  * Sums message.usage.{input,output,cache_creation,cache_read}_tokens
    across assistant turns with timestamp >= anchor, returning a flat
    JSON report.
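The steps above can be sketched in Python. Assumptions: transcript lines are JSON objects with `type`, an ISO `timestamp` and `message.usage`, and the usage keys follow the Anthropic Messages API names (`cache_creation_input_tokens`, `cache_read_input_tokens`). The cwd-matching fallback is omitted and all function names are hypothetical.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Dict, Iterable

USAGE_KEYS = ("input_tokens", "output_tokens",
              "cache_creation_input_tokens", "cache_read_input_tokens")


def project_log_dir(cwd: str) -> Path:
    """~/.claude/projects/<cwd-with-dashes>/ per the convention above."""
    return Path.home() / ".claude" / "projects" / cwd.replace("/", "-")


def newest_transcript(log_dir: Path) -> Path:
    # Raises ValueError if the directory holds no *.jsonl files.
    return max(log_dir.glob("*.jsonl"), key=lambda p: p.stat().st_mtime)


def sum_usage(lines: Iterable[str], since: datetime) -> Dict[str, int]:
    """Sum usage over assistant turns at or after `since` (tz-aware)."""
    totals = dict.fromkeys(USAGE_KEYS, 0)
    totals["messages"] = 0
    for raw in lines:
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip blank or partial lines
        if not isinstance(entry, dict) or entry.get("type") != "assistant" or "timestamp" not in entry:
            continue
        when = datetime.fromisoformat(entry["timestamp"].replace("Z", "+00:00"))
        if when < since:
            continue
        usage = (entry.get("message") or {}).get("usage") or {}
        totals["messages"] += 1
        for key in USAGE_KEYS:
            totals[key] += int(usage.get(key) or 0)
    return totals
```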

Result on neuro-vlad new-road ST-004 with explicit since-ts
2026-05-23T06:00:00Z: 33 messages counted, 27265 output tokens,
331344 cache-creation, 2129820 cache-read — the kind of signal that
previously required eyeballing transcripts.

+3 test cases (TestSubtaskTokenUsage). Suite: 1465 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Convenience over `--since-ts 1970-01-01T00:00:00Z`: pass `--all` to
report tokens spent across the entire active session, ignoring the
default step_state.json mtime anchor that scopes the report to the
current subtask. Useful when the operator wants a running session
total rather than "since current subtask boundary".

Real smoke on neuro-vlad new-road (currently at ST-005, 58 messages
in the active jsonl): 388 223 total tokens, 4.6M cache_read — the
exact "how much have I burned this session" signal that was missing.

+1 test case. Suite: 1466 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…test run)

A downstream invocation of /map-efficient finished ST-004, returned a
"Pausing to report progress... re-run /map-efficient to resume at
ST-005" message, and stopped. The operator had to issue another
/map-efficient call to drive ST-005. Doubles round-trips and burns
attention; the operator explicitly asked the skill to ship the whole
plan, not check in after each subtask.

Step 2b now carries a "MANDATORY: Do NOT pause between subtasks" rule
with the four legitimate stop conditions enumerated:
  1. next_step="COMPLETE" with subtask_index+1 == len(subtask_sequence)
  2. retry-quarantine adjudication required
  3. user explicit interrupt
  4. circuit breaker tripped

Anything else is "the wrong default" the operator just called out.

+2 regression test cases (TestMapEfficientNoInterSubtaskPause).
Suite: 1470 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sixth round of fixes from a downstream /map-efficient run on neuro-vlad
new-road. Six framework gaps, one commit, regression-tested.

#7 transactional MONITOR pass:
  validate_step("2.4") now implicitly closes pending 2.3 (ACTOR) when
  the cursor is mid-flight. Caller convenience — Monitor approval
  logically means Actor work was accepted, so requiring a separate
  validate_step("2.3") before validate_step("2.4") was just ceremony
  that produced "Step mismatch: expected 2.3, got 2.4" errors. Skill
  can now go straight Monitor-pass → record_subtask_result →
  validate_step("2.4").
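An in-memory sketch of the implicit ACTOR close, assuming phase ids 2.3 (ACTOR) and 2.4 (MONITOR) in `pending_steps`; the function is hypothetical.

```python
from typing import Any, Dict

State = Dict[str, Any]


def validate_monitor_pass(state: State) -> Dict[str, Any]:
    """validate_step("2.4") with an implicit ACTOR close (in-memory sketch)."""
    pending, completed = state["pending_steps"], state["completed_steps"]
    if pending[:1] == ["2.3"]:           # Actor still open: approval implies acceptance
        completed.append(pending.pop(0))
    if pending[:1] != ["2.4"]:
        head = pending[0] if pending else None
        return {"valid": False, "message": f"Step mismatch: expected {head}, got 2.4"}
    completed.append(pending.pop(0))
    return {"valid": True}
```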

#10 build_context_block auto-loads research:
  Inlines the latest research artifact (actor → monitor → decomposer
  kinds, first hit wins, cap 1500 chars) into the context block under
  "# Research Findings (ST-NNN, kind=actor):". Stops the manual
  "load_research → glue into Actor prompt" two-step.

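A sketch of the lookup order and cap described in #10, assuming the `<id>__<kind>.md` naming from save_research; `research_section` is hypothetical.

```python
from pathlib import Path
from typing import Optional

KIND_PRIORITY = ("actor", "monitor", "decomposer")


def research_section(research_dir: Path, subtask_id: str, cap: int = 1500) -> Optional[str]:
    """Inline the first research artifact found, capped at `cap` chars (sketch)."""
    for kind in KIND_PRIORITY:
        path = research_dir / f"{subtask_id}__{kind}.md"
        if path.is_file():
            text = path.read_text(encoding="utf-8")[:cap]
            return f"# Research Findings ({subtask_id}, kind={kind}):\n{text}"
    return None  # no artifact: omit the section
```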
#6 detect_already_done CLI:
  python3 .map/scripts/map_step_runner.py detect_already_done <branch>
    <subtask_id> [--since-ref REF]
  Heuristic check: every affected_file exists AND has commits in the
  window? Returns "likely_done" / "partial" / "unclear". Falls back
  to all-history when --since-ref doesn't resolve (fresh repos).
  Pragmatic, not authoritative — operators still review evidence
  before mark_subtask_complete.
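The labelling step can be sketched once per-file evidence has been gathered (for example with `git log <since_ref>..HEAD -- <file>`). How the real CLI maps mixed evidence to "partial" versus "unclear" is an assumption here, and `classify_already_done` is hypothetical.

```python
from typing import Dict, Tuple


def classify_already_done(evidence: Dict[str, Tuple[bool, bool]]) -> str:
    """Map per-file (exists, has_commits_in_window) evidence to a label (sketch)."""
    if not evidence:
        return "unclear"  # no affected_files declared: nothing to judge
    hits = [exists and committed for exists, committed in evidence.values()]
    if all(hits):
        return "likely_done"
    return "partial" if any(hits) else "unclear"
```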

#3 scope baseline:
  validate_mutation_boundary now subtracts a per-branch baseline
  (.map/<branch>/scope-baseline.json) from `actual`. Capture it with
  the new record_scope_baseline CLI when the branch carries
  pre-existing untracked / unstaged work from prior waves; subsequent
  mutation-boundary checks then only flag files the current subtask
  actually changed. Closes the "every ST shows warning because the
  branch is dirty" friction.
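A sketch of the subtraction, assuming scope-baseline.json stores a `{"files": [...]}` list; the schema and function name are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, List


def subtract_baseline(actual: Iterable[str], baseline_file: Path) -> List[str]:
    """Drop files recorded in scope-baseline.json from the actual surface (sketch)."""
    changed = set(actual)
    if baseline_file.is_file():
        data = json.loads(baseline_file.read_text(encoding="utf-8"))
        changed -= set(data.get("files", []))  # assumed {"files": [...]} shape
    return sorted(changed)
```

One trade-off of a name-based baseline: a baselined file that the current subtask edits again is no longer flagged, so the baseline should be captured as late as possible.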

#4 verification-command REQUIRED suppression:
  workflow-context-injector now recognizes verification invocations
  (pytest, ruff check, ruff format --check, mypy, pyright, go vet/
  build, cargo check, tsc --noEmit) and emits the base reminder
  WITHOUT the trailing " | REQUIRED: Run Actor" pressure tag. Actor
  running pytest on their own work shouldn't be nagged to re-enter
  the phase they're already in.
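A regex sketch of the recognizer for the commands listed above. The pattern and `reminder_for` are hypothetical, and as a heuristic it will also match, say, a commit message that mentions pytest.

```python
import re

# Each command must start the line or follow whitespace / a shell separator.
_VERIFY = re.compile(
    r"(?:^|[\s;&|(])"
    r"(?:pytest|ruff check|ruff format --check|mypy|pyright"
    r"|go vet|go build|cargo check|tsc\b[^;&|]*--noEmit)\b"
)


def reminder_for(command: str, base: str) -> str:
    """Drop the pressure tag when the command is a verification run (sketch)."""
    if _VERIFY.search(command):
        return base
    return base + " | REQUIRED: Run Actor"
```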

#9 WAVE banner only when wave loop is active:
  workflow-context-injector no longer surfaces "WAVE 1/N" while the
  sequential walker (get_next_step) drives — only when
  current_wave_index > 0 (wave loop actually advanced). Removes the
  "[waves computed, sequential walker active]" cognitive-noise tail
  the operator just called out.

+9 new test cases across orchestrator and step_runner suites.
Suite: 1476 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>