Address MAP framework bundle: 8 framework gaps surfaced in a downstream run #142
Open
azalio wants to merge 10 commits into
Eight inter-related framework issues surfaced during a downstream run.
All fixes ship together with regression tests and template sync.

map-check/SKILL.md (#7-bundle bug)
Step 2 indexed `pending_steps["ST-001"]`, but the canonical schema makes pending_steps a flat list[str] of workflow phase ids, so jq crashed with `Cannot index array with string`. Rewrote Step 2 around workflow_status + flat-array iteration.

get_next_step short-circuit (#7)
Added early-return on workflow_status=='WORKFLOW_COMPLETE' so a stale repopulation of pending_steps after a finished run no longer surfaces a phantom RESEARCH step.

build_context_block CLI surfacing (#6)
map_step_runner.py already exposed the CLI subcommand; skill docs still pushed `python -c "import sys; sys.path.insert..."`. Replaced with the canonical CLI invocation + bash recipe.

save_research / load_research API (#12)
New subtask-scoped artifact API in map_step_runner.py (function + CLI subcommands) with strict sanitization. Storage lands at .map/<branch>/research/<subtask_id>__<kind>.md, partitioned by kind (actor / monitor / decomposer). map-efficient RESEARCH phase rewired to use it.

peek_current_step (#2)
Read-only recovery escape hatch for "Step mismatch: expected Y, got X" after validate_step double-advance. Returns the same shape as get_next_step but never saves the state.

mark_subtask_complete (#3)
CLI subcommand on the orchestrator to short-circuit already-done / no-op subtasks without the research→actor→monitor cycle. Records a synthetic subtask_result with status='no-op' for audit, advances the cursor, and closes the workflow atomically when it was the last subtask. Skill prompt updated with the new path.

validate_mutation_boundary (#11)
New CLI in map_step_runner.py compares the actual git diff vs blueprint.subtasks[id].affected_files. Warn-only default (appends to .map/<branch>/scope-violations.log); MAP_STRICT_SCOPE=1 escalates to hard reject. .map/ and .codex/ paths are excluded from the actual surface: they are framework infrastructure, not subtask scope. Monitor agent prompt now runs it during the verification sequence.

Wave-planner over-serialization guidance (#4)
Audit identified the root cause as decomposer-side false dependencies (linear deps collapse the wave planner to single-subtask waves). Added "Minimize Dependencies for Parallelism" section to task-decomposer.md + new checklist items requiring each edge be load-bearing and affected_files always populated.

context-meter (#13)
Already implemented in .claude/hooks/context-meter.py; closed as resolved with documentation in TaskUpdate.

Side fix:
end-of-turn.sh hook used `py_compile`, which writes __pycache__/*.pyc next to source even with -B (emitting bytecode is its entire job). Replaced with `ast.parse` so editing any .py module under src/mapify_cli/templates/ no longer trips the template-hygiene gate. Same change in the Monitor agent's syntax-check recommendation (monitor.md + monitor.toml).

Code hygiene cleanups along the way:
- Removed unused `state: StepState` param from _write_retry_quarantine.
- pyright: ignore on the dynamic DependencyGraph / SubtaskNode imports (importlib spec fallback Pyright cannot follow).
- Three pre-existing tmp_path unused fixture params in test_map_orchestrator.py got the documented `del` suppression.

Tests: +44 new test cases. Full suite 1437 passed / 4 skipped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
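The mutation-boundary comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code in map_step_runner.py: the helper name `classify_scope` is hypothetical, and the mapping of "warning" to the default mode and "violation" to strict mode is an assumption inferred from the description. Only the status values, the excluded `.map/` and `.codex/` prefixes, and the `MAP_STRICT_SCOPE` variable come from the PR text.

```python
import os

# Prefixes the PR describes as framework infrastructure, not subtask scope.
EXCLUDED_PREFIXES = (".map/", ".codex/")


def classify_scope(expected, actual, strict=None):
    """Compare declared affected_files against files actually changed.

    Hypothetical helper: the real validator also resolves the git diff and
    reads the blueprint, which this sketch leaves to the caller.
    """
    if strict is None:
        strict = os.environ.get("MAP_STRICT_SCOPE") == "1"
    declared = set(expected)
    # Drop framework paths before comparing.
    surface = sorted(p for p in set(actual) if not p.startswith(EXCLUDED_PREFIXES))
    unexpected = [p for p in surface if p not in declared]
    if not unexpected:
        status = "clean"
    elif strict:
        status = "violation"  # assumed: strict mode turns a scope leak into a reject
    else:
        status = "warning"    # assumed: default mode only warns
    return {
        "status": status,
        "expected": sorted(declared),
        "actual": surface,
        "unexpected": unexpected,
        "strict": strict,
    }
```

Passing `strict` explicitly keeps the function testable without touching the environment.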
Pull request overview
Bundles several MAP framework hardening fixes surfaced by downstream usage: adds new orchestration/step-runner surfaces, wires research artifacts into /map-efficient, strengthens mutation-boundary verification, and updates hook/agent guidance to avoid template hygiene regressions.
Changes:
- Added orchestrator recovery + workflow helpers (`peek_current_step`, `mark_subtask_complete`) and fixed `get_next_step` completion short-circuit.
- Added step-runner research artifact API (`save_research`/`load_research`) and a git-diff-based mutation boundary validator with warn/strict modes.
- Updated skills/agents/hooks and added regression tests (including replacing `py_compile` with `ast.parse` to prevent `__pycache__` pollution).
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 16 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_skills.py | Adds regression tests ensuring updated decomposer/skills guidance stays present and schema-correct. |
| tests/test_map_step_runner.py | Adds tests for mutation-boundary validation and save/load research CLI behavior. |
| tests/test_map_orchestrator.py | Adds regression tests for new orchestrator helpers and completion short-circuit behavior. |
| tests/hooks/test_end_of_turn.py | Adds regression tests ensuring syntax checks don’t create __pycache__ while still catching syntax errors. |
| src/mapify_cli/templates/skills/map-efficient/SKILL.md | Documents no-op short-circuit, research artifact wiring, and build_context_block CLI usage. |
| src/mapify_cli/templates/skills/map-check/SKILL.md | Fixes jq usage to treat pending_steps as a flat array and rely on workflow_status. |
| src/mapify_cli/templates/map/scripts/map_step_runner.py | Implements save_research/load_research + validate_mutation_boundary and exposes new CLI subcommands. |
| src/mapify_cli/templates/map/scripts/map_orchestrator.py | Adds peek_current_step, mark_subtask_complete, and WORKFLOW_COMPLETE short-circuit in get_next_step. |
| src/mapify_cli/templates/hooks/end-of-turn.sh | Replaces py_compile with ast.parse to avoid writing bytecode into templates. |
| src/mapify_cli/templates/codex/agents/monitor.toml | Updates Python build-gate guidance to use ast.parse instead of py_compile. |
| src/mapify_cli/templates/agents/task-decomposer.md | Adds mandatory guidance to minimize false dependency edges and require populated affected_files. |
| src/mapify_cli/templates/agents/monitor.md | Adds mutation-boundary verification step and updates Python build-gate guidance. |
| .map/scripts/map_step_runner.py | Mirrors template step-runner updates for runtime use. |
| .map/scripts/map_orchestrator.py | Mirrors template orchestrator updates for runtime use. |
| .codex/agents/monitor.toml | Mirrors template Codex monitor guidance update. |
| .claude/skills/map-efficient/SKILL.md | Mirrors template /map-efficient skill updates. |
| .claude/skills/map-check/SKILL.md | Mirrors template /map-check skill updates. |
| .claude/hooks/end-of-turn.sh | Mirrors template hook update using ast.parse to prevent __pycache__. |
| .claude/agents/task-decomposer.md | Mirrors template task-decomposer guidance additions. |
| .claude/agents/monitor.md | Mirrors template monitor guidance additions. |
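The `py_compile` to `ast.parse` swap listed for end-of-turn.sh can be illustrated with a short sketch. The function name `syntax_errors` is hypothetical and the actual hook is a shell script. The example only shows the property the PR relies on: parsing source into an in-memory tree reports syntax errors without writing any bytecode file.

```python
import ast
from pathlib import Path


def syntax_errors(paths):
    """Return (path, message) pairs for files that fail to parse."""
    failures = []
    for path in paths:
        source = Path(path).read_text(encoding="utf-8")
        try:
            # ast.parse builds a syntax tree in memory only; unlike
            # py_compile it does not emit a .pyc under __pycache__.
            ast.parse(source, filename=str(path))
        except SyntaxError as exc:
            failures.append((str(path), f"line {exc.lineno}: {exc.msg}"))
    return failures
```

A hook could call this on the edited files and fail the turn when the returned list is non-empty.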
Comment on lines +5074 to +5078
| diff_result = subprocess.run( | ||
| ["git", "diff", "--name-only", base_ref], | ||
| cwd=project_dir, | ||
| capture_output=True, | ||
| text=True, |
Comment on lines +5761 to +5766
| # Exit code: 0 unless MAP_STRICT_SCOPE=1 AND status=="violation". | ||
| base_ref_arg = sys.argv[4] if len(sys.argv) >= 5 else None | ||
| report = validate_mutation_boundary(sys.argv[2], sys.argv[3], base_ref_arg) | ||
| print(json.dumps(report, indent=2)) | ||
| if report.get("status") == "violation" and report.get("strict"): | ||
| sys.exit(1) |
Comment on lines +5026 to +5030
| Return shape:: | ||
| { | ||
| "status": "clean" | "warning" | "violation", | ||
| "subtask_id": str, | ||
| "base_ref": str, |
| load_research(branch_arg, subtask_arg, kind=kind_arg) | ||
| ) | ||
| except ValueError as exc: | ||
| print(json.dumps({"status": "error", "message": str(exc)})) |
|
|
||
| Some subtasks are already-done historically (rename/refactor landed in a prior PR), or are docs-only and don't need the full research→actor→monitor cycle. Skip them up-front to save tokens: | ||
|
|
||
| ```bash |
Comment on lines +5028 to +5034
| "status": "clean" | "warning" | "violation", | ||
| "subtask_id": str, | ||
| "base_ref": str, | ||
| "expected": [str], # declared affected_files | ||
| "actual": [str], # files actually changed | ||
| "unexpected": [str], # actual but not expected (scope leak) | ||
| "strict": bool, |
Comment on lines +56 to +60
| 5. **Verify mutation boundary (MANDATORY):** Run | ||
| `python3 .map/scripts/map_step_runner.py validate_mutation_boundary <branch> <subtask_id>` | ||
| to compare the actual git diff against the subtask's declared `affected_files`. | ||
| - `status="clean"` → continue. | ||
| - `status="warning"` → record the `unexpected` files in your verdict; do |
| Call `research-agent` for the current subtask, then persist its concise findings via the canonical `save_research` API so Actor and Monitor consume them from the same path. Validate the phase with the orchestrator. | ||
|
|
||
| ```bash | ||
| # After research-agent returns findings in $RESEARCH_FINDINGS: |
Copilot flagged 16 comments (8 unique × dev/template copies). All fixed in
the same PR.
Functional bugs
- validate_mutation_boundary now checks return codes from `git status` and
`git diff`. `git status` non-zero ⇒ hard error (cannot silently report
"clean" outside a git repo). Caller-supplied invalid `base_ref` ⇒ hard
error. Auto-resolved base_ref that doesn't exist (fresh repo, no commits
yet) ⇒ fall through to porcelain-only and report against uncommitted
state, not error.
- CLI for validate_mutation_boundary now exits 1 on status="error" so
Monitor's mandatory gate cannot silently pass via missing blueprint /
unknown subtask / git failure.
- load_research CLI now writes its error JSON to STDERR (stdout stays
empty) so command substitution `FOO=$(... load_research ...)` is not
corrupted by error payloads.
Documentation
- validate_mutation_boundary docstring now lists the "error" return shape
so callers don't assume only clean/warning/violation.
- Monitor agent prompt now spells out the status="error" branch
(`valid: false` with returned message).
Skill snippet bugs
- Both `mark_subtask_complete` and `save_research` snippets in
map-efficient/SKILL.md now define `SUBTASK_ID=$(jq -r
'.current_subtask_id' …)` before use. Previously the snippets relied on
a variable set only in a later phase, producing an empty / wrong value
on the no-op path and on RESEARCH.
Tests / cosmetic
- test_branch_is_sanitized actually passes `feature/x` now and asserts
the result lands under `feature-x/`, not the literal subpath. The prior
version's docstring lied about what it verified.
- "each dependencies edge" → "each dependency edge" typo.
Regression tests added
- test_error_when_not_a_git_repo
- test_cli_exits_non_zero_on_error_status
- TestLoadResearchCliErrorChannel.test_invalid_subtask_id_writes_to_stderr_keeps_stdout_empty
Full suite: 1440 passed (+3) / 4 skipped. ruff + mypy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
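The error-channel fix for the load_research CLI can be sketched as below. This is an illustration under assumptions: `run_load_research` and its `load` callable are hypothetical stand-ins, and the non-zero return code on error is assumed here (the commit message only states an exit code for validate_mutation_boundary). The JSON error payload shape matches the snippet quoted in the review.

```python
import json
import sys


def run_load_research(load, *args, out=None, err=None):
    """Print the payload to stdout and errors to stderr; return an exit code.

    `load` stands in for the real load_research function (hypothetical wiring).
    """
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    try:
        payload = load(*args)
    except ValueError as exc:
        # Keep stdout empty so FOO=$(... load_research ...) never captures
        # an error payload in place of research findings.
        err.write(json.dumps({"status": "error", "message": str(exc)}) + "\n")
        return 1  # assumed: a failing load signals non-zero to the shell
    out.write(payload)
    return 0
```

With this split, a caller using command substitution sees either the findings or an empty string, never a JSON error blob.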
A downstream invocation of /map-efficient against a repo that had a
complete task_plan_<branch>.md ready for resume refused with "needs a
task description in \$TASK_ARGS" — the model skipped Step 0 and
checked $ARGUMENTS for emptiness as a stop condition.
Step 0 has always supported resume, but the contract was implicit. Made
it explicit:
- "MANDATORY: Empty \$TASK_ARGS is NOT a stop condition." Spelled out
the 3-of-3 contract: only exit when args are empty AND no
step_state.json AND no task_plan_<branch>.md.
- Step 0 now checks step_state.json BEFORE plan resume (in-flight work
wins over a stale plan-only resume that would recreate state from
INIT_STATE and lose subtask_results).
- On the empty-everything path, the skill exits with a clear "provide
a task description OR run /map-plan first" message instead of
silently doing nothing.
Regression tests: TestMapEfficientEmptyArgsResumeGuard (3 cases × 2
copies = 6 tests). Suite 1446 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
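The Step 0 contract above amounts to a small decision table, sketched here in Python. The function and the returned labels are hypothetical (the real logic lives in skill prose, not code), and the behaviour when arguments are non-empty is not described in the commit message, so the sketch simply hands that case off.

```python
def step0_action(task_args, has_step_state, has_task_plan):
    """Decide what Step 0 does; labels are illustrative only."""
    if task_args.strip():
        # Not covered by the commit message: assumed to proceed with the task.
        return "run_with_args"
    if has_step_state:
        # In-flight work wins over a plan-only resume, which would
        # recreate state from INIT_STATE and lose subtask_results.
        return "resume_from_state"
    if has_task_plan:
        return "resume_from_plan"
    # Only the 3-of-3 empty case is a stop condition.
    return "exit_with_hint"
```

Usage: `step0_action("", True, True)` resolves to the in-flight state rather than the plan, which is the ordering the fix introduces.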
Same PR as the original eight fixes. These came from a second downstream review of issues #1-#12 that had not landed in the first batch.

#4 (real bug, found via code read at lines 736-742):
validate_step at the inter-subtask boundary (pending_steps emptied but more subtasks remain) was setting current_step_id="COMPLETE" and returning next_step="COMPLETE". get_next_step then re-advanced and handed back RESEARCH for the next subtask, so the workflow recovered, but the validate_step response had already lied, making COMPLETE indistinguishable from a true terminal state. Now emits the explicit "ADVANCE_SUBTASK" sentinel (with matching current_step_id/phase) and reserves COMPLETE for the actual terminal case.

#11 validate_step idempotency:
Re-running validate_step X when X is already in completed_steps is now a no-op success ({valid: True, idempotent: True}) instead of "Step mismatch: expected Y, got X". Combined with the new peek_current_step, callers can safely retry without recovery dances.

#5 RESEARCH enforced (not prompt-text):
validate_step("2.2") now verifies that .map/<branch>/research/<current_subtask>__*.md exists. If not, rejects with valid=false and the exact save_research command to run. "MANDATORY RESEARCH" is now actual behaviour, not just docs.

#3 resume_from_plan auto-set_waves:
When blueprint.json is present, resume_from_plan now invokes set_waves itself and reports the outcome in waves_computed: "success" / "error" / "skipped". /map-efficient skill no longer needs to dispatch set_waves manually after every resume.

#7 get_subtask CLI:
python3 .map/scripts/map_step_runner.py get_subtask <ID> [--branch X]
Hides the {flat, blueprint-wrapped} schema dichotomy so callers stop needing ad-hoc jq with two fallbacks.

#10 pytest-timeout in test deps:
CLAUDE.md examples reference `pytest --timeout=60` but the package was missing; added to requirements-test.txt and pyproject.toml test/dev extras.

#2 wave-API integration (partial; full pivot deferred):
Added documentation guidance in map-efficient/SKILL.md explaining when the sequential walker (get_next_step) vs the wave loop (get_wave_step / validate_wave_step / advance_wave) applies, and noted that resume_from_plan now auto-populates execution_waves. The deeper unification (making get_next_step itself walk by execution_waves rather than subtask_sequence) touches multiple invariant tests and is tracked as a separate follow-up plan.

Test integration fix:
tests/integration/test_e2e_artifact_contracts.py walked subtask phases without writing research artifacts; updated to plant .map/<branch>/research/ST-NNN__actor.md per subtask now that RESEARCH enforcement is real.

+11 new test cases: TestValidateStepIdempotency, TestValidateStepInterSubtaskBoundary, TestValidateStepResearchEnforcement, TestResumeFromPlanAutoSetWaves, TestGetSubtaskCli. Suite: 1456 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
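The idempotency and ADVANCE_SUBTASK behaviour described in this commit can be sketched with a plain dict standing in for step state. This is a simplified model, not the orchestrator's implementation: it ignores persistence, research enforcement, and phase bookkeeping, and it reuses only the field names mentioned in the PR (`pending_steps`, `completed_steps`, `subtask_index`, `subtask_sequence`).

```python
def validate_step(state, step_id):
    """Simplified model of the validate_step contract described above."""
    if step_id in state["completed_steps"]:
        # Re-validating a finished step is a no-op success, not a mismatch.
        return {"valid": True, "idempotent": True}
    expected = state["pending_steps"][0] if state["pending_steps"] else None
    if step_id != expected:
        return {
            "valid": False,
            "message": f"Step mismatch: expected {expected}, got {step_id}",
        }
    state["pending_steps"].pop(0)
    state["completed_steps"].append(step_id)
    if state["pending_steps"]:
        next_step = state["pending_steps"][0]
    elif state["subtask_index"] + 1 < len(state["subtask_sequence"]):
        # More subtasks remain: signal the boundary instead of COMPLETE.
        next_step = "ADVANCE_SUBTASK"
    else:
        next_step = "COMPLETE"  # reserved for the true terminal case
    return {"valid": True, "next_step": next_step}
```

The point of the sentinel is that a caller can tell "this subtask is done, ask for the next step" apart from "the whole workflow is done" from the validate_step response alone.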
…hment)

#1: workflow-context-injector now stamps the [MAP] reminder with the hook's wall-clock UTC time AND the age of step_state.json (now - mtime), e.g.
[MAP] @ 14:23:01.234Z (state +0.5s) 2.3 ACTOR | ...
If the hook is reading stale state (the symptom: "[MAP] still says ACTOR after I validate_step'd to MONITOR"), the "state +Xs" delta makes it obvious: a fresh validate_step would push mtime to "now", so the next hook firing should report a small delta. Future repros can compare deltas across consecutive reminders to confirm whether it's a hook cache or a genuinely stale state file.

#6: build_context_block now emits the subtask's `description` field (the long-form prose what/why from blueprint) and `risk_level`. Validated against the real neuro-vlad blueprint: ST-001's 400-char description flows into the context block instead of forcing Actor to re-open blueprint.json. Length went 21 → 22 lines but the per-line density grew substantially. Description is truncated to 480 chars to stay within the context budget.

+2 new test cases (TestBuildContextBlockIncludesDescription). Suite: 1458.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
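A rough sketch of how such a stamp could be produced is shown below. The function name `stamp_reminder` is hypothetical; the output format is copied from the example in the commit message, and clamping negative ages to zero is an added assumption for clock skew.

```python
from datetime import datetime, timezone


def stamp_reminder(body, state_mtime, now=None):
    """Prefix a reminder with UTC wall-clock time and state-file age.

    `state_mtime` is a POSIX timestamp, e.g. os.path.getmtime(step_state.json).
    """
    now = datetime.now(timezone.utc) if now is None else now
    age = max(0.0, now.timestamp() - state_mtime)  # assumed clamp for clock skew
    clock = now.strftime("%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z"
    return f"[MAP] @ {clock} (state +{age:.1f}s) {body}"
```

A large delta on consecutive reminders would point at a state file that is not being rewritten, while a small delta with stale content would point at the hook itself.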
Eight more fixes surfaced in a fresh /map-efficient run on neuro-vlad new-road after the earlier batches landed.

#1 record_subtask_result CLI:
Skill text already said "record files changed in step_state.json" but no public command existed; callers reached into Python or hoped validate_step did it implicitly. Added `python3 .map/scripts/map_orchestrator.py record_subtask_result <ID> <status> --files a.py,b.py --summary "..." --commit-sha SHA`.

#2 ADVANCE_SUBTASK documented:
The "ADVANCE_SUBTASK" sentinel introduced in the previous batch (#4) had no description in map-efficient/SKILL.md. Added an explicit "Phase: ADVANCE_SUBTASK (synthetic boundary)" section so callers know it's a free transition (call get_next_step again) and not a phase to execute.

#3 Wave banner truthfulness:
workflow-context-injector now reports "[waves computed, sequential walker active]" when execution_waves is populated but current_wave_index is still 0 (sequential walker has not been swapped for the wave loop). Previously the banner claimed "mode batch:parallel" even when nothing parallel was happening.

#5 Monitor verdict contract:
Added a "Verdict consistency contract (MANDATORY)" block to monitor.md: MEDIUM+ severity issues force valid:false, and any `recommendation in {"revise","block","needs_investigation"}` forbids `valid:true`. Closes the loophole where Monitor returned valid:true with recommendation:revise and the skill silently advanced.

#6 build_context_block truncation marker:
Added a compact "# [TRUNCATED] see .map/<branch>/token_budget.json" marker inside the budgeted text when clipping happened, replacing the prior silent loss. Token-budget aware: the marker REPLACES the existing "# Context Budget:..." footer so net token cost is zero (the contract assertion stays <= configured budget).

#7 save_research attempt versioning:
`save_research(..., attempt=N)` (and CLI flag `--attempt N`) now preserves a numbered snapshot at `<id>__<kind>.attempt-<N>.md` BEFORE overwriting the canonical file. Useful for clean-retry diffing after Monitor rejection.

#9 mark_subtask_complete hint:
get_next_step's RESEARCH (2.2) instruction now mentions both the save_research command (positive path) AND the mark_subtask_complete no-op short-circuit (escape hatch). Previously the operator had to recall the latter from efficient-reference.md.

#11 finalize_plan CLI:
`python3 .map/scripts/map_orchestrator.py finalize_plan` bumps artifact_manifest.stages.plan to "complete" when blueprint + task_plan are present. Closes the stage-stuck-partial trap reported on neuro-vlad new-road's manifest.

#12 validate_step("2.4") auto mutation-boundary:
MONITOR gate now runs validate_mutation_boundary internally for the current subtask. Warn-only by default; MAP_STRICT_SCOPE=1 escalates to a hard reject. Best-effort: missing blueprint or git failure is silently skipped so the gate stays usable in unit-test contexts.

Skipped from the 12-issue list:
#4 hook lag: repro now possible with timestamps from previous PR commit; awaiting fresh logs to diagnose root cause.
#8 type-ignore misapplication: agent-quality, not framework.
#10 per-subtask token accounting: needs new transcript-parsing infrastructure; tracked for separate plan.

+7 test classes added. Suite: 1462 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
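The Monitor verdict consistency contract (#5 above) lends itself to a mechanical check, sketched here. The contract itself lives in prompt text, so everything structural in this example is an assumption: the `issues` list, the `severity` key, and the severity names are hypothetical. Only the three blocking recommendation values and the "MEDIUM+ forces valid:false" rule come from the commit message.

```python
BLOCKING_RECOMMENDATIONS = {"revise", "block", "needs_investigation"}
# Assumed severity ladder; the commit message only names "MEDIUM+".
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def verdict_is_consistent(verdict):
    """Return False when a verdict claims valid:true against the contract."""
    if not verdict.get("valid"):
        return True  # the contract only restricts valid:true verdicts
    if verdict.get("recommendation") in BLOCKING_RECOMMENDATIONS:
        return False
    for issue in verdict.get("issues", []):
        severity = str(issue.get("severity", "")).lower()
        if SEVERITY_RANK.get(severity, 0) >= SEVERITY_RANK["medium"]:
            return False
    return True
```

A skill-side guard like this would catch the loophole case the commit mentions, `valid: true` paired with `recommendation: revise`, before the workflow advances.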
Closes the "no cheap way to know how many tokens spent in current
subtask" gap from the latest framework triage.
python3 .map/scripts/map_step_runner.py subtask_token_usage <branch> \
[subtask_id] [--since-ts <ISO>]
Behaviour:
* Resolves Claude Code's per-session log dir via the canonical
~/.claude/projects/<cwd-with-dashes>/ convention; falls back to
cwd-matching across project dirs when the canonical path isn't there.
* Picks the newest *.jsonl by mtime as the active session transcript.
* Anchors the window at step_state.json mtime (the orchestrator
rewrites that file on every advance, so it's a clean per-subtask
transition signal). Override with --since-ts for arbitrary windows.
* Sums message.usage.{input,output,cache_creation,cache_read}_tokens
across assistant turns with timestamp >= anchor, returning a flat
JSON report.
Result on neuro-vlad new-road ST-004 with explicit since-ts
2026-05-23T06:00:00Z: 33 messages counted, 27265 output tokens,
331344 cache-creation, 2129820 cache-read — the kind of signal that
previously required eyeballing transcripts.
+3 test cases (TestSubtaskTokenUsage). Suite: 1465 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
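The summation step in the behaviour list above can be sketched as follows. This is a minimal model under stated assumptions: the record layout (`type`, `timestamp`, `message.usage`) is assumed from the commit message's wording, and the field names are a literal expansion of its `{input,output,cache_creation,cache_read}_tokens` shorthand, so real transcripts may use different keys. Log-directory discovery and the step_state.json mtime anchor are left out.

```python
import json
from datetime import datetime, timezone

# Literal expansion of the shorthand in the commit message (assumed key names).
FIELDS = ("input_tokens", "output_tokens", "cache_creation_tokens", "cache_read_tokens")


def sum_usage(lines, since, fields=FIELDS):
    """Sum usage counters over assistant records with timestamp >= since."""
    totals = {name: 0 for name in fields}
    counted = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("type") != "assistant":
            continue
        stamp = datetime.fromisoformat(record["timestamp"].replace("Z", "+00:00"))
        if stamp.tzinfo is None:
            stamp = stamp.replace(tzinfo=timezone.utc)  # assume UTC when unlabelled
        if stamp < since:
            continue
        usage = record.get("message", {}).get("usage", {})
        for name in fields:
            totals[name] += int(usage.get(name, 0))
        counted += 1
    return {"messages": counted, **totals}
```

Passing `fields` as a parameter keeps the sketch usable if the transcript's counter names differ from the shorthand.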
Convenience over `--since-ts 1970-01-01T00:00:00Z`: pass `--all` to report tokens spent across the entire active session, ignoring the default step_state.json mtime anchor that scopes the report to the current subtask. Useful when the operator wants a running session total rather than "since current subtask boundary".

Real smoke on neuro-vlad new-road (currently at ST-005, 58 messages in the active jsonl): 388 223 total tokens, 4.6M cache_read. This is the exact "how much have I burned this session" signal that was missing.

+1 test case. Suite: 1466 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…test run)

A downstream invocation of /map-efficient finished ST-004, returned a "Pausing to report progress... re-run /map-efficient to resume at ST-005" message, and stopped. The operator had to issue another /map-efficient call to drive ST-005. Doubles round-trips and burns attention; the operator explicitly asked the skill to ship the whole plan, not check in after each subtask.

Step 2b now carries a "MANDATORY: Do NOT pause between subtasks" rule with the four legitimate stop conditions enumerated:
1. next_step="COMPLETE" with subtask_index+1 == len(subtask_sequence)
2. retry-quarantine adjudication required
3. user explicit interrupt
4. circuit breaker tripped
Anything else is "the wrong default" the operator just called out.

+2 regression test cases (TestMapEfficientNoInterSubtaskPause). Suite: 1470 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sixth round of fixes from a downstream /map-efficient run on neuro-vlad new-road. Six framework gaps, one commit, regression-tested.

#7 transactional MONITOR pass:
validate_step("2.4") now implicitly closes pending 2.3 (ACTOR) when the cursor is mid-flight. Caller convenience: Monitor approval logically means Actor work was accepted, so requiring a separate validate_step("2.3") before validate_step("2.4") was just ceremony that produced "Step mismatch: expected 2.3, got 2.4" errors. Skill can now go straight Monitor-pass → record_subtask_result → validate_step("2.4").

#10 build_context_block auto-loads research:
Inlines the latest research artifact (actor → monitor → decomposer kinds, first hit wins, cap 1500 chars) into the context block under "# Research Findings (ST-NNN, kind=actor):". Stops the manual "load_research → glue into Actor prompt" two-step.

#6 detect_already_done CLI:
python3 .map/scripts/map_step_runner.py detect_already_done <branch> <subtask_id> [--since-ref REF]
Heuristic check: every affected_file exists AND has commits in the window? Returns "likely_done" / "partial" / "unclear". Falls back to all-history when --since-ref doesn't resolve (fresh repos). Pragmatic, not authoritative: operators still review evidence before mark_subtask_complete.

#3 scope baseline:
validate_mutation_boundary now subtracts a per-branch baseline (.map/<branch>/scope-baseline.json) from `actual`. Capture it with the new record_scope_baseline CLI when the branch carries pre-existing untracked / unstaged work from prior waves; subsequent mutation-boundary checks then only flag files the current subtask actually changed. Closes the "every ST shows warning because the branch is dirty" friction.

#4 verification-command REQUIRED suppression:
workflow-context-injector now recognizes verification invocations (pytest, ruff check, ruff format --check, mypy, pyright, go vet/build, cargo check, tsc --noEmit) and emits the base reminder WITHOUT the trailing " | REQUIRED: Run Actor" pressure tag. Actor running pytest on their own work shouldn't be nagged to re-enter the phase they're already in.

#9 WAVE banner only when wave loop is active:
workflow-context-injector no longer surfaces "WAVE 1/N" while the sequential walker (get_next_step) drives; it does so only when current_wave_index > 0 (wave loop actually advanced). Removes the "[waves computed, sequential walker active]" cognitive-noise tail the operator just called out.

+9 new test cases across orchestrator and step_runner suites. Suite: 1476 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
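The verification-command suppression (#4 above) could be approximated with a prefix match on the tokenized command, as in this sketch. The function names and the matching strategy are hypothetical and the real hook may match differently. The command list and the " | REQUIRED: Run Actor" tag are taken from the commit message.

```python
import shlex

# Command prefixes listed in the commit message as verification invocations.
VERIFICATION_PREFIXES = [
    ("pytest",),
    ("ruff", "check"),
    ("ruff", "format", "--check"),
    ("mypy",),
    ("pyright",),
    ("go", "vet"),
    ("go", "build"),
    ("cargo", "check"),
    ("tsc", "--noEmit"),
]


def is_verification_command(command):
    """True when the command starts with one of the known verification prefixes."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes: treat as not recognised
    return any(tuple(tokens[: len(p)]) == p for p in VERIFICATION_PREFIXES)


def build_reminder(base, command):
    """Append the pressure tag unless the command is a verification run."""
    tag = "" if is_verification_command(command) else " | REQUIRED: Run Actor"
    return base + tag
```

A plain prefix match like this would miss wrapped invocations such as `uv run pytest`; whether the actual hook handles those is not stated in the PR.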
Summary
- …`peek_current_step`, `mark_subtask_complete`.
- …`save_research` / `load_research` (subtask-scoped artifact API), `validate_mutation_boundary` (warn-only by default, `MAP_STRICT_SCOPE=1` opt-in to hard reject).
- …`map-check` Step 2 (jq schema bug), `map-efficient` (RESEARCH wired through new API, no-op short-circuit, build_context_block CLI), `task-decomposer` (Minimize Dependencies guidance to fix wave over-serialization), `monitor` (mutation-boundary verification step, ast.parse replaces py_compile).
- …`end-of-turn.sh` hook used `py_compile`, which writes `__pycache__/*.pyc` next to source even with `-B`. Replaced with `ast.parse` so the template-hygiene gate stops tripping when any .py under `src/mapify_cli/templates/` is touched.

Per-issue notes
- …`pending_steps`. Also fixed the `map-check/SKILL.md` Step 2 jq pattern that crashed with "Cannot index array with string": the schema is a flat list, not a dict.
- …`python -c "import sys; sys.path.insert..."`.
- …`.map/<branch>/research/<subtask_id>__<kind>.md`, strict id/kind sanitization, CLI for stdin/stdout streaming.
- …`validate_step` double-advance.
- …`status=no-op` for audit, advances cursor, closes workflow atomically).
- …`.map/` + `.codex/` excluded from actual surface.
- …`affected_files` populated.
- …`.claude/hooks/context-meter.py`: closed as resolved.

Test plan
- `uv run pytest -q`: 1437 passed, 4 skipped (was 1398 on main; +44 new test cases this PR).
- `make lint`: ruff + mypy clean.
- `make sync-templates`: all dev/template pairs in sync.
- …`neuro-vlad` branch `new-road`: copy updated `.claude/skills/map-check/SKILL.md` (or `mapify init` against the new release) and re-run `/map-check` to confirm Step 2 no longer crashes.
- …`peek_current_step`, `mark_subtask_complete`, `save_research`, `load_research`, `validate_mutation_boundary`.

Deferred (not in this PR)
- …`map_step_runner.py` (`dict.get()` typed-as-`object` → `.pop()` / `[]` / `int()` errors) and 6 in `test_map_step_runner.py` surfaced during this work. They are unrelated to the eight fixes and would balloon the diff; tracked for a separate cleanup PR.

🤖 Generated with Claude Code