Skip to content

Compact high-traffic MAP skill playbooks#138

Merged
azalio merged 5 commits into
mainfrom
codex/2604-033-2-compact-playbooks
May 20, 2026
Merged

Compact high-traffic MAP skill playbooks#138
azalio merged 5 commits into
mainfrom
codex/2604-033-2-compact-playbooks

Conversation

@azalio
Copy link
Copy Markdown
Owner

@azalio azalio commented May 20, 2026

Summary

  • Compact /map-plan, /map-efficient, /map-check, and /map-review active SKILL.md bodies under 500 lines
  • Move low-frequency examples, troubleshooting, rationale, command matrices, wave details, and section rubrics into bundled supporting files
  • Add regression coverage, docs, improvement-plan bookkeeping, and learned rules for high-traffic skill compactness

User impact

The main MAP golden-path skills now carry less recurring context after invocation and compaction while preserving mandatory execution gates and full supporting references for edge cases.

Validation

  • pytest tests/test_skills.py tests/test_template_sync.py -v
  • make lint
  • pytest -m "not slow"
  • uv run --no-sync mapify init --no-git --mcp none, then inspected installed compact skills and supporting refs
  • PYTHONPATH=src python -m mapify_cli.skill_ir src/mapify_cli/templates/skills src/mapify_cli/templates/codex/skills --format json

Copilot AI review requested due to automatic review settings May 20, 2026 12:29
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR compacts the highest-traffic MAP workflow skill playbooks (/map-plan, /map-efficient, /map-check, /map-review) to reduce retained context after invocation/compaction, while preserving detailed guidance in bundled supporting reference Markdown files. It also adds regression coverage and documentation to keep these skills compact over time.

Changes:

  • Added a regression test enforcing high-traffic SKILL.md bodies stay ≤500 lines and link to bundled supporting references (in both .claude/skills and shipped templates).
  • Refactored multiple skills to move low-frequency examples/troubleshooting/rubrics into new *-reference.md supporting files.
  • Updated maintainer docs/improvement bookkeeping to document the compact-skill lifecycle and validation workflow.

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
tests/test_skills.py Adds regression enforcing compactness + supporting-reference linkage for high-traffic skills.
src/mapify_cli/templates/skills/map-review/SKILL.md Compacts /map-review active playbook and links to review-reference.md.
src/mapify_cli/templates/skills/map-review/review-reference.md Adds /map-review supporting reference (rubrics/examples/troubleshooting).
src/mapify_cli/templates/skills/map-plan/plan-reference.md Adds /map-plan supporting reference (spec template/examples/troubleshooting).
src/mapify_cli/templates/skills/map-efficient/SKILL.md Compacts /map-efficient active playbook and links to efficient-reference.md.
src/mapify_cli/templates/skills/map-efficient/efficient-reference.md Adds /map-efficient supporting reference (waves/TDD/examples/troubleshooting).
src/mapify_cli/templates/skills/map-check/SKILL.md Compacts /map-check playbook and links to check-reference.md.
src/mapify_cli/templates/skills/map-check/check-reference.md Adds /map-check supporting reference (matrices/examples/troubleshooting).
README.md Documents the “compact high-traffic playbooks” philosophy.
docs/USAGE.md Adds maintainer guardrail: keep high-traffic skills under 500 lines + supporting refs.
docs/learned/testing-strategies.md Adds learned note about smoking compact skills in generated projects.
docs/learned/review-checks.md Adds review check for verifying mandatory workflow markers remain in active SKILL.md.
docs/learned/commands.md Documents the new focused compact-skill regression command.
docs/learned/architecture-patterns.md Adds architecture note reinforcing compact high-traffic skills.
docs/improvement-plan.md Updates improvement-plan bookkeeping to mark this slice as shipped and remove duplicated section.
docs/improvement-loop-log.md Logs the improvement-loop decision and validation steps for this compaction slice.
docs/improvement-done.md Records completion summary and artifacts moved into supporting references.
docs/ARCHITECTURE.md Documents “Compact Skill Playbooks” as an architectural principle.
.claude/skills/map-review/SKILL.md Mirrors /map-review compaction in source-of-truth .claude skills.
.claude/skills/map-review/review-reference.md Mirrors /map-review supporting reference in .claude.
.claude/skills/map-plan/SKILL.md Mirrors /map-plan compaction in .claude and links to plan-reference.md.
.claude/skills/map-plan/plan-reference.md Mirrors /map-plan supporting reference in .claude.
.claude/skills/map-efficient/SKILL.md Mirrors /map-efficient compaction in .claude.
.claude/skills/map-efficient/efficient-reference.md Mirrors /map-efficient supporting reference in .claude.
.claude/skills/map-check/SKILL.md Mirrors /map-check compaction in .claude.
.claude/skills/map-check/check-reference.md Mirrors /map-check supporting reference in .claude.
Comments suppressed due to low confidence (2)

src/mapify_cli/templates/skills/map-check/SKILL.md:185

  • The diagnostics snippet references $TEST_CMD, but it is never set in this skill anymore, and the snippet no longer shows re-running the failing command to produce $LOG_FILE before parsing. Define TEST_CMD (or use the actual command string inline), run it with output redirected to $LOG_FILE, then pass the real exit code to diagnostics.py parse/summarize.
```bash
python3 .map/scripts/diagnostics.py summarize \
  --tool tests \
  --command "$TEST_CMD" \
  --exit-code $? \
  --summary "Pytest run for branch verification" \
  --known-issues ".map/${BRANCH}/known-issues.json" \
  --notes "Capture deviations, flaky behavior, or environment quirks here"

.claude/skills/map-check/SKILL.md:185

  • The diagnostics snippet references $TEST_CMD, but it is never set in this skill anymore, and the snippet no longer shows re-running the failing command to produce $LOG_FILE before parsing. Define TEST_CMD (or use the actual command string inline), run it with output redirected to $LOG_FILE, then pass the real exit code to diagnostics.py parse/summarize.
```bash
python3 .map/scripts/diagnostics.py summarize \
  --tool tests \
  --command "$TEST_CMD" \
  --exit-code $? \
  --summary "Pytest run for branch verification" \
  --known-issues ".map/${BRANCH}/known-issues.json" \
  --notes "Capture deviations, flaky behavior, or environment quirks here"

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if [ "$DETACHED_FLAG" = "true" ]; then
# EC-15: prepare detached review once; compare runs reuse the same path.
DETACHED_JSON=$(python3 .map/scripts/map_step_runner.py prepare_detached_review)
DETACHED_PATH=$(echo "$DETACHED_JSON" | python3 -c "import sys,json; print(json.load(sys.stdin).get('detached_path',''))")
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a22ad14: the detached helper snippet now parses status, reason, and worktree_path from prepare_detached_review and passes BUNDLE_JSON_PATH.

Comment on lines +169 to +178
```bash
REVIEW_PROMPTS_JSON=$(python3 .map/scripts/map_step_runner.py build_review_prompts \
--review-preferences "[paste Review Preferences section above]")

MONITOR_PROMPT=$(printf '%s' "$REVIEW_PROMPTS_JSON" | python3 -c 'import json,sys; print(json.load(sys.stdin)["prompts"]["monitor"]["prompt"])')
PREDICTOR_PROMPT=$(printf '%s' "$REVIEW_PROMPTS_JSON" | python3 -c 'import json,sys; print(json.load(sys.stdin)["prompts"]["predictor"]["prompt"])')
EVALUATOR_PROMPT=$(printf '%s' "$REVIEW_PROMPTS_JSON" | python3 -c 'import json,sys; print(json.load(sys.stdin)["prompts"]["evaluator"]["prompt"])')
PROMPTS_JSON=$(python3 .map/scripts/map_step_runner.py build_review_prompts --review-preferences "$ARGUMENTS")
```

Use the extracted `MONITOR_PROMPT`, `PREDICTOR_PROMPT`, and `EVALUATOR_PROMPT` strings as the Task prompts.
The helper preserves the review bundle as primary context, clips lower-priority raw diff
context first when the prompt exceeds `MAP_REVIEW_PROMPT_BUDGET_TOKENS` (default 12,000
estimated tokens), clips oversized review preferences only if needed to preserve the hard
budget, and includes a `Review Prompt Budget` diagnostic document when clipping occurs.

**3 agent Task calls** (pass the budgeted prompt for each role):
Launch all three tasks from the generated prompt files. Keep reviewer task calls below the bundle and prompt-builder commands.

```
Task(
subagent_type="monitor",
description="Review code changes",
prompt="[paste MONITOR_PROMPT]"
)

Task(
subagent_type="predictor",
description="Analyze change impact",
prompt="[paste PREDICTOR_PROMPT]"
)

Task(
subagent_type="evaluator",
description="Score change quality",
prompt="[paste EVALUATOR_PROMPT]"
)
```text
Task(subagent_type="monitor", description="Review diff for correctness", prompt="Read generated monitor prompt from build_review_prompts output.")
Task(subagent_type="predictor", description="Predict integration risk", prompt="Read generated predictor prompt from build_review_prompts output.")
Task(subagent_type="evaluator", description="Score review quality", prompt="Read generated evaluator prompt from build_review_prompts output.")
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b45c28e: build_review_prompts output is now parsed into MONITOR_PROMPT/PREDICTOR_PROMPT/EVALUATOR_PROMPT and those variables are passed to Task calls.


```bash
SECTIONS_JSON=$(python3 .map/scripts/map_step_runner.py shuffle-sections "$MODE_FLAG" "$SEED_RAW")
SECTIONS_JSON=$(python3 .map/scripts/map_step_runner.py shuffle-sections --mode "$MODE_FLAG" --seed "$SEED_RAW")
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b45c28e: shuffle-sections now uses positional arguments again.

Comment thread .claude/skills/map-review/SKILL.md Outdated
if [ "$DETACHED_FLAG" = "true" ]; then
# EC-15: prepare detached review once; compare runs reuse the same path.
DETACHED_JSON=$(python3 .map/scripts/map_step_runner.py prepare_detached_review)
DETACHED_PATH=$(echo "$DETACHED_JSON" | python3 -c "import sys,json; print(json.load(sys.stdin).get('detached_path',''))")
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a22ad14 in the .claude source copy; templates were resynced.

Comment thread .claude/skills/map-review/SKILL.md Outdated
Comment on lines +169 to +178
```bash
REVIEW_PROMPTS_JSON=$(python3 .map/scripts/map_step_runner.py build_review_prompts \
--review-preferences "[paste Review Preferences section above]")

MONITOR_PROMPT=$(printf '%s' "$REVIEW_PROMPTS_JSON" | python3 -c 'import json,sys; print(json.load(sys.stdin)["prompts"]["monitor"]["prompt"])')
PREDICTOR_PROMPT=$(printf '%s' "$REVIEW_PROMPTS_JSON" | python3 -c 'import json,sys; print(json.load(sys.stdin)["prompts"]["predictor"]["prompt"])')
EVALUATOR_PROMPT=$(printf '%s' "$REVIEW_PROMPTS_JSON" | python3 -c 'import json,sys; print(json.load(sys.stdin)["prompts"]["evaluator"]["prompt"])')
PROMPTS_JSON=$(python3 .map/scripts/map_step_runner.py build_review_prompts --review-preferences "$ARGUMENTS")
```

Use the extracted `MONITOR_PROMPT`, `PREDICTOR_PROMPT`, and `EVALUATOR_PROMPT` strings as the Task prompts.
The helper preserves the review bundle as primary context, clips lower-priority raw diff
context first when the prompt exceeds `MAP_REVIEW_PROMPT_BUDGET_TOKENS` (default 12,000
estimated tokens), clips oversized review preferences only if needed to preserve the hard
budget, and includes a `Review Prompt Budget` diagnostic document when clipping occurs.

**3 agent Task calls** (pass the budgeted prompt for each role):
Launch all three tasks from the generated prompt files. Keep reviewer task calls below the bundle and prompt-builder commands.

```
Task(
subagent_type="monitor",
description="Review code changes",
prompt="[paste MONITOR_PROMPT]"
)

Task(
subagent_type="predictor",
description="Analyze change impact",
prompt="[paste PREDICTOR_PROMPT]"
)

Task(
subagent_type="evaluator",
description="Score change quality",
prompt="[paste EVALUATOR_PROMPT]"
)
```text
Task(subagent_type="monitor", description="Review diff for correctness", prompt="Read generated monitor prompt from build_review_prompts output.")
Task(subagent_type="predictor", description="Predict integration risk", prompt="Read generated predictor prompt from build_review_prompts output.")
Task(subagent_type="evaluator", description="Score review quality", prompt="Read generated evaluator prompt from build_review_prompts output.")
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b45c28e in the .claude source copy; templates were resynced.

Comment thread .claude/skills/map-review/SKILL.md Outdated

```bash
SECTIONS_JSON=$(python3 .map/scripts/map_step_runner.py shuffle-sections "$MODE_FLAG" "$SEED_RAW")
SECTIONS_JSON=$(python3 .map/scripts/map_step_runner.py shuffle-sections --mode "$MODE_FLAG" --seed "$SEED_RAW")
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b45c28e in the .claude source copy; templates were resynced.

# /map-check - Quality Gates & Verification

**Purpose:** Run code quality checks (linters, type checkers, tests) and/or verify MAP workflow completion.
Purpose: run verification only. Do not plan, implement, or fix from this skill.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b45c28e: purpose text now says quality gates and MAP workflow verification.

Comment on lines +161 to 167
Optional structured diagnostics for failing gates:
```bash
TEST_CMD="pytest" # Default; override if project uses different test runner
echo "Running final tests..."
eval "$TEST_CMD"

# Optional (structured diagnostics):
# If tests fail and you want a durable artifact for follow-up/debugging,
# re-run capturing output and parse to .map/<branch>/diagnostics.json:
#
# BRANCH=$(git rev-parse --abbrev-ref HEAD | sed -E 's|/|-|g; s|[^a-zA-Z0-9_.-]|-|g; s|-{2,}|-|g; s|^-||; s|-$||')
# LOG_FILE=".map/${BRANCH}/tests.log"
# mkdir -p ".map/${BRANCH}"
# ( $TEST_CMD ) >"$LOG_FILE" 2>&1
# python3 .map/scripts/diagnostics.py parse --tool tests --log "$LOG_FILE" --command "$TEST_CMD" --exit-code $?

if [[ $? -ne 0 ]]; then
echo "❌ Tests failed - verification REJECTED"
VERDICT="REJECTED"
REASON="Test suite has failures"
fi
LOG_FILE=".map/${BRANCH}/tests.log"
mkdir -p ".map/${BRANCH}"
# Re-run the failing command into $LOG_FILE, then parse it:
python3 .map/scripts/diagnostics.py parse --tool tests --log "$LOG_FILE" --command "$TEST_CMD" --exit-code $?
```
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b45c28e: the diagnostics snippet defines TEST_CMD, reruns it into LOG_FILE, captures TEST_EXIT, and passes that exit code to parse/summarize.

Comment thread .claude/skills/map-check/SKILL.md Outdated
# /map-check - Quality Gates & Verification

**Purpose:** Run code quality checks (linters, type checkers, tests) and/or verify MAP workflow completion.
Purpose: run verification only. Do not plan, implement, or fix from this skill.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b45c28e in the .claude source copy; templates were resynced.

Comment on lines +161 to 167
Optional structured diagnostics for failing gates:
```bash
TEST_CMD="pytest" # Default; override if project uses different test runner
echo "Running final tests..."
eval "$TEST_CMD"

# Optional (structured diagnostics):
# If tests fail and you want a durable artifact for follow-up/debugging,
# re-run capturing output and parse to .map/<branch>/diagnostics.json:
#
# BRANCH=$(git rev-parse --abbrev-ref HEAD | sed -E 's|/|-|g; s|[^a-zA-Z0-9_.-]|-|g; s|-{2,}|-|g; s|^-||; s|-$||')
# LOG_FILE=".map/${BRANCH}/tests.log"
# mkdir -p ".map/${BRANCH}"
# ( $TEST_CMD ) >"$LOG_FILE" 2>&1
# python3 .map/scripts/diagnostics.py parse --tool tests --log "$LOG_FILE" --command "$TEST_CMD" --exit-code $?

if [[ $? -ne 0 ]]; then
echo "❌ Tests failed - verification REJECTED"
VERDICT="REJECTED"
REASON="Test suite has failures"
fi
LOG_FILE=".map/${BRANCH}/tests.log"
mkdir -p ".map/${BRANCH}"
# Re-run the failing command into $LOG_FILE, then parse it:
python3 .map/scripts/diagnostics.py parse --tool tests --log "$LOG_FILE" --command "$TEST_CMD" --exit-code $?
```
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b45c28e in the .claude source copy; templates were resynced.

@azalio azalio merged commit c0a6494 into main May 20, 2026
6 checks passed
@azalio azalio deleted the codex/2604-033-2-compact-playbooks branch May 20, 2026 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants