Commit a991d64

docs: add touchSet enforcement tests to manual test plan
Adds Phase 5G with 47 tests covering touchSet violation detection, accept path, relaunch path, retry with re-validation, mutual exclusion, and non-touchset relaunch rejection. Updates job states reference, state transition maps, coverage matrix, results tracking (276 → 323 tests), quick smoke test, and key risks.
1 parent 84e5a7f commit a991d64

1 file changed

Lines changed: 128 additions & 17 deletions

MANUAL_TEST_PLAN.md

```diff
@@ -132,10 +132,11 @@ Run these tests for basic validation after a code change. References use test ID
 | 6 | Phase 5 | 5.7-5.10 (plan with deps → verify waiting_deps → cancel) | Plan basics |
 | 7 | Phase 9 | 9.1 (overview empty) | Dashboard baseline |
 | 8 | Phase 9 | 9.4 (overview with jobs) | Dashboard with data |
-| 9 | Phase 9 | 9.14 (overview after cleanup) | Dashboard cleanup |
-| 10 | Phase 12 | Nuclear Cleanup | Clean exit |
+| 9 | Phase 5G | 5.76 (retry + relaunch mutual exclusion) | TouchSet param validation |
+| 10 | Phase 9 | 9.14 (overview after cleanup) | Dashboard cleanup |
+| 11 | Phase 12 | Nuclear Cleanup | Clean exit |
 
-**Pass criteria**: All 10 steps succeed. If any fail, run the full test plan for that phase.
+**Pass criteria**: All 11 steps succeed. If any fail, run the full test plan for that phase.
 
 ---
 
```
```diff
@@ -159,7 +160,7 @@ All 17 tools must be exercised during this plan. Check off as tested:
 | `mc_plan` | 5, 6 | Create orchestrated plans |
 | `mc_plan_status` | 5, 6 | Plan progress |
 | `mc_plan_cancel` | 5, 6 | Cancel active plan |
-| `mc_plan_approve` | 5 | Approve copilot/supervisor |
+| `mc_plan_approve` | 5, 5G | Approve copilot/supervisor, accept/relaunch/retry touchSet violations |
 | `mc_report` | 8 | Agent status reporting (filesystem verification) |
 | `mc_overview` | 9 | Dashboard summary |
 
```
```diff
@@ -175,7 +176,7 @@ From `plan-types.ts`:
 | `waiting_deps` | Waiting for dependencies to merge | Phase 5, 6 |
 | `running` | Agent is actively working | Phase 1, 3, 6 |
 | `completed` | Agent finished successfully | Phase 1 (if agent completes), 6 |
-| `failed` | Agent crashed or exited non-zero | Phase 11 (observational) |
+| `failed` | Agent crashed, exited non-zero, or touchSet violation | Phase 5G, 11 (observational) |
 | `ready_to_merge` | Completed and queued for merge train | Phase 6 (plan context) |
 | `merging` | Currently being merged into integration | Phase 6 (plan context) |
 | `merged` | Successfully merged into integration | Phase 6 (plan context) |
```
```diff
@@ -480,14 +481,123 @@ This phase tests the git integration tools on a job with real commits.
 | 5.29 | Check for checkpoint pauses | `mc_plan_status` | May show `paused` at checkpoint or `running` |
 | 5.30 | Approve checkpoint (if paused) | `mc_plan_approve` checkpoint=pre_merge | Execution continues |
 
+### 5G: TouchSet Enforcement
+
+This section tests the touchSet violation detection and the three resolution paths:
+**accept**, **relaunch**, and **retry**. These require a plan with `touchSet` configured
+and a job that deliberately modifies files outside its allowed patterns.
+
+> **Key Concept**: TouchSet validation runs after a job completes but before it enters the
+> merge train. If violations are found, the plan pauses at an `on_error` checkpoint with
+> structured `checkpointContext` containing `failureKind`, `jobName`, `touchSetViolations`,
+> and `touchSetPatterns`.
+
+#### 5G-1: TouchSet Violation Detection
+
+| # | Test | Action | Expected |
+|---|------|--------|----------|
+| 5.35 | Cleanup from prior tests | `mc_cleanup` all=true, deleteBranch=true | Clean |
+| 5.36 | Create plan with touchSet | `mc_plan` name=tmc-plan-touch, mode=supervisor, jobs=[{name: "tmc-ts1", prompt: "Create allowed.txt with 'hello' and also create rogue.txt with 'oops'", touchSet: ["allowed.txt"]}] | Plan created, job launches |
+| 5.37 | **Wait 3-5 seconds** |||
+| 5.38 | Verify job running | `mc_plan_status` | tmc-ts1=`running` |
+| 5.39 | Kill job to simulate completion | `mc_kill` name=tmc-ts1 | Stopped |
+| 5.40 | Get worktree path | `mc_status` name=tmc-ts1 | Extract worktree path |
+| 5.41 | Create violating files in worktree | In worktree: `echo 'hello' > allowed.txt && echo 'oops' > rogue.txt && git add . && git commit -m "add files"` | Commit with both files |
+| 5.42 | **Simulate completion**: Set job status to `completed` via state file edit — update `jobs.json` entry for tmc-ts1 to `status: "completed"` | Job appears completed |
+| 5.43 | **Wait 15 seconds** | Orchestrator reconciler detects completion and runs touchSet validation ||
+| 5.44 | Verify plan paused | `mc_plan_status` | Plan `paused`, checkpoint=`on_error` |
+| 5.45 | Verify checkpoint context | Read `plan.json` from state dir | `checkpointContext.failureKind` = `"touchset"`, `checkpointContext.jobName` = `"tmc-ts1"`, `checkpointContext.touchSetViolations` includes `"rogue.txt"`, `checkpointContext.touchSetPatterns` = `["allowed.txt"]` |
+| 5.46 | Verify job marked failed | `mc_jobs` | tmc-ts1 shows as `failed` |
+
+> **Note**: Steps 5.42-5.43 are synthetic — we manually set the job to `completed` to trigger
+> the orchestrator's touchSet validation. In production, the monitor detects agent completion
+> and transitions the job state automatically.
+
+#### 5G-2: Accept Path (Clear Checkpoint)
+
+Continue from 5G-1 state (plan paused with touchSet violation).
+
+| # | Test | Action | Expected |
+|---|------|--------|----------|
+| 5.47 | Accept violations | `mc_plan_approve` checkpoint=on_error | Checkpoint cleared, job moves to `ready_to_merge` |
+| 5.48 | Verify plan resumed | `mc_plan_status` | Plan `running` (or `merging` if merge train started) |
+| 5.49 | Verify job state | `mc_jobs` | tmc-ts1 = `ready_to_merge` or `merging` or `merged` |
+
+> **Cancel immediately after verifying** — do NOT let the plan reach `creating_pr`.
+
+| # | Action | Verify |
+|---|--------|--------|
+| 5.50 | `mc_plan_cancel` | Plan cancelled |
+| 5.51 | `mc_cleanup` all=true, deleteBranch=true | Cleaned |
+
+#### 5G-3: Relaunch Path (Agent Correction)
+
+This tests spawning a new agent in the existing worktree to fix violations.
+
+| # | Test | Action | Expected |
+|---|------|--------|----------|
+| 5.52 | Create plan with touchSet | `mc_plan` name=tmc-plan-relaunch, mode=supervisor, jobs=[{name: "tmc-rl1", prompt: "Create allowed.txt with 'hello'", touchSet: ["allowed.txt"]}] | Plan created |
+| 5.53 | **Wait 3-5 seconds** |||
+| 5.54 | Kill job, create violation, set completed | Same as steps 5.39-5.42 for tmc-rl1 | Job appears completed with rogue.txt |
+| 5.55 | **Wait 15 seconds** | Orchestrator detects and validates ||
+| 5.56 | Verify plan paused | `mc_plan_status` | Paused, checkpoint=`on_error` |
+| 5.57 | Relaunch agent | `mc_plan_approve` checkpoint=on_error, relaunch=tmc-rl1 | New tmux session created in existing worktree, job back to `running` |
+| 5.58 | **Wait 3-5 seconds** |||
+| 5.59 | Verify new tmux session | `tmux list-sessions \| grep mc-tmc-rl1` | Session exists |
+| 5.60 | Verify job running | `mc_jobs` | tmc-rl1 = `running` |
+| 5.61 | Verify correction prompt | `mc_capture` name=tmc-rl1, lines=50 | Agent output visible — correction prompt includes violation details |
+
+> **Cancel immediately** — the relaunched agent may or may not fix the violations.
+
+| # | Action | Verify |
+|---|--------|--------|
+| 5.62 | `mc_plan_cancel` | Plan cancelled |
+| 5.63 | `mc_cleanup` all=true, deleteBranch=true | Cleaned |
+
+#### 5G-4: Retry Path (Manual Fix + Re-validation)
+
+This tests manually fixing the branch and having MC re-validate.
+
+| # | Test | Action | Expected |
+|---|------|--------|----------|
+| 5.64 | Create plan with touchSet | `mc_plan` name=tmc-plan-retry, mode=supervisor, jobs=[{name: "tmc-rt1", prompt: "Create allowed.txt with 'hello'", touchSet: ["allowed.txt"]}] | Plan created |
+| 5.65 | **Wait 3-5 seconds** |||
+| 5.66 | Kill job, create violation, set completed | Same as steps 5.39-5.42 for tmc-rt1 | Job appears completed with rogue.txt |
+| 5.67 | **Wait 15 seconds** | Orchestrator detects and validates ||
+| 5.68 | Verify plan paused | `mc_plan_status` | Paused, checkpoint=`on_error` |
+| 5.69 | Retry WITHOUT fixing (should fail) | `mc_plan_approve` checkpoint=on_error, retry=tmc-rt1 | Error: touchSet still violated (rogue.txt still present) |
+| 5.70 | Verify plan still paused | `mc_plan_status` | Still paused, checkpoint=`on_error` |
+| 5.71 | Fix violation manually | In worktree: `git rm rogue.txt && git commit -m "remove rogue file"` | rogue.txt removed |
+| 5.72 | Retry after fix (should succeed) | `mc_plan_approve` checkpoint=on_error, retry=tmc-rt1 | TouchSet re-validated, job moves to `ready_to_merge` |
+| 5.73 | Verify plan resumed | `mc_plan_status` | Plan running |
+
+> **Cancel immediately**.
+
+| # | Action | Verify |
+|---|--------|--------|
+| 5.74 | `mc_plan_cancel` | Plan cancelled |
+| 5.75 | `mc_cleanup` all=true, deleteBranch=true | Cleaned |
+
+#### 5G-5: Mutual Exclusion (retry vs relaunch)
+
+| # | Test | Action | Expected |
+|---|------|--------|----------|
+| 5.76 | Both retry and relaunch | `mc_plan_approve` checkpoint=on_error, retry=tmc-x, relaunch=tmc-x | Error: cannot specify both retry and relaunch |
+
+#### 5G-6: Relaunch Non-TouchSet Job Rejected
+
+| # | Test | Action | Expected |
+|---|------|--------|----------|
+| 5.77 | Relaunch on non-touchset failure | (If a plan is paused with a non-touchset failure) `mc_plan_approve` checkpoint=on_error, relaunch=jobname | Error: relaunch only available for touchSet violations |
+
 ### Phase 5 Cleanup
 
 | # | Action | Verify |
 |---|--------|--------|
-| 5.31 | `mc_plan_cancel` (if still active) | Plan cancelled |
-| 5.32 | `mc_cleanup` all=true, deleteBranch=true | All plan artifacts cleaned |
-| 5.33 | `mc_jobs` — verify empty | "No jobs found." |
-| 5.34 | `mc_plan_status` — verify no plan | "No active plan" |
+| 5.78 | `mc_plan_cancel` (if still active) | Plan cancelled |
+| 5.79 | `mc_cleanup` all=true, deleteBranch=true | All plan artifacts cleaned |
+| 5.80 | `mc_jobs` — verify empty | "No jobs found." |
+| 5.81 | `mc_plan_status` — verify no plan | "No active plan" |
 
 ---
 
```
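The 5G key concept describes touchSet validation as a post-completion check that compares a job's changed files against its allowed patterns and, on failure, pauses the plan with a structured `checkpointContext`. The TypeScript sketch below illustrates that check under stated assumptions: touchSet entries are treated as globs (`**` spans directories, `*` stays within one path segment, `?` matches one character), and `globToRegExp`, `findViolations`, `validateTouchSet`, and `TouchSetCheckpointContext` are hypothetical names, not MC's implementation. Only the four `checkpointContext` field names and the `"touchset"` value come from the test plan.

```typescript
// Hypothetical shape of the checkpoint context described in Phase 5G.
// The field names come from the test plan; the types are assumptions.
interface TouchSetCheckpointContext {
  failureKind: "touchset";
  jobName: string;
  touchSetViolations: string[];
  touchSetPatterns: string[];
}

// Assumed glob semantics, compiled to an anchored regular expression.
function globToRegExp(pattern: string): RegExp {
  let source = "";
  for (let i = 0; i < pattern.length; i++) {
    const c = pattern[i];
    if (c === "*" && pattern[i + 1] === "*") {
      if (pattern[i + 2] === "/") {
        source += "(?:.*/)?"; // "**/" matches zero or more leading directories
        i += 2;
      } else {
        source += ".*"; // "**" not followed by "/" matches anything, including "/"
        i += 1;
      }
    } else if (c === "*") {
      source += "[^/]*"; // "*" stays inside one path segment
    } else if (c === "?") {
      source += "[^/]"; // "?" matches one non-separator character
    } else {
      // Escape regex metacharacters so "allowed.txt" matches literally.
      source += c.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

// Returns the changed files that no touchSet pattern covers.
function findViolations(changedFiles: string[], patterns: string[]): string[] {
  const matchers = patterns.map(globToRegExp);
  return changedFiles.filter((file) => !matchers.some((m) => m.test(file)));
}

// Returns a checkpoint context when violations exist, or null for a clean job.
function validateTouchSet(
  jobName: string,
  changedFiles: string[],
  patterns: string[],
): TouchSetCheckpointContext | null {
  const violations = findViolations(changedFiles, patterns);
  if (violations.length === 0) return null;
  return {
    failureKind: "touchset",
    jobName,
    touchSetViolations: violations,
    touchSetPatterns: patterns,
  };
}

// Inputs mirroring step 5.41: the job committed allowed.txt and rogue.txt.
const ctx = validateTouchSet("tmc-ts1", ["allowed.txt", "rogue.txt"], ["allowed.txt"]);
console.log(JSON.stringify(ctx));
// {"failureKind":"touchset","jobName":"tmc-ts1","touchSetViolations":["rogue.txt"],"touchSetPatterns":["allowed.txt"]}
```

With the step 5.41 inputs only `rogue.txt` is reported, which lines up with the expectation in step 5.45. The hand-written pattern compiler only keeps the sketch dependency-free; an actual implementation could just as well delegate matching to an existing glob library.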
```diff
@@ -963,15 +1073,15 @@ rm -f "$STATE_DIR/state/jobs.json" 2>/dev/null || true
 | 2 | Error handling & edge cases | 28 | | | | +4 window placement, +11 post-create hooks |
 | 3 | Multiple jobs | 14 | | | | +3 status filter tests |
 | 4 | Git workflow (sync & merge) | 18 | | | | |
-| 5 | Plan orchestration | 34 | | | | |
+| 5 | Plan orchestration | 81 | | | | +47 touchSet enforcement (5G: detect, accept, relaunch, retry, mutual exclusion) |
 | 6 | Realistic multi-job (overlap/conflict) | 49 | | | | |
 | 7 | Model verification | 12 | | | | +3 model ID, prompt file, model match |
 | 8 | mc_report flow | 54 | | | | +17 deterministic injection (replaced 21 non-deterministic) |
 | 9 | mc_overview dashboard | 16 | | | | |
 | 10 | OMO plan mode | 10 | | | | |
 | 11 | Hooks (observational) | 5 | | | | |
 | 12 | Final verification & nuclear cleanup | 8 | | | | |
-| **Total** | | **276** | | | | |
+| **Total** | | **323** | | | | |
 
 ---
 
```
```diff
@@ -987,6 +1097,7 @@ rm -f "$STATE_DIR/state/jobs.json" 2>/dev/null || true
 8. **Report reliability**: Agents have `mc_report` available (plugin loaded via `.opencode` symlink) and are instructed to call it via `MC_REPORT_SUFFIX` prompt injection. Report files should appear reliably, but agent behavior is ultimately non-deterministic — a missing report after 15 seconds warrants investigation but is not necessarily a plugin failure.
 9. **Launcher script timing**: `.mc-launch.sh` is deleted after 5 seconds. Phase 7 must read it immediately after launch. If you miss the window, the test is inconclusive, not failed.
 10. **Worktree initialization race**: Some operations may fail if attempted before the worktree is fully initialized. The 3-5 second wait after every `mc_launch` mitigates this.
+11. **TouchSet testing on feature branches**: When running Phase 5G on a non-main branch, job worktrees inherit the feature branch's uncommitted changes. TouchSet validation compares the job branch against the integration branch, so feature branch source files show up as spurious violations alongside the actual test violations (e.g., `rogue.txt`). This is a testing artifact — in production, both branches share the same base so only the job's own changes appear.
 
 ---
 
```
````diff
@@ -1043,12 +1154,12 @@ This is mitigated by:
 
 ```
 queued ──────> waiting_deps ──> running ──> completed ──> ready_to_merge ──> merging ──> merged
-│ │ │ │
-│ │ ├──> failed ├──> conflict ──> ready_to_merge
-│ │ │ │
-│ │ ├──> stopped └──> (canceled/stopped)
-│ │ │
-│ │ └──> canceled
+│ │ │
+│ │ ├──> failed └──> failed (touchSet) ├──> conflict ──> ready_to_merge
+│ │ │
+│ │ ├──> stopped ├──> ready_to_merge (accept) └──> (canceled/stopped)
+│ │ │ ├──> running (relaunch)
+│ │ └──> canceled └──> ready_to_merge (retry)
 │ │
 │ ├──> stopped
 │ └──> canceled
````
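The new transitions out of a touchSet failure (accept, relaunch, retry), together with the error cases in 5G-5 and 5G-6, amount to a small decision procedure for `mc_plan_approve` at an `on_error` checkpoint. The following is a hypothetical TypeScript sketch reconstructed only from the Expected column of steps 5.47, 5.57, 5.69-5.72, 5.76, and 5.77; `resolveOnError`, `ApproveParams`, `Resolution`, and the `revalidate` callback are illustrative names, not MC's API. The mutual-exclusion check runs first on the assumption that it is pure parameter validation (the smoke test labels 5.76 "TouchSet param validation"). Accept and retry on non-touchSet failures are outside what Phase 5G exercises, so the sketch returns `null` for them rather than guessing.

```typescript
// Hypothetical model of mc_plan_approve at an on_error checkpoint,
// derived from the Expected column of the Phase 5G tests.
type JobStatus = "running" | "ready_to_merge";

interface CheckpointContext {
  failureKind: string; // "touchset" for touchSet violations (step 5.45)
  jobName: string;
  touchSetViolations?: string[];
  touchSetPatterns?: string[];
}

interface ApproveParams {
  checkpoint: "on_error";
  retry?: string;
  relaunch?: string;
}

interface Resolution {
  path: "accept" | "retry" | "relaunch";
  jobName: string;
  next: JobStatus; // job state after the approval
}

// Assumed callback: returns the branch files still outside the touchSet.
type Revalidate = (jobName: string) => string[];

function resolveOnError(
  ctx: CheckpointContext,
  params: ApproveParams,
  revalidate: Revalidate,
): Resolution | null {
  // 5.76: reject the combination before looking at checkpoint state.
  if (params.retry && params.relaunch) {
    throw new Error("cannot specify both retry and relaunch");
  }
  const isTouchSet = ctx.failureKind === "touchset";
  if (params.relaunch) {
    // 5.77: relaunch is only available for touchSet violations.
    if (!isTouchSet) throw new Error("relaunch only available for touchSet violations");
    // 5.57: a new agent session runs in the existing worktree.
    return { path: "relaunch", jobName: params.relaunch, next: "running" };
  }
  // Accept/retry for other failure kinds is not exercised by 5G; left unmodeled.
  if (!isTouchSet) return null;
  if (params.retry) {
    // 5.69-5.72: retry re-runs validation and errors while violations remain.
    const remaining = revalidate(params.retry);
    if (remaining.length > 0) {
      throw new Error(`touchSet still violated: ${remaining.join(", ")}`);
    }
    return { path: "retry", jobName: params.retry, next: "ready_to_merge" };
  }
  // 5.47: a bare approval accepts the violations as they are.
  return { path: "accept", jobName: ctx.jobName, next: "ready_to_merge" };
}

// Usage mirroring 5G-4: tmc-rt1 committed allowed.txt and rogue.txt.
const ctx: CheckpointContext = {
  failureKind: "touchset",
  jobName: "tmc-rt1",
  touchSetViolations: ["rogue.txt"],
  touchSetPatterns: ["allowed.txt"],
};
let branchFiles = ["allowed.txt", "rogue.txt"];
// Stand-in for re-running validation against the touchSet ["allowed.txt"].
const revalidate: Revalidate = () => branchFiles.filter((f) => f !== "allowed.txt");

try {
  resolveOnError(ctx, { checkpoint: "on_error", retry: "tmc-rt1" }, revalidate);
} catch (e) {
  console.log((e as Error).message); // touchSet still violated: rogue.txt
}

branchFiles = ["allowed.txt"]; // 5.71: rogue.txt removed with git rm + commit
console.log(resolveOnError(ctx, { checkpoint: "on_error", retry: "tmc-rt1" }, revalidate)?.next); // ready_to_merge
```

As in steps 5.69-5.72, the first retry is rejected while `rogue.txt` is still on the branch and succeeds once it has been removed, leaving the job in `ready_to_merge`.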
