Iron Loop is CTOC's methodology for quality software delivery. From ideation to working implementation, every feature follows 16 steps.
You do not need to memorize this. CTO Chief guides you through each step automatically. This document is a reference for when you want to understand why CTO Chief asks a particular question, or when you want to customize the process.
AI coding assistants write code fast. Too fast. Without discipline, they produce code that solves the wrong problem, breaks existing features, ships vulnerabilities, and is unmaintainable.
Iron Loop is the discipline. It forces AI to plan before coding, test before implementing, and verify before shipping. The methodology is enforced by hooks, not honor — if an agent tries to write code before planning, the hook blocks it; if it skips verification, the quality gate fails.
Three checkpoints give the human final authority:
- You approve what to build (after functional planning)
- You approve how to build it (after technical planning)
- You approve the result (after implementation and verification)
Nothing ships without your explicit approval.
| Step | Name | One-Liner |
|---|---|---|
| 1 | IDEATE | Explore the idea, shape it, decompose into actionable plans |
| 2 | ASSESS | Understand the problem before proposing solutions |
| 3 | ALIGN | Connect the solution to user goals and business value |
| 4 | CAPTURE | Write requirements as testable BDD scenarios |
| 5 | PLAN | Choose the technical approach with tradeoffs documented |
| 6 | DESIGN | Define the architecture: components, interfaces, data flow |
| 7 | SPEC | Refine the plan through adversarial review until iron-solid (up to 10 rounds) |
| 8 | TEST | Write failing tests first — code does not exist yet |
| 9 | PREPARE | Set up the environment and scan existing code for risks |
| 10 | IMPLEMENT | Write all the code in one step, sub-items for each file |
| 11 | REVIEW | Self-review: does this code do what the plan said? |
| 12 | OPTIMIZE | Simplify, remove redundancy, improve performance |
| 13 | SECURE | Scan for vulnerabilities: OWASP Top 10, input validation, secrets |
| 14 | VERIFY | Automated gate: lint + typecheck + ALL tests + coverage >= 80% |
| 15 | DOCUMENT | Update docs to match the code that was actually written |
| 16 | FINAL-REVIEW | Human reviews the result and decides: ship, fix, or scrap |
Steps 1-7 are collaborative. Agents ask questions, present options with pros and cons, and wait for your decision. They don't generate plans in isolation — they work WITH you. The product-owner agent shapes your idea through conversation; the implementation-planner designs architecture with your input.
Steps 8-16 are automated. Once you approve the plan at Gate 2, agents execute all 9 implementation steps without interruption. You review the final result at Gate 3.
COLLABORATIVE — agents ask, you decide
═══════════════════════════════════════════════════════════════
IDEATION (Step 1) - Vision Phase
-------------------------------------------------------------
User dumps an idea → product-owner + vision agents explore it
|-- "What problem are we solving?" (agent asks)
|-- "Who benefits and how?" (agent asks)
|-- "What are the constraints?" (you answer)
|-- Decompose into actionable plans (together)
|-> Skip if the user already has a clear, specific request
|-> HUMAN GATE: User approves vision before functional planning
|-> Output: one or more plans ready for Phase 1
PHASE 1: FUNCTIONAL PLANNING (Steps 2-4) - Product Owner Role
-------------------------------------------------------------
2. ASSESS Agent asks clarifying questions [product-owner]
3. ALIGN Agent proposes goals, you approve [product-owner]
4. CAPTURE Agent writes BDD specs, you review [functional-reviewer] <--|
|-> Reject? Back to Step 2 ------------------------------------------|
|-> HUMAN GATE: User approves functional plan
PHASE 2: IMPLEMENTATION PLANNING (Steps 5-7) - Technical Role
-------------------------------------------------------------
5. PLAN Agent proposes approach, you choose [implementation-planner]
6. DESIGN Agent designs architecture, you validate [implementation-planner]
7. SPEC Agent writes specs, you review [implementation-plan-reviewer] <--|
|-> Reject? Back to Step 5 ---------------------------------------------------|
|-> Approve -> [iron-loop-plan-integrator] + [iron-loop-plan-critic] refine
|-- 10 rounds max refinement (6-dimension rubric)
|-- All 5/5? -> Iron-solid execution plan
|-- Max rounds? -> Auto-approve + Deferred Questions for Step 16
|-> HUMAN GATE: User approves technical approach
AUTOMATED — agents execute, you review
═══════════════════════════════════════════════════════════════
PHASE 3: IMPLEMENTATION (Steps 8-16) - Execution
-------------------------------------------------------------
8. TEST Write tests FIRST (TDD Red) [test-maker]
9. PREPARE Prepare environment + shift-left [quality-checker]
10. IMPLEMENT ALL code changes (single step) [implementer]
11. REVIEW Self-review checkpoint [self-reviewer] <---|
|-> TDD Loop: Need more tests? -> Back to Step 8 ------------------|
12. OPTIMIZE Performance + code simplification [optimizer]
13. SECURE Security vulnerability check [security-scanner]
14. VERIFY Run ALL quality checks (gate) [verifier]
15. DOCUMENT Update documentation [documenter]
16. FINAL-REVIEW Verify steps 8-15, human gate [implementation-reviewer]
|-> Issues? Smart kickback to affected step
|-> HUMAN GATE: User approves commit/push
Without ideation, Claude Code tends to jump straight to writing code — bypassing hooks and gates by treating requests as "trivial." The ideation phase gives the AI a structured entry point: explore the idea first, shape it with the product-owner agent, then flow naturally into the 16-step loop.
When to use ideation:
- You have a vague idea ("I want better error handling")
- You want to explore before committing to a direction
- You want the product-owner agent to ask the right questions
When to skip ideation:
- You have a precise, specific request ("Add a /health endpoint returning 200 OK")
- You're fixing a known bug with clear reproduction steps
- You say any escape phrase ("quick fix", "trivial change", etc.)
Each phase has entry criteria. Work cannot proceed until these are met.

Entering Functional Planning (Steps 2-4):
- Problem statement exists (even if informal)
- User is available for clarification
- No duplicate plan already in progress

Entering Implementation Planning (Steps 5-7):
- Functional plan approved by user (Gate 1 passed)
- BDD scenarios defined with Given/When/Then
- Definition of Done is testable and measurable

Entering Implementation (Steps 8-16):
- Implementation plan approved by user (Gate 2 passed)
- Integrator+Critic loop completed (all 5/5 or max rounds)
- Execution plan has concrete file paths and actions
- No blocking dependencies on other in-progress plans
- Guideline: plan touches <= 15 files (if more, consider splitting into multiple plans)
These step labels are MANDATORY and must NOT be modified, replaced, or reordered. They define the quality process.
8 TEST -> 9 PREPARE -> 10 IMPLEMENT -> 11 REVIEW -> 12 OPTIMIZE -> 13 SECURE -> 14 VERIFY -> 15 DOCUMENT -> 16 FINAL-REVIEW
| Step | Label | Purpose | NEVER Replace With |
|---|---|---|---|
| 8 | TEST | Write tests FIRST (TDD Red) | "Identify coverage" |
| 9 | PREPARE | Prepare environment, install deps, shift-left scans | "QUALITY", "SETUP" |
| 10 | IMPLEMENT | ALL code changes (single step with sub-items) | Multiple IMPLEMENT steps |
| 11 | REVIEW | Self-review checkpoint (logic only) | IMPLEMENT |
| 12 | OPTIMIZE | Performance and simplification | IMPLEMENT |
| 13 | SECURE | Security vulnerability check | IMPLEMENT |
| 14 | VERIFY | Run ALL quality checks (lint, type, tests, coverage) | Manual verification |
| 15 | DOCUMENT | Update documentation | VERIFY |
| 16 | FINAL-REVIEW | Verify steps 8-15, ready for human gate | VERIFY, COMMIT |
- Step 8 is TDD - Must WRITE tests, not just "identify existing coverage"
- Step 9 is PREPARE (not QUALITY) - Prepare environment AND run shift-left scans (SAST/SCA on existing code)
- Step 10 is ONE step - Multiple files = sub-items under Step 10, NOT separate IMPLEMENT steps
- Step 14 is automated VERIFY - Lint, type check, ALL tests, coverage >= 80%, 0 skipped, 0 flaky
- Step 16 is FINAL-REVIEW (not COMMIT) - Manual verification belongs here, not in Step 14
- Order matters - OPTIMIZE and SECURE may change code, so VERIFY must come AFTER them
Step 9 includes shift-left security scanning (research shows defects caught early cost 10-100x less to fix):
Step 9: PREPARE
- Install/verify dependencies
- Verify build tools are available
- Run SAST on existing code touching the same modules
- Run SCA to check for known vulnerable dependencies
- Establish performance baselines for affected areas
- Report findings (info only, does not block - code doesn't exist yet)
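For illustration, a PREPARE-style check for a Node project might look like the sketch below; the npm commands are standard, but `runPrepareChecks` is a hypothetical name, not CTOC's actual quality-checker code.

```js
// Hypothetical sketch of Step 9 shift-left checks for a Node project.
// Findings are informational at this stage; nothing blocks yet.
const { execSync } = require("node:child_process");

function runPrepareChecks() {
  const findings = [];

  // Verify dependencies install cleanly.
  try {
    execSync("npm install --no-audit --no-fund", { stdio: "pipe" });
  } catch (err) {
    findings.push({ check: "deps", detail: String(err.stderr ?? err) });
  }

  // SCA: flag known-vulnerable dependencies in the existing lockfile.
  try {
    execSync("npm audit --audit-level=high", { stdio: "pipe" });
  } catch (err) {
    findings.push({ check: "sca", detail: String(err.stdout ?? "npm audit reported issues") });
  }

  return { blocking: false, findings }; // report only; code doesn't exist yet
}
```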
Step 14 is the single quality gate. ALL checks must pass before proceeding:
Step 14: VERIFY
- Run lint (eslint, ruff, golangci-lint)
- Run type check (tsc, mypy, go vet)
- Run ALL tests (not just affected - full regression)
- Check coverage >= 80% on new code
- 0 skipped tests
- 0 flaky tests (retry 2x, then block)
- Run SAST on new/changed code
- Run SCA on updated dependencies
If ANY check fails -> SMART KICKBACK:
- Lint errors -> Step 10 (IMPLEMENT)
- Type errors -> Step 10 (IMPLEMENT)
- Tests fail -> Step 10 (IMPLEMENT)
- Security issue -> Step 13 (SECURE)
- Perf regression -> Step 12 (OPTIMIZE)
- Coverage < 80% -> Step 8 (TEST)
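As a rough sketch (not the actual verifier logic), the smart kickback can be thought of as a mapping from failure category to the earliest affected step:

```js
// Hypothetical sketch of smart kickback routing; the real verifier agent
// decides this from the actual check output.
const KICKBACK_TARGETS = {
  lint: 10,        // IMPLEMENT
  types: 10,       // IMPLEMENT
  tests: 10,       // IMPLEMENT
  security: 13,    // SECURE
  performance: 12, // OPTIMIZE
  coverage: 8,     // TEST
};

function kickbackStep(failures) {
  // Route to the earliest affected step so everything after it re-runs.
  const targets = failures.map((f) => KICKBACK_TARGETS[f.category] ?? 10);
  return Math.min(...targets);
}
```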
Every step MUST be safe to re-run. If a kickback sends execution back to a previous step, all subsequent steps re-execute cleanly. This means:
- Step 8 (TEST): Check for existing tests before creating duplicates
- Step 9 (PREPARE): Verify state before installing (don't re-install what exists)
- Step 10 (IMPLEMENT): Check git diff before making changes already applied
- Step 14 (VERIFY): Always runs fresh (no cached results)
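A minimal sketch of an idempotency guard for Step 10, assuming a git working tree; `applyChangeIfNeeded` is a hypothetical helper, not part of CTOC:

```js
// Hypothetical sketch: skip a change that is already applied.
const { execSync } = require("node:child_process");
const fs = require("node:fs");

function applyChangeIfNeeded(filePath, expectedSnippet, applyFn) {
  // If the file already contains the intended change, re-running is a no-op.
  if (fs.existsSync(filePath) && fs.readFileSync(filePath, "utf8").includes(expectedSnippet)) {
    return { applied: false, reason: "already applied" };
  }
  applyFn(filePath);
  // Confirm the working tree actually changed before reporting success.
  const diff = execSync(`git diff -- ${filePath}`, { encoding: "utf8" });
  return { applied: diff.length > 0 };
}
```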
Not every change needs the full 16-step ceremony. Match rigor to risk:
| Change Type | Example | Steps to Run | Steps Skipped |
|---|---|---|---|
| Typo/config | Fix spelling, update timeout | 10, 14 | All others |
| Bug fix (obvious) | Off-by-one in loop | 8, 10, 14, 16 | 1-7, 9, 11-13, 15 |
| Standard feature | Add copy-to-clipboard button | All 16 | None |
| Architecture change | Replace REST with GraphQL | All 16 + extended I+C (15 rounds) | None |
The rule: You can skip PREPARE, REVIEW, OPTIMIZE, SECURE, and DOCUMENT. You can never skip TEST (8) or VERIFY (14) when code behavior changes; only pure typo/config edits (the first row above) go straight to IMPLEMENT and VERIFY.
Escape phrases to enter micro mode: "skip planning", "skip iron loop", "quick fix", "trivial fix", "trivial change", "hotfix", "urgent".
Kickbacks are normal — they mean the quality gate is working. But infinite loops are not.
| Limit | Threshold | Action |
|---|---|---|
| Same-step kickbacks | 3 | Stop. Escalate to user with diagnosis. |
| Total kickbacks per plan | 5 | Stop. Present full kickback history + root cause analysis. |
When the circuit breaker trips, present:
- Which steps keep failing and why
- What was tried each time
- Recommended path forward (fix approach vs. descope vs. manual intervention)
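Conceptually, the circuit breaker is just two counters checked on every kickback; this sketch is illustrative, not the real implementation:

```js
// Hypothetical sketch of the circuit breaker thresholds described above.
const LIMITS = { perStep: 3, perPlan: 5 };

function recordKickback(state, step) {
  state.perStep[step] = (state.perStep[step] ?? 0) + 1;
  state.total = (state.total ?? 0) + 1;

  if (state.perStep[step] >= LIMITS.perStep || state.total >= LIMITS.perPlan) {
    // Stop and escalate: which steps keep failing, what was tried, options forward.
    return { tripped: true, history: state };
  }
  return { tripped: false };
}
```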
Future enhancements (not yet implemented):
- Step Timing — Record duration per step to identify bottlenecks and optimize the loop
- Failure Budgets — Track quality failures across plans monthly; alert when threshold exceeded
- Retrospective Feedback Loop — Auto-generate retrospective every 5 completed plans (what worked, what didn't, improvement actions)
Step labels are validated programmatically by src/lib/plan-validator.js and enforced by src/hooks/validate-plan-steps.js. Plans with wrong labels are REJECTED before execution.
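Conceptually, the validation works roughly like the sketch below; the real logic lives in src/lib/plan-validator.js and this is only an illustration:

```js
// Hypothetical sketch of step-label validation; see src/lib/plan-validator.js
// for the real implementation.
const REQUIRED_LABELS = [
  "TEST", "PREPARE", "IMPLEMENT", "REVIEW", "OPTIMIZE",
  "SECURE", "VERIFY", "DOCUMENT", "FINAL-REVIEW",
];

function validateStepLabels(planSteps) {
  // planSteps: labels extracted from the plan's Step 8-16 headings, in order.
  const errors = [];
  REQUIRED_LABELS.forEach((label, i) => {
    if (planSteps[i] !== label) {
      errors.push(`Step ${i + 8} must be labeled ${label}, got "${planSteps[i] ?? "missing"}"`);
    }
  });
  return { valid: errors.length === 0, errors };
}
```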
Phase 1 acts as Product Owner for the project. These steps are collaborative — agents ask questions, present options with pros and cons, and wait for the user's decision. They never generate a plan in isolation.
- Asks clarifying questions to understand what the user needs (ASSESS)
- Proposes alignment with business goals, user validates (ALIGN)
- Writes requirements as implementable specs, user reviews (CAPTURE)
All features are captured as:
- User Stories - "As a [user], I can [action] so that [benefit]"
- Behavior Scenarios - Given/When/Then (Gherkin format)
- Definition of Done - Automated test conditions
- Acceptance Criteria - Measurable success conditions
Feature: User Login
User Story: As a registered user, I can log in so that I access my account
Scenario: Successful login
Given I am on the login page
And I have a valid account
When I enter my email and password
And I click "Log In"
Then I should see my dashboard
And I should see a welcome message
Scenario: Invalid password
Given I am on the login page
When I enter wrong password
Then I should see "Invalid credentials"
And I should remain on login page
Definition of Done:
- All scenarios pass as automated tests
- Login attempt rate limiting is implemented
- Session management follows OWASP guidelines
- Error messages don't leak user existence

Even trivial requests get a mini-plan with tests. The user can override with any escape phrase: "skip planning", "skip iron loop", "quick fix", "trivial fix", "trivial change", "hotfix", "urgent".
The Iron Loop is enforced by hooks that run before every Edit/Write operation.
On Edit/Write tool call:
|-- Load Iron Loop state
|-- Check enforcement mode (strict/soft/off)
|-- Check if file is whitelisted (*.md, *.yaml, .ctoc/**)
| |-- If whitelisted -> ALLOW
|-- Check for escape phrase in user message
| |-- If found -> ALLOW
|-- Check currentStep
| |-- If step >= 8 -> ALLOW
| |-- If step < 8 -> BLOCK (exit 1)
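A simplified sketch of that decision flow, with illustrative names (the real hook reads actual Iron Loop state and exits non-zero to block):

```js
// Hypothetical sketch of the enforcement decision; the actual hook lives
// under src/hooks/ and operates on real Iron Loop state.
const WHITELIST = [/\.md$/, /\.ya?ml$/, /^\.ctoc\//];

function shouldAllowEdit(state, filePath, userMessage) {
  if (state.enforcementMode === "off") return true;
  if (WHITELIST.some((re) => re.test(filePath))) return true;

  const escapes = ["skip planning", "skip iron loop", "quick fix", "trivial", "hotfix", "urgent"];
  if (escapes.some((p) => userMessage.toLowerCase().includes(p))) return true;

  // Implementation steps (8-16) may edit; planning steps (1-7) may not.
  if (state.currentStep >= 8) return true;
  return state.enforcementMode === "soft"; // soft mode would warn but allow (planned)
}

// In strict mode a disallowed Edit/Write exits with code 1, which blocks the tool call.
```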
Currently only `strict` mode is implemented; `soft` and `off` are planned.

| Mode | Behavior | Status |
|---|---|---|
| `strict` | Block Edit/Write if planning incomplete (default) | Implemented |
| `soft` | Warn but allow Edit/Write | Planned |
| `off` | No enforcement | Planned |
These files bypass enforcement (hooks allow them regardless of step):
- `.ctoc/**` - CTOC configuration
- `.local/**` - Local state
- `plans/*.md` - Plan files
- `.gitignore`, `.gitattributes` - Git configuration
User can bypass enforcement by including these phrases in their message. Claude interprets them and adjusts behavior accordingly:
- "skip planning" / "skip iron loop"
- "quick fix" / "trivial fix" / "trivial change"
- "hotfix" / "urgent"
Note: Escape phrases are interpreted by Claude via CLAUDE.md instructions, not enforced programmatically by hooks.
When an implementation session (Steps 8-16) is interrupted, CTOC automatically detects and offers recovery. Detection is implemented in src/hooks/SessionStart.js with state managed by src/lib/state-manager.js.
A session is considered interrupted if:
- `sessionStatus` is "active" (not cleanly ended)
- `currentStep` is between 8 and 16 (implementation phase)
- `lastActivity` is within the last 24 hours
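Assuming a state object with those three fields, the check reduces to something like this sketch (the real detection lives in src/hooks/SessionStart.js):

```js
// Hypothetical sketch of the interruption check; state comes from
// src/lib/state-manager.js in the real implementation.
const DAY_MS = 24 * 60 * 60 * 1000;

function isInterruptedSession(state, now = Date.now()) {
  return (
    state.sessionStatus === "active" &&
    state.currentStep >= 8 && state.currentStep <= 16 &&
    now - new Date(state.lastActivity).getTime() <= DAY_MS
  );
}
```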
| Option | What It Does |
|---|---|
| Resume | Continue from the last completed step |
| Restart | Start implementation fresh from Step 8 (tests preserved) |
| Discard | Abandon this implementation entirely |
- Session Start: sets `sessionStatus: "active"`, updates `lastActivity`
- Every Step Completion: updates `lastActivity` and `lastCompletedStep`
- Clean Exit: sets `sessionStatus: "ended"`
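Put together, the tracked state might look roughly like this; field names follow the lifecycle above, but the exact schema belongs to src/lib/state-manager.js:

```js
// Illustrative shape only; the real schema is owned by src/lib/state-manager.js.
const exampleState = {
  sessionStatus: "active",        // "active" while running, "ended" on clean exit
  currentStep: 12,                // OPTIMIZE in progress
  lastCompletedStep: 11,          // REVIEW finished
  lastActivity: "2026-02-21T09:30:00Z",
};
```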
When an implementation plan is approved at Step 7, the Integrator and Critic agents refine it into an iron-solid execution plan.
Input: Approved Implementation Plan
Round N:
[Integrator] -> Creates/refines execution plan (Steps 8-16)
[Critic] -> Scores 6 dimensions (all must be 5/5)
If any < 5:
Critic provides: reason + suggested fix
Integrator refines, loop continues...
Termination:
- All 5/5: Iron-solid plan ready
- Max rounds (10): Auto-approve + Deferred Questions
Output: Execution plan appended to the plan file in plans/implementation/
Each dimension scored 1-5. All must reach 5/5 for auto-approval.
| Dimension | Score 5/5 Means | Common Failure Modes |
|---|---|---|
| Completeness | All steps have actions, all modules covered, 80% coverage baseline | Missing edge case handling, incomplete rollback plan |
| Clarity | Unambiguous instructions, single responsibility, self-documenting | Vague "update as needed", multi-purpose steps |
| Edge Cases | Error handling, fallback behavior, rollback plan, timeout handling | Happy path only, no error recovery |
| Efficiency | Minimal steps, no redundancy, parallelizable, token budget reasonable | Redundant checks, over-engineered steps |
| Security | OWASP Top 10, input validation, no secrets, protected endpoints | Missing auth checks, unvalidated input |
| Observability | Logging at key points, metrics for monitoring, error tracing, health checks | Silent failures, no monitoring hooks |
Round 3 of 10:
Completeness: 5/5
Clarity: 5/5
Edge Cases: 3/5 — Step 10 does not specify behavior when database
is unreachable. Add: "If DB down, /health
returns 503 with { status: 'degraded' }."
Efficiency: 5/5
Security: 5/5
Observability: 4/5 — No logging on health check failure.
Add: "Log warning on 503 response."
Result: NOT APPROVED — 2 dimensions below 5/5.
Integrator will refine and re-submit.
When max rounds (10) is reached and some dimensions still score < 5, unresolved issues become Deferred Questions presented at Step 16 (FINAL-REVIEW) with context, options, and pros/cons.
| Setting | Default | Description |
|---|---|---|
| `integration.max_rounds` | 10 | Maximum refinement rounds |
| `integration.quality_threshold` | 5 | All dimensions must meet this |
| `integration.auto_approve_after_max` | true | Auto-approve after max rounds |
| `integration.defer_unresolved` | true | Store unresolved issues as Deferred Questions |
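Taken together, these settings drive a loop that behaves roughly like the following sketch (illustrative only; the real code is in src/lib/iron-loop.js):

```js
// Hypothetical sketch of the Integrator+Critic refinement loop using the settings above.
function refineExecutionPlan(plan, integrator, critic, settings) {
  const deferred = [];
  for (let round = 1; round <= settings.max_rounds; round++) {
    const draft = integrator.refine(plan);
    const scores = critic.score(draft); // e.g. { completeness: 5, clarity: 4, ... }
    const weak = Object.entries(scores).filter(([, s]) => s < settings.quality_threshold);

    if (weak.length === 0) return { plan: draft, approved: true };
    plan = draft; // carry the critic's feedback into the next round

    if (round === settings.max_rounds && settings.defer_unresolved) {
      weak.forEach(([dimension]) => deferred.push(dimension));
    }
  }
  // Max rounds reached: auto-approve with Deferred Questions for Step 16.
  return { plan, approved: settings.auto_approve_after_max, deferredQuestions: deferred };
}
```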
Implemented in src/lib/iron-loop.js. Triggered automatically when an implementation plan is approved via approvePlan() in src/lib/actions.js. Plan status is tracked in YAML frontmatter, not subdirectories.
| Gate | Transition | User Decision |
|---|---|---|
| Gate 0 | Vision -> Functional | "Approve idea to explore?" |
| Gate 1 | Functional -> Implementation | "Approve functional plan?" |
| Gate 2 | Implementation -> Todo | "Approve technical approach?" |
| Gate 3 | Final Review -> Done | "Commit/push or send back?" |
+----------+ +----------+ +----------+ +----------+ +-----------+ +----------+ +----------+
| vision | |functional| |implement.| | todo | |in-progress| | review | | done |
+----------+ +----------+ +----------+ +----------+ +-----------+ +----------+ +----------+
| Ideas | | Steps 2-4| | Steps 5-7| | Backlog | | Steps 8-15| | Step 16 | |Completed |
| | | BDD specs| | Technical| | Ready to | | Active | | Human | | |
| | | | | approach | | start | | work | | gate | | |
+----------+ +----------+ +----------+ +----------+ +-----------+ +----------+ +----------+
| | |
[HUMAN] [HUMAN] [HUMAN]
Column order follows the plan lifecycle left-to-right. in-progress is a logical state tracked in plan YAML frontmatter; plans physically remain in todo/ until moved to review/.
| # | Dimension | Key Checks | Evaluated At |
|---|---|---|---|
| 1 | Correctness | Tests meaningful, edge cases, business logic | Step 11 (REVIEW), Step 14 (VERIFY) |
| 2 | Completeness | All criteria met, implicit requirements | Step 11 (REVIEW), Step 16 (FINAL-REVIEW) |
| 3 | Maintainability | Patterns, no smells, readable by junior | Step 11 (REVIEW), Step 12 (OPTIMIZE) |
| 4 | Security | OWASP, validation, auth/authz | Step 9 (PREPARE), Step 13 (SECURE) |
| 5 | Performance | No N+1, caching, response time | Step 12 (OPTIMIZE), Step 14 (VERIFY) |
| 6 | Reliability | Error handling, retries, fault tolerance | Step 11 (REVIEW), Step 16 (FINAL-REVIEW) |
| 7 | Compatibility | API backwards compat, integrations | Step 11 (REVIEW) |
| 8 | Usability | Error messages, clear output, docs | Step 15 (DOCUMENT), Step 16 (FINAL-REVIEW) |
| 9 | Portability | No hardcoded paths, cross-platform | Step 11 (REVIEW), Step 14 (VERIFY) |
| 10 | Testing | 80%+ coverage on new code, isolation, happy+error paths | Step 8 (TEST), Step 14 (VERIFY) |
| 11 | Accessibility | WCAG 2.2, screen reader, keyboard | Step 11 (REVIEW), Step 16 (FINAL-REVIEW) |
| 12 | Observability | Logging, metrics, tracing, alerts | Step 11 (REVIEW), Step 16 (FINAL-REVIEW) |
| 13 | Safety | No harm, graceful degradation | Step 13 (SECURE), Step 16 (FINAL-REVIEW) |
| 14 | Ethics/AI | Bias, fairness, explainability | Step 16 (FINAL-REVIEW) |
Model assignments indicate recommended complexity tier. Actual model depends on user configuration.
| Agent | Model | Steps | Role |
|---|---|---|---|
| cto-chief | opus | 1-16 | Coordinator |
| product-owner | sonnet | 2-4 | BDD Specs (Product Owner) |
| functional-reviewer | opus | 4 | Review Gate |
| implementation-planner | opus | 5-7 | Technical Planning |
| implementation-plan-reviewer | opus | 7 | Review Gate |
| iron-loop-plan-integrator | opus | 7 | Creates execution plans |
| iron-loop-plan-critic | opus | 7 | Reviews execution plans (6-dim rubric) |
| test-maker | opus | 8 | TDD Red (write tests FIRST) |
| quality-checker | sonnet | 9 | Prepare environment + shift-left |
| implementer | sonnet | 10 | ALL code changes |
| self-reviewer | opus | 11 | Self-review checkpoint |
| optimizer | sonnet | 12 | Performance + Simplification |
| security-scanner | opus | 13 | Security vulnerability check |
| verifier | sonnet | 14 | Quality gate (lint, type, tests, SAST) |
| documenter | sonnet | 15 | Documentation |
| implementation-reviewer | opus | 16 | Final review + human gate |
Tests must NEVER silently fail. This is the #1 quality rule.
| Pattern | Status | Why |
|---|---|---|
| Empty catch blocks | BLOCK | Hides failures |
| Early return without assertion | BLOCK | Test passes without testing |
| Tests without assertions | BLOCK | Always passes |
| Fixture errors swallowed | BLOCK | Setup failures hidden |
| Skip without reason | BLOCK | Unclear why skipped |
If a test cannot run, it must FAIL LOUDLY. Period.
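For example, in a JavaScript test suite the difference between a blocked pattern and the required behavior looks like this (illustrative; `createUser` is a hypothetical function under test):

```js
// BLOCKED: an empty catch hides the failure and the test "passes".
test("creates a user (anti-pattern)", async () => {
  try {
    await createUser({ email: "a@example.com" });
  } catch {
    // swallowed -> silent failure
  }
});

// REQUIRED: if setup or the call fails, the test fails loudly.
test("creates a user", async () => {
  const user = await createUser({ email: "a@example.com" }); // a rejection fails the test
  expect(user.id).toBeDefined();
});
```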
If the project has a Dockerfile or docker-compose.yml: build the image, run its health check, and run the tests against the container before deploying.
Local tests must use the same commands, flags, and environment as CI. If it passes locally but fails in CI, local setup is wrong.
Flaky tests erode trust in the entire quality system. A flaky test is retried 2x. If it still fails intermittently, it is marked as a blocking issue that must be fixed before any new features proceed.
plans/
|-- vision/ # Ideas and explorations (pre-planning)
|-- functional/ # Steps 2-4 plans (BDD specs)
|-- implementation/ # Steps 5-7 plans (technical approach)
|-- todo/ # Backlog (ready for execution)
|-- review/ # Awaiting final human review (Step 16)
|-- done/ # Completed
Plan state (draft vs. approved, in-progress tracking) is managed via YAML frontmatter inside each plan file, not via subdirectories. The execution/ output from the Integrator+Critic loop is embedded in the plan file itself.
These are guidelines, not hard limits. If a step takes significantly longer, the plan may need splitting.
| Step | Expected Duration | If Longer, Consider |
|---|---|---|
| 1 (IDEATE) | 5-15 minutes | Idea may need more exploration |
| 2-4 (Functional) | 5-15 minutes | Problem may be poorly defined |
| 5-6 (Technical) | 10-30 minutes | Plan may need splitting |
| 7 (Integrator+Critic) | 5-20 minutes | Max 10 rounds, auto-approve if stuck |
| 8 (TEST) | 5-15 minutes | Too many test cases; focus on critical paths |
| 9 (PREPARE) | 2-5 minutes | Environment issues; fix before proceeding |
| 10 (IMPLEMENT) | 10-60 minutes | Plan touches too many files (>15 = split) |
| 11-13 (Review cycle) | 5-10 minutes each | Findings may require kickback |
| 14 (VERIFY) | 2-5 minutes | Failures trigger smart kickback |
| 15-16 (Finalize) | 5-10 minutes | Should be fast if earlier steps were thorough |
Last updated: 2026-02-21