Observed across a 39-round RLCR loop where rounds 6-39 (34 of 39, ~87% of the total) were driven by reviewer findings whose SHAPE repeated. The first ~5 rounds reached "all acceptance criteria green"; the next 34 rounds hardened a feature whose externally observable contract was unchanged. Several individual late rounds were genuinely high-value, but the methodology kept rediscovering isomorphic variants of the same conceptual bug at the next sibling site, one round at a time.
This issue is sourced from a sanitized methodology analysis (no project-specific information: no file paths, function names, identifiers, business-domain terms, code, branch names, commit hashes, or product references).
Headline patterns observed
- Sibling-site domino chains. A defensive pattern (e.g. neutralise an operator pin under condition X, or catch decode/parse errors at a read site) was added in one round, then re-added in 5-8 follow-up rounds as the reviewer found each adjacent site that needed the same pattern. The first fix never triggered a same-round sweep across structurally similar sites.
- Combinatorial decision-table gaps. One core piece of logic had a 3 x 2 x 2 x 2 = 24-cell decision table (loader output state x pin-supplied x feature-required x failure-shape-subclass). Defensive code grew in roughly that many cells, one cell per round across ~8 rounds, in the order the reviewer happened to think of them. The table was never elicited up-front.
- Plan-amendment aftershocks. A mid-loop amendment reversed a literal reading of an ambiguous AC and changed a data structure's field set. Several subsequent rounds were aftershocks finding consumers that still encoded the pre-amendment contract. No round was dedicated to a contract-aware sweep after the amendment.
- Recurring incidental rework. Line-range-based architecture tests required re-pointing twice in the same loop after upstream insertions shifted blocks downstream. The rework was "queued for future plan" but the tax kept being paid mid-loop, costing a full round of overhead each time.
- No convergence criterion beyond "reviewer found nothing". A reviewer that scans hard enough will always find one more isinstance check to add or one more decode error to wrap. There is no upper bound, and the loop kept iterating 34 rounds past AC-green.
- Acceptance criteria missed the adversarial surface. The original AC set covered happy + planned failure modes but not the adversarial surface: forged hashes that pass internal consistency, identity-renamed manifests, non-ASCII / non-object input bytes, JSON numbers where ISO strings were expected, etc. Each became its own round.
Six suggested methodology changes
1. Same-round sibling-site sweep
The first time a reviewer finding is closed by a fix that follows a clear pattern (defensive guard, normalisation, isinstance check), the implementing agent must perform an exhaustive sibling-site sweep in the same round. The round template adds a "pattern propagation" section containing: (1) a one-sentence description of the defensive pattern, (2) a grep/AST query identifying every structurally similar site, and (3) for each hit, either an inline fix or an explicit, justified exemption.
Complementary review-side guard: when a reviewer files a finding whose root cause matches a closed finding from any prior round in the same loop, flag it as a "sibling-site extension" and require the implementer to answer "what would have to change so that the next sibling-site reviewer finding cannot exist?" before resuming.
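As an illustration of step (2), a sibling-site query can be a few lines of AST walking. The sketch below is a minimal example, not the loop's actual tooling: the SOURCE snippet, the function name unguarded_calls, and the choice of json.loads as the read site are all assumptions made for illustration. It reports every module.func(...) call that does not sit inside the body of a try statement.

```python
import ast

# Hypothetical code under sweep: one guarded read site, one unguarded sibling.
SOURCE = '''
import json

def read_a(raw):
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None

def read_b(raw):
    return json.loads(raw)
'''

def unguarded_calls(source, module, func):
    """Return line numbers of module.func(...) calls that are not inside a try body."""
    tree = ast.parse(source)
    # Collect every node that lives inside the body of some try statement.
    guarded = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Try):
            for stmt in node.body:
                for inner in ast.walk(stmt):
                    guarded.add(id(inner))
    hits = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == func
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == module
                and id(node) not in guarded):
            hits.append(node.lineno)
    return sorted(hits)

print(unguarded_calls(SOURCE, "json", "loads"))
```

Each reported line then gets either the fix or a written exemption in the same round. The query is deliberately coarse: it treats any enclosing try as a guard without checking which exceptions the handlers catch, so a real sweep would likely need a stricter predicate.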
2. Decision-table elicitation as a planning artifact (HIGHEST LEVERAGE)
Augment the planning phase with a mandatory decision-table elicitation step for any AC involving multi-state interaction (two+ independent flags, two+ failure modes, two+ entry points). The plan enumerates every combination before implementation begins; the implementation round produces a test row per cell (or an explicit "this cell is unreachable because X" annotation).
When a reviewer in a later round finds a missing cell, treat that as a plan defect, not just an implementation defect: the table that should have caught the gap goes back into the plan for amendment, and the implementing agent re-walks the (possibly expanded) table before claiming the round done. This is the single highest-leverage change because it would have prevented the largest cluster of late rounds.
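A decision table of this kind is cheap to enumerate mechanically. The following sketch assumes four axes shaped like the 3 x 2 x 2 x 2 table described earlier; the concrete axis values, and the helper names all_cells and missing_cells, are hypothetical placeholders, since the analysis names the axes but not their values.

```python
from itertools import product

# Hypothetical axis values; only the axis names and sizes come from the analysis.
AXES = {
    "loader_state": ["ok", "empty", "failed"],
    "pin_supplied": [True, False],
    "feature_required": [True, False],
    "failure_subclass": ["recoverable", "fatal"],
}

def all_cells(axes):
    """Enumerate every combination of axis values as a dict per cell."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]

def missing_cells(axes, covered, unreachable):
    """Cells that have neither a test row nor an 'unreachable because X' annotation."""
    def key(cell):
        return tuple(cell[name] for name in axes)
    accounted = {key(c) for c in covered} | {key(c) for c in unreachable}
    return [c for c in all_cells(axes) if key(c) not in accounted]

cells = all_cells(AXES)
covered = cells[:20]        # pretend these cells have test rows
unreachable = cells[20:22]  # pretend these cells carry an unreachability note
gaps = missing_cells(AXES, covered, unreachable)
print(len(cells), len(gaps))
```

A round would not be claimed done while missing_cells returns anything; when a reviewer finds a new axis or value, it is added to AXES and the table is re-walked.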
3. Contract-aftershock audit round after plan amendments
When a round closes a P1 via plan amendment changing a data structure's field set, a validator's contract, or a function's signature, the next round is designated a "contract-aftershock audit round": the implementing agent walks every consumer of the changed contract and explicitly notes "unaffected" or "needs adjustment X". The reviewer for that round is given the contract change as context and asked specifically "is there any consumer that still encodes the pre-amendment contract?"
4. Auto-escalate recurring test-infra incidents
Introduce a "blocker" classification for test-infrastructure incidents that occur twice or more in the same loop. After the second occurrence, the next round must either (a) convert the brittle test to a content-anchored / AST-anchored shape, or (b) explicitly accept the recurring tax with a sign-off. "Queued for future plan" items that turn into actual mid-loop rework auto-escalate to in-scope after the second occurrence — the implementing agent cannot defer the same incidental rework round after round.
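To make option (a) concrete, the sketch below contrasts a line-range anchor with an AST anchor. Both source snippets and both helper names are invented for illustration; the point is only that an upstream insertion moves the line range but not the named function.

```python
import ast

SOURCE_V1 = '''def load():
    return "data"

def validate(x):
    return isinstance(x, str)
'''

# Same file after an upstream insertion shifts validate() further down.
SOURCE_V2 = '''def load():
    return "data"

def helper():
    return 1

def validate(x):
    return isinstance(x, str)
'''

def lines_slice(source, start, end):
    """Line-range anchor: brittle, depends on absolute positions (1-indexed, inclusive)."""
    return "\n".join(source.splitlines()[start - 1:end])

def function_source(source, name):
    """AST anchor: find the function by name wherever it sits in the file."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            return ast.get_source_segment(source, node)
    raise LookupError(name)

# The line-range view changes after the insertion; the AST view does not.
print(lines_slice(SOURCE_V1, 4, 5) == lines_slice(SOURCE_V2, 4, 5))
print(function_source(SOURCE_V1, "validate") == function_source(SOURCE_V2, "validate"))
```

An architecture test built on function_source keeps asserting against the same block no matter how many lines are inserted above it, which removes the need to re-point it.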
5. Explicit convergence signal beyond "reviewer found nothing"
Track two convergence signals: (a) AC convergence (existing — all ACs proven green by named tests), and (b) defect-density convergence (new — the type-novelty of reviewer findings has dropped to a threshold). Classify each round's fixes as novel-family / sibling-extension / defence-in-depth. After N consecutive rounds producing only sibling-extension and defence-in-depth findings, auto-trigger one final exhaustive sibling-site sweep and exit.
Weaker but immediately deployable: require each round summary to declare the (novel-family / sibling-extension / defence-in-depth) distribution; the reviewer prompt for the next round receives this distribution and can say "I am no longer finding novel families; I propose exit."
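The defect-density signal can be computed from the per-round classifications alone. The sketch below is one possible reading, under the assumption that each round summary is a list of family labels; the function name and the sample history are hypothetical, and the threshold n is left as a parameter because the proposal does not fix a value for N.

```python
NOVEL = "novel-family"
SIBLING = "sibling-extension"
DEPTH = "defence-in-depth"

def should_trigger_final_sweep(rounds, n):
    """True when none of the last n rounds produced a novel-family finding."""
    if len(rounds) < n:
        return False
    return all(NOVEL not in findings for findings in rounds[-n:])

# Hypothetical history: one list of classifications per round.
history = [
    [NOVEL, SIBLING],
    [SIBLING],
    [DEPTH, SIBLING],
    [SIBLING],
]

print(should_trigger_final_sweep(history, 3))
```

When the function returns True, the loop would run the one final exhaustive sibling-site sweep and then exit, rather than waiting for a round in which the reviewer finds nothing.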
6. Adversary table as a planning artifact for attestation features
For any plan whose stated purpose is to attest to something (provenance, integrity, freshness, identity), the planning phase produces an adversary table with columns: (trusted input, what an adversary or misconfigured operator can do to it, what the system must detect, how it must respond, which test proves the detection). The table is part of the plan and required reading for both implementer and reviewer; ACs do not pass until every row has a green test.
For non-attestation features with a multi-input failure surface, a lighter "input mutation matrix" — just (input, possible malformations, expected response, test) — is required.
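An input mutation matrix maps naturally onto a table-driven test. The sketch below is an assumption-laden example: parse_record, the "created" field, and the matrix rows are invented here, loosely modelled on the malformations listed earlier (non-ASCII bytes, non-object JSON, a JSON number where an ISO string was expected).

```python
import json
from datetime import datetime

def parse_record(raw: bytes):
    """Hypothetical reader: returns a datetime, or None on any malformed input."""
    try:
        doc = json.loads(raw.decode("ascii"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(doc, dict):
        return None
    stamp = doc.get("created")
    if not isinstance(stamp, str):
        return None
    try:
        return datetime.fromisoformat(stamp)
    except ValueError:
        return None

# (input description, raw bytes, whether parsing is expected to succeed)
MATRIX = [
    ("well-formed", b'{"created": "2024-01-02T03:04:05"}', True),
    ("non-ASCII bytes", b"\xff\xfe", False),
    ("non-object JSON", b"[1, 2]", False),
    ("number where ISO string expected", b'{"created": 1704164645}', False),
    ("invalid JSON", b'{"created": ', False),
]

def run_matrix(matrix):
    """Return the labels of rows whose observed outcome differs from the expected one."""
    return [label for label, raw, ok in matrix
            if (parse_record(raw) is not None) != ok]

print(run_matrix(MATRIX))
```

Each row corresponds to one (input, malformation, expected response, test) entry in the plan; a non-empty result from run_matrix names the rows that are not yet green.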
Closing note
The methodology produced a robust feature — these suggestions are not about correctness but about COST. The same shape of finding kept appearing one round at a time when a batched sweep would have closed five or six findings per round. Suggestion 2 (decision-table elicitation) alone would have prevented the largest cluster of late rounds (the pin-neutralisation chain and the file-read-robustness chain), each spanning 5-8 rounds and closing roughly the same logical bug at the next cell in an enumerable table.
Filed by the RLCR loop's automated post-completion methodology-analysis phase.