RLCR methodology: 8 improvements from a session that exited on stagnation

# RLCR Methodology Analysis Report

This report analyzes a five-round RLCR session purely from a methodology
perspective. All project-specific details (paths, identifiers, domain
terms, test counts, commit hashes, etc.) have been stripped; only
methodology patterns remain.

---

## Session Shape At A Glance

- Total rounds: 5 (round 0 through round 4).
- Exit reason: explicit `STOP` from reviewer after three consecutive
  stalled verdicts.
- Verdict progression: `ADVANCED` (round 0) → `ADVANCED` (round 1) →
  `STALLED` (round 2) → `STALLED` (round 3) → `STALLED` (round 4).
- Net forward motion: substantive code-side work completed in rounds 0
  and 1. Rounds 2-4 produced almost no new mainline progress yet still
  consumed full review cycles.
- Final state at exit: implementer believed work was code-complete
  locally; reviewer maintained a not-complete verdict because the remote
  surface it inspected disagreed with the implementer's local state.

This is the single most important shape in the data: **two productive
rounds followed by three rounds of state-transport churn**, with the
loop unable to self-resolve the gap.

---

## Finding 1 — Verdict Contract Drift Between Plan And Reviewer

### Pattern observed

The plan document explicitly designated some items as "lower bound: may
land as follow-up". The reviewer, however, treated any non-completed
item — including those plan-declared follow-ups — as an "unjustified
deferral" and refused a complete verdict on that basis. The implementer
then spent multiple rounds attempting to re-argue the carve-out using
verbatim quotes from the plan. The reviewer never accepted the
re-framing and never updated its definition of done; the implementer
never abandoned the carve-out either. The two contracts ran parallel
for three rounds without converging.

### Why this stalls progress

The "definition of done" was effectively ambiguous between two
sources of truth: (a) the plan's lower bound clause, and (b) the
reviewer's standing rule that deferred items block completion. Each
side believed the other was wrong, and the loop had no mechanism to
adjudicate. Subsequent rounds repeated the same disagreement with
slightly different phrasing instead of escalating.

### Methodology improvement

Before the first execution round, the loop should run a short
"completion contract reconciliation" step that pins down, in one place,
which artifacts can be deferred to follow-up and which cannot. The
output should be a small machine-readable list keyed to the plan, and
both the implementer prompt and reviewer prompt should be required to
quote it verbatim when justifying any "deferred" or "blocking"
classification. If the implementer and reviewer disagree about an
item's classification mid-loop, that is a hard escalation signal, not
material for another round.
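One possible shape for that machine-readable list is sketched below. This is a minimal illustration, not the loop's actual format: the names `ContractItem`, `Classification`, and `find_disagreements`, the item ids, and the second plan clause are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    BLOCKING = "blocking"      # must be done before a complete verdict
    DEFERRABLE = "deferrable"  # the plan allows it to land as follow-up


@dataclass(frozen=True)
class ContractItem:
    item_id: str      # hypothetical key into the plan
    plan_clause: str  # verbatim plan text that justifies the classification
    classification: Classification


def find_disagreements(contract, implementer_view, reviewer_view):
    """Return ids of items where either party departs from the contract."""
    disputed = []
    for item in contract:
        claims = (implementer_view.get(item.item_id),
                  reviewer_view.get(item.item_id))
        if any(c is not None and c != item.classification for c in claims):
            disputed.append(item.item_id)
    return disputed


# Illustrative data only; real items would be keyed to the actual plan.
contract = [
    ContractItem("item-a", "lower bound: may land as follow-up",
                 Classification.DEFERRABLE),
    ContractItem("item-b", "required for completion",
                 Classification.BLOCKING),
]
implementer_view = {"item-a": Classification.DEFERRABLE}
reviewer_view = {"item-a": Classification.BLOCKING}

disputed = find_disagreements(contract, implementer_view, reviewer_view)
if disputed:
    print("escalate before the next round:", disputed)
```

A non-empty result here would be treated as the hard escalation signal rather than as input to another round.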

---

## Finding 2 — Tool/Environment Asymmetry Between Implementer And Reviewer

### Pattern observed

The implementer operated against the local working tree. The reviewer
operated primarily against a remote surface fetched via an external
connector. From round 2 onward, every reviewer attempt to run local
verification commands failed at the sandbox/namespace layer, so the
reviewer's only ground truth was the remote surface. Meanwhile, the
implementer was prevented from pushing to the remote by a different
hook in the same loop. The result was a structural impossibility: the
implementer could only change local state, the reviewer could only see
remote state, and nothing in the loop bridged the two.

### Why this stalls progress

For three full rounds the reviewer correctly observed "the remote does
not reflect the claimed work", and the implementer correctly observed
"my local state matches the plan". Both were honest; both were
operating on different data. No amount of additional implementation
effort could have changed the verdict, because the bottleneck was a
transport boundary the loop itself enforced.

### Methodology improvement

When the reviewer is configured to inspect a remote surface, the
implementer must be allowed to publish to that surface; alternatively,
the reviewer must be given working access to the implementer's local
state. The loop's startup configuration should enforce this as a hard
precondition: if the review channel is remote-backed, then either
push-every-round is enabled, or the loop refuses to start. As a softer
guard, the loop's heartbeat should detect "implementer reports done
but reviewer's data source is stale" as a distinct stall class and
emit an out-of-loop escalation rather than another review request.
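The two guards could look roughly like the sketch below. The configuration fields (`review_channel`, `push_every_round`), the revision strings, and the stall-class label are assumptions made for illustration; the report does not specify how the loop stores its configuration.

```python
from dataclasses import dataclass


class LoopConfigError(Exception):
    """Raised when the loop must refuse to start."""


@dataclass(frozen=True)
class LoopConfig:
    review_channel: str     # hypothetical values: "local" or "remote"
    push_every_round: bool  # may the implementer publish each round?


def check_surface_symmetry(config):
    """Hard precondition: a remote-backed review needs a publish path."""
    if config.review_channel == "remote" and not config.push_every_round:
        raise LoopConfigError(
            "review channel is remote-backed but push-every-round is disabled"
        )


def classify_heartbeat(implementer_reports_done, local_revision,
                       reviewed_revision):
    """Softer guard: detect a reviewer looking at stale data."""
    if implementer_reports_done and local_revision != reviewed_revision:
        return "stale-review-surface"  # escalate out of loop, do not re-review
    return "ok"


try:
    check_surface_symmetry(LoopConfig("remote", push_every_round=False))
    started = True
except LoopConfigError as err:
    started = False
    print(f"refusing to start: {err}")

print(classify_heartbeat(True, "rev-2", "rev-1"))
```

Checking at startup is the cheaper of the two guards; the heartbeat check only matters if the precondition is somehow bypassed or the publish step fails mid-session.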

---

## Finding 3 — Stagnation Detection Fired Too Late

### Pattern observed

The session metadata shows a `mainline_stall_count` of 2 at exit, and
the reviewer issued its terminal `STOP` only after round 4, the fifth
and final round. Yet the underlying causes of the stall (the transport
gap and the contract disagreement) were already fully diagnosable at
the end of round 2: same verdict, same blocking reasons, no new
evidence introduced. Rounds 3 and 4 essentially re-litigated the same
disagreement and re-described the same transport gap with more
documentation.

### Why this stalls progress

Stall detection that triggers on consecutive identical verdicts is too
coarse. By the time it fires, the loop has already spent rounds where
the implementer's response to "you are stalled" is itself a stall
(producing meta-commentary about why the stall is not their fault).
Rounds 2-4 consumed three full review cycles and produced effectively
no mainline code progress: the only code added in that span was one
small workflow-permissions line, and that line never reached the
reviewer's data source.

### Methodology improvement

Augment stall detection with structural signatures rather than verdict
strings. Specifically, flag a round as a likely stall when (a) the
implementer's summary contains no new code commits relative to the
previous round, or (b) the reviewer's "blocking" list is unchanged from
the previous round's "blocking" list, or (c) the implementer
spends more than a small fixed share of its summary arguing about
tracker classification rather than describing new work. Any of these
should escalate immediately, not wait for a second identical verdict.
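A minimal sketch of such a detector follows. Signal (b) is interpreted here as the blocking list being unchanged, matching item 3 of the summary at the end of this report. The `RoundRecord` fields, the word-count proxy for signal (c), and the `max_argument_share` default of 0.25 are assumptions for illustration, not values taken from the session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoundRecord:
    new_commits: int           # code commits added since the previous round
    blocking: frozenset        # reviewer's blocking items for this round
    summary_words: int         # total words in the implementer summary
    classification_words: int  # words spent arguing tracker classification


def stall_signals(prev, curr, max_argument_share=0.25):
    """Return the structural stall signatures present in `curr`."""
    signals = []
    if curr.new_commits == 0:
        signals.append("no-new-commits")
    if curr.blocking and curr.blocking == prev.blocking:
        signals.append("blocking-list-unchanged")
    if (curr.summary_words > 0
            and curr.classification_words / curr.summary_words
            > max_argument_share):
        signals.append("summary-dominated-by-classification")
    return signals


# Illustrative numbers only.
prev = RoundRecord(3, frozenset({"publish to reviewed surface"}), 400, 0)
curr = RoundRecord(0, frozenset({"publish to reviewed surface"}), 900, 500)

signals = stall_signals(prev, curr)
if signals:
    print("escalate now:", signals)
```

Because any single signal escalates, the detector would have fired at the end of the first stalled round in a session shaped like this one, instead of waiting for a repeated verdict.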

---

## Finding 4 — Implementer Used Summaries To Re-Argue The Contract

### Pattern observed

Starting in round 2, the implementer's summaries devoted increasing
space to "goal tracker update requests" — formal proposals to
reclassify items the reviewer had marked as blocking. Each subsequent
round's summary added more carve-out quotations, more justification
paragraphs, and more meta-discussion about which contract should win.
The reviewer ignored these requests on each pass.

### Why this stalls progress

Summaries are read by the reviewer as evidence of work performed, not
as appeals about scoring. Embedding contract-arbitration content
inside a summary muddies what the round actually delivered, and the
reviewer correctly discounted it. But the loop had no other channel
for the implementer to raise "I believe the completion contract is
being applied incorrectly", so it kept being smuggled into summaries.

### Methodology improvement

Give the loop a first-class "contract objection" channel that is
separate from the round summary. An objection should be raised at
most once per item, must cite a specific plan clause, and must be
adjudicated by an out-of-loop actor (user or a dedicated arbitrator
prompt) before the next round begins. If the objection is denied, the
implementer must accept the reviewer's contract going forward and
stop reasserting the same carve-out. If accepted, the reviewer's
contract is patched and both sides see the change.
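The rules above (one objection per item, a cited plan clause, a ruling before the next round) can be sketched as a small state holder. All class and method names here are hypothetical; the report does not describe an existing implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Ruling(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    DENIED = "denied"


@dataclass
class Objection:
    item_id: str
    plan_clause: str  # the specific plan clause the objection cites
    ruling: Ruling = Ruling.PENDING


class ObjectionChannel:
    """Holds contract objections outside of round summaries."""

    def __init__(self):
        self._by_item = {}

    def raise_objection(self, item_id, plan_clause):
        if not plan_clause.strip():
            raise ValueError("an objection must cite a specific plan clause")
        if item_id in self._by_item:
            # At most one objection per item, whatever the earlier ruling.
            raise ValueError(f"objection already raised for {item_id}")
        self._by_item[item_id] = Objection(item_id, plan_clause)

    def rule(self, item_id, ruling):
        """Record the out-of-loop actor's decision."""
        self._by_item[item_id].ruling = ruling

    def may_start_next_round(self):
        return all(o.ruling is not Ruling.PENDING
                   for o in self._by_item.values())


channel = ObjectionChannel()
channel.raise_objection("item-a", "lower bound: may land as follow-up")
blocked_while_pending = not channel.may_start_next_round()
channel.rule("item-a", Ruling.DENIED)

try:
    channel.raise_objection("item-a", "lower bound: may land as follow-up")
    reasserted = True
except ValueError:
    reasserted = False  # the same carve-out cannot be argued twice
```

Keeping denied objections in the map is a deliberate choice in this sketch: it is what prevents the implementer from reasserting the same carve-out in a later round.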

---

## Finding 5 — Late-Surfacing Work Items Were Absorbed But Not Counted

### Pattern observed

Mid-session, a real but small piece of missed scope was discovered and
addressed by spawning a micro-team inside the round. The work was
correct, but it was reported inside a round that otherwise had no code
content, and the round's main narrative was about documentation and
contract reconciliation. The reviewer largely did not credit the fix
because the broader round was already classified as stalled, and the
fix never reached the surface the reviewer was inspecting anyway.

### Why this stalls progress

The loop conflates "round purpose" with "round content". A round
nominally dedicated to non-code reconciliation that quietly carries a
real code fix gives the reviewer no clean handle for crediting that
fix. The implementer also has weaker incentive to push such fixes
hard, because they appear secondary to the round's stated framing.

### Methodology improvement

Adopt a small fixed taxonomy of round types (e.g. "implementation",
"verification", "recovery", "documentation") and require each round
to declare exactly one type up front. Code commits inside a non-code
round are still allowed, but must be hoisted to a "side fix" block at
the top of the summary and the reviewer prompt should be instructed
to evaluate side fixes independently of the round's main verdict.
This prevents real progress from being collateral damage to
unrelated process disputes.
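As a rough illustration, the round-type declaration and the hoisted "side fix" block could be modelled as follows. The four type names come from the examples above; treating only "implementation" as a code round, and the summary layout itself, are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum


class RoundType(Enum):
    IMPLEMENTATION = "implementation"
    VERIFICATION = "verification"
    RECOVERY = "recovery"
    DOCUMENTATION = "documentation"


# Assumption: only implementation rounds count as code rounds.
CODE_ROUND_TYPES = {RoundType.IMPLEMENTATION}


@dataclass
class RoundSummary:
    round_type: RoundType  # exactly one type, declared up front
    body: str
    commits: list = field(default_factory=list)


def render(summary):
    """Render a summary, hoisting commits from non-code rounds to the top."""
    lines = [f"Round type: {summary.round_type.value}"]
    if summary.commits and summary.round_type not in CODE_ROUND_TYPES:
        lines.append("Side fixes (evaluate independently of the main verdict):")
        lines.extend(f"- {commit}" for commit in summary.commits)
    lines.append(summary.body)
    return "\n".join(lines)


rendered = render(RoundSummary(
    RoundType.DOCUMENTATION,
    "Reconciled the contract notes.",
    commits=["fix missed scope item"],
))
print(rendered)
```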

---

## Finding 6 — Recovery Round Was Spent On Documentation Instead Of Escalation

### Pattern observed

The final round explicitly identified itself as a "drift recovery"
round, correctly diagnosed the transport boundary as the root cause,
and produced a detailed contract document explaining the situation.
But the recovery action it took was to write more documentation and
re-request a tracker update, both still inside the loop. The action
that was actually needed (the user running a single command
out-of-band) was documented but not escalated as a hard stop with a
one-line ask.

### Why this stalls progress

Once the loop's own boundaries are the problem, in-loop output is the
wrong vehicle. Pages of explanation give the user more to read, not
more to act on. The reviewer correctly noted that nothing on its data
source had changed and issued the terminal stop.

### Methodology improvement

When a round identifies a boundary that the loop itself cannot cross,
its only deliverables should be: (1) a one-sentence statement of what
the human needs to do, (2) the exact command or click, and (3) an
immediate loop pause. Long contract-recovery prose belongs in
post-session notes, not in a round artifact. The loop should expose a
"hand back to human" action that produces a minimal handoff and
suspends rounds until the human signals.
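A "hand back to human" action along these lines might be as small as the sketch below. The `Loop` class, its method names, and the message layout are hypothetical, and the command is left as a placeholder because the session's specifics were stripped from this report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Handoff:
    ask: str     # one-sentence statement of what the human needs to do
    action: str  # the exact command or click


class Loop:
    def __init__(self):
        self.handoff: Optional[Handoff] = None

    @property
    def paused(self):
        return self.handoff is not None

    def hand_back_to_human(self, ask, action):
        """Emit a minimal handoff and suspend rounds."""
        if "\n" in ask.strip():
            raise ValueError("the ask must fit on one line")
        self.handoff = Handoff(ask.strip(), action.strip())
        return f"ACTION NEEDED: {self.handoff.ask}\nRUN: {self.handoff.action}"

    def human_signal(self):
        """Called once the human reports the action is done."""
        self.handoff = None

    def start_round(self):
        if self.paused:
            raise RuntimeError("loop is paused pending a human action")
        return "round started"


loop = Loop()
message = loop.hand_back_to_human(
    "Publish the local work to the surface the reviewer inspects.",
    "<exact command goes here>",  # placeholder, not a real command
)
print(message)
```

The handoff deliberately has no free-text field beyond the ask and the action, so there is nowhere to put long recovery prose.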

---

## Finding 7 — Review Feedback Quality Was High, But Actionable Density Decayed

### Pattern observed

Early-round reviews (rounds 0 and 1) produced concrete, evidence-cited,
actionable feedback that led directly to substantive implementation
work the next round. Reviews in rounds 2 through 4 were still factually
accurate and well-cited, but the actionable content collapsed to a
single recurring instruction ("publish the work to the surface I can
see"). The implementer treated this as a known issue rather than an
action, and the reviews offered no alternative paths.

### Why this stalls progress

When a reviewer's feedback reduces to one item the implementer cannot
do alone, repeating the feedback adds no information. The loop has no
mechanism for the reviewer to either (a) widen its data source, or
(b) declare the loop blocked on a specific external precondition and
hand back. So it keeps issuing the same instruction.

### Methodology improvement

Give the reviewer an explicit "blocked-on-external" verdict distinct
from "stalled". A blocked verdict should carry a structured field
naming the external actor, the exact action required, and the
expected unblock signal. The loop scheduler should pause rounds on
blocked verdicts rather than immediately requesting another
implementation round — this stops the implementer from filling
otherwise-blocked rounds with reconciliation prose.
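The structured verdict and the scheduler's reaction to it could be sketched like this. `ADVANCED` and `STALLED` are the verdicts seen in this session; `BLOCKED_ON_EXTERNAL`, the field names, and the two scheduler outcomes are assumptions, and the handling of non-blocked verdicts is deliberately simplified.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ADVANCED = "ADVANCED"
    STALLED = "STALLED"
    BLOCKED_ON_EXTERNAL = "BLOCKED_ON_EXTERNAL"  # proposed, not existing


@dataclass(frozen=True)
class ExternalBlock:
    actor: str            # who must act, e.g. "user"
    required_action: str  # the exact action required
    unblock_signal: str   # what the loop waits for before resuming


@dataclass(frozen=True)
class Review:
    verdict: Verdict
    block: Optional[ExternalBlock] = None

    def __post_init__(self):
        is_blocked = self.verdict is Verdict.BLOCKED_ON_EXTERNAL
        if is_blocked != (self.block is not None):
            raise ValueError(
                "a block description is required for, and only for, "
                "a blocked-on-external verdict"
            )


def next_step(review):
    """Scheduler decision: pause on blocked verdicts, otherwise continue."""
    if review.verdict is Verdict.BLOCKED_ON_EXTERNAL:
        return "pause"
    return "request-implementation-round"


review = Review(
    Verdict.BLOCKED_ON_EXTERNAL,
    ExternalBlock(
        actor="user",
        required_action="publish the work to the reviewed surface",
        unblock_signal="reviewed surface shows the new revision",
    ),
)
print(next_step(review))
```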

---

## Finding 8 — Communication Clarity Eroded As Rounds Accumulated

### Pattern observed

Round 0 and round 1 summaries were tightly scoped: what was built,
what was verified, what remained, with crisp evidence. Later round
summaries grew in length while shrinking in new information per page.
Reviewer outputs in later rounds shrank slightly in length but
repeated structurally identical content with minor wording shifts. The
ratio of new-signal to total-text dropped sharply across the session.

### Why this stalls progress

Long, mostly-restated artifacts increase the cost of each round for
all readers (reviewer, scheduler, eventual human auditor) without
adding decision-relevant information. They also give the implementer
the impression of having done work when very little new was produced.

### Methodology improvement

Cap each round summary and each review output at a fixed budget per
section, with an explicit "new this round" subsection that must be
strictly non-overlapping with previous rounds. Reviewer feedback that
repeats the previous round should be encoded as a reference to that
round, not restated in prose. This forces both sides to either produce new
substance or admit there is none, which feeds back into the stall
detector.
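A crude version of the budget and novelty checks is sketched below. It treats two items as overlapping only when they match after case and whitespace normalization, which is a strong simplification; the function names and the word-based budget are assumptions of this sketch.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial rewording is ignored."""
    return " ".join(text.lower().split())


def check_section(new_items, previous_items, max_words):
    """Return problems with a 'new this round' section (empty if fine)."""
    seen = {normalize(item) for item in previous_items}
    problems = []
    total_words = sum(len(item.split()) for item in new_items)
    if total_words > max_words:
        problems.append(f"over budget: {total_words} > {max_words} words")
    for item in new_items:
        if normalize(item) in seen:
            problems.append(f"repeated: {item}")
    return problems


def encode_feedback(feedback, history):
    """Replace repeated feedback with a reference to the earlier round.

    `history` is a list of earlier feedback strings indexed by round number.
    """
    for round_no, earlier in enumerate(history):
        if normalize(earlier) == normalize(feedback):
            return f"Unchanged from round {round_no}."
    return feedback


problems = check_section(
    ["Published the branch."],
    ["published the  branch."],
    max_words=50,
)
encoded = encode_feedback(
    "Publish the work.",
    ["Fix the tests.", "publish the work."],
)
print(problems, encoded)
```

An empty "new this round" section that passes this check is itself a useful input: it is the admission of no new substance that the stall detector can act on.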

---

## Summary Of Recommended Methodology Changes

1. **Pre-loop completion contract reconciliation.** Materialize a
   single shared definition of done, derived from the plan, that both
   the implementer and reviewer must quote when classifying items.

2. **Symmetric surface guarantee.** If the reviewer reads a remote
   surface, the implementer must be able to publish to it; enforce at
   loop start, not at stall time.

3. **Structural stall detection.** Flag stalls on signatures like "no
   new commits", "blocking list unchanged", and "summary dominated by
   classification arguments", not just on repeated verdict strings.

4. **Separate contract-objection channel.** Pull tracker arbitration
   out of round summaries; cap at one objection per item; resolve
   out-of-loop.

5. **Explicit round-type declaration.** Each round declares one type;
   side fixes are hoisted and reviewed independently of the main
   verdict.

6. **First-class "hand back to human" action.** When the blocker is
   outside the loop boundary, the round produces a one-line ask and a
   pause, not more prose.

7. **Distinct "blocked-on-external" verdict.** Reviewer can declare
   the loop blocked on a named external precondition; scheduler
   pauses rather than requesting another implementation round.

8. **Per-round content budgets with "new this round" requirement.**
   Force each artifact to carry strictly novel content or shrink to
   a reference.

The overarching theme is that the session's first two rounds worked
well, and everything after that was the loop fighting its own
boundaries rather than the implementer fighting the problem. The
methodology changes above target that specific failure mode.


RLCR methodology: 8 improvements from a session that exited on stagnation #170
