RLCR Methodology Improvements: Systematic Class-Fix Auditing, Completion Guards, and Checklist Enforcement #163

@mockiemochi

Summary

After a 21-round RLCR session that added 7 benchmark tasks, analysis identified 6 opportunities to improve the methodology. The session followed a productive "big-bang implementation + long review tail" pattern, but several recurring failure patterns inflated the round count, and addressing them could reduce it significantly.

Key Metrics

  • Total rounds: 21
  • Rounds with code changes: 16
  • Rounds on same defect class (verifier thresholds): 13 (62%)
  • False positive rounds: 3 (~14%)
  • Premature completion declarations: 3

Suggestions

1. Systematic Class-Fix Auditing

Pattern: When a reviewer identifies a systemic defect class (e.g., "verifier allows partial completion to pass"), the implementer fixes reported instances but doesn't audit all similar code.

Proposal: When a systemic defect class is identified, require the implementer to:

  1. Enumerate all code locations where the same pattern could occur
  2. Audit each location in the same round
  3. Report audit results in the round summary

Impact: Could have reduced the 13 rounds spent on the verifier-threshold defect class to 2-3.
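
A minimal sketch of the enumerate-and-report step, assuming the defect class can be approximated by a regular expression over source text. The names `AuditHit`, `enumerate_candidates`, and `format_audit_report` are hypothetical and not part of RLCR; a real audit would still need a human or agent to judge each hit.

```python
import re
from dataclasses import dataclass


@dataclass
class AuditHit:
    path: str
    line_no: int
    text: str


def enumerate_candidates(files: dict[str, str], pattern: str) -> list[AuditHit]:
    """Return every line in `files` (path -> source text) matching `pattern`."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(files):
        for line_no, line in enumerate(files[path].splitlines(), start=1):
            if regex.search(line):
                hits.append(AuditHit(path, line_no, line.strip()))
    return hits


def format_audit_report(defect_class: str, hits: list[AuditHit]) -> str:
    """Render the hits as an unchecked checklist for the round summary."""
    lines = [f"Defect class: {defect_class}", f"Locations to audit: {len(hits)}"]
    lines += [f"- [ ] {h.path}:{h.line_no}  {h.text}" for h in hits]
    return "\n".join(lines)
```

The point of the checklist output is that the round summary shows every candidate location, including the ones that turned out to be fine, so the reviewer can see the audit was exhaustive rather than limited to the reported instances.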

2. Premature Completion Guard

Pattern: The implementer declared acceptance criteria (AC) "complete" 3 times when they were not, and the reviewer had to reopen them each time.

Proposal: Before declaring an AC complete, require the implementer to:

  1. List every sub-item within the criterion
  2. Provide specific test results per sub-item (not just "validators pass")
  3. Address any items the reviewer previously flagged
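
The guard above could be sketched as a small pre-declaration check. This is an illustration only: `SubItem` and `completion_blockers` are hypothetical names, and what counts as sufficient evidence is left to the reviewer.

```python
from dataclasses import dataclass, field


@dataclass
class SubItem:
    name: str
    # Specific test result for this sub-item, e.g. a test id and its outcome.
    evidence: str = ""
    # Reviewer findings on this sub-item that have not been addressed yet.
    open_reviewer_flags: list[str] = field(default_factory=list)


def completion_blockers(sub_items: list[SubItem]) -> list[str]:
    """Return the reasons an acceptance criterion cannot be declared complete."""
    blockers = []
    if not sub_items:
        blockers.append("no sub-items listed")
    for item in sub_items:
        if not item.evidence.strip():
            blockers.append(f"{item.name}: no test evidence")
        for flag in item.open_reviewer_flags:
            blockers.append(f"{item.name}: unresolved reviewer flag: {flag}")
    return blockers
```

An empty list means the criterion may be declared complete; anything else is reported in the round summary instead of a completion claim.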

3. Change-Impact Validation

Pattern: Fixes sometimes introduced new bugs or over-corrected, requiring follow-up rounds to partially revert.

Proposal: When making changes to default values, constraint handling, or acceptance logic:

  1. Trace the change through all affected schema definitions and verifier logic
  2. Run negative tests for the specific change
  3. Cross-reference against instruction text
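
As an illustration of step 2, a negative test feeds the verifier inputs that must be rejected and fails if any of them pass. The `verify` function below is a hypothetical stand-in for a task verifier, not code from the session; the step names are invented for the example.

```python
def verify(completed_steps: set[str], required_steps: set[str]) -> bool:
    """Hypothetical verifier: pass only when every required step is done."""
    return required_steps <= completed_steps


def run_negative_tests() -> list[str]:
    """Return the names of cases where the verifier gave the wrong verdict."""
    required = {"create", "update", "delete"}
    failures = []

    # Inputs that must be rejected, including partial completion.
    must_fail = {
        "nothing done": set(),
        "partial completion": {"create", "update"},
        "unrelated steps only": {"export"},
    }
    for name, completed in must_fail.items():
        if verify(completed, required):
            failures.append(name)

    # Positive control: guards against over-correcting into "reject everything".
    if not verify(required | {"export"}, required):
        failures.append("positive control")
    return failures
```

Pairing the rejected cases with a positive control is what catches the over-correction described in the pattern: a fix that tightens the verifier too far fails the control even though every negative case passes.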

4. Post-Implementation Checklist

Pattern: Standard "new task" items (registry entries, frontend assets, seed exposure) were missed during initial implementation.

Proposal: Add a standard checklist when creating new tasks:

  • Registry updated?
  • Frontend assets included for all UI-dependent services?
  • Seed fixtures excluded from agent-writable paths?
  • Port numbers consistent between instructions and startup scripts?
  • All pre-existing tasks unaffected by shared infrastructure changes?
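
Some checklist items can be checked mechanically. The sketch below covers only the port-consistency item and assumes ports appear as `localhost:N`, `127.0.0.1:N`, `--port N`, or `PORT=N`; both the function names and those patterns are assumptions, and a real task may express ports differently.

```python
import re

# Assumed ways a port can be written in instructions or startup scripts.
PORT_PATTERN = re.compile(r"(?:localhost:|127\.0\.0\.1:|--port[ =]|PORT=)(\d{2,5})")


def ports_in(text: str) -> set[int]:
    """Collect every port number that matches one of the assumed patterns."""
    return {int(port) for port in PORT_PATTERN.findall(text)}


def port_mismatches(instructions: str, startup_script: str) -> dict[str, set[int]]:
    """Report ports mentioned on one side but not the other."""
    documented = ports_in(instructions)
    started = ports_in(startup_script)
    return {
        "only_in_instructions": documented - started,
        "only_in_startup_script": started - documented,
    }
```

Both sets being empty is a necessary condition for the checklist item, not proof of it, since ports written in a form the pattern does not recognize are silently skipped.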

5. False Positive Reduction in Review

Pattern: ~14% of rounds (3 of 21) produced no changes because of reviewer false positives (duplicate findings, static-analysis findings that did not hold in containers).

Proposal: Require the reviewer to:

  1. Check whether reported issues were already fixed in previous rounds
  2. Validate findings against actual runtime behavior when possible
  3. Distinguish production issues from developer-experience improvements
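
Step 1 could be sketched as a triage pass over new findings. This assumes findings can be keyed by file path and defect class, which is a simplification; `Finding`, `finding_key`, and `triage` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    path: str
    defect_class: str
    summary: str


def finding_key(finding: Finding) -> tuple[str, str]:
    """Assumed identity of a finding: same file and same defect class."""
    return (finding.path, finding.defect_class)


def triage(
    new_findings: list[Finding], resolved_earlier: list[Finding]
) -> tuple[list[Finding], list[Finding]]:
    """Split new findings into (fresh, possible_duplicates)."""
    resolved_keys = {finding_key(f) for f in resolved_earlier}
    fresh, possible_duplicates = [], []
    for finding in new_findings:
        if finding_key(finding) in resolved_keys:
            possible_duplicates.append(finding)
        else:
            fresh.append(finding)
    return fresh, possible_duplicates
```

Possible duplicates should be re-verified against the current code rather than dropped, since a match on path and defect class can also indicate a regression.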

6. BitLesson Activation Threshold

Pattern: A BitLesson was created mid-session but not proactively used to prevent the same pattern in subsequent rounds.

Proposal: When a BitLesson is created, the next round should include an explicit audit of all remaining code against the new lesson. Treat lessons as active checklist items, not just documentation.
