Replay Contract Guard for Compaction Determinism #4669
Replies: 1 comment
This is a clean contract surface to formalize.
Problem observed
Compaction and replay behavior can diverge subtly when compacted and raw event streams produce different effective prompts for the same session state. That risk is hard to detect without explicit contract assertions. We needed a deterministic regression test that proves compacted history preserves the same effective prompt semantics as raw history.
Why it matters operationally
If compaction changes prompt meaning, operators see non-reproducible model behavior across runs and environments, which undermines replayability and incident analysis. Stable replay contracts are required for debugging and confidence in session lifecycle logic. A focused test guard catches semantic drift early and reduces maintenance cost when compaction code evolves.
Minimal repro
1. Run the same scenario twice: once with raw event history and once with compacted history.
2. Capture the effective prompt produced by each path.
3. Compare the normalized outputs for equivalence.

Expected behavior: semantic equivalence between the raw and compacted paths.
Actual risk before this addition: no direct assertion coverage for this contract, so a divergence could land silently.
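The repro above can be sketched as follows. This is a minimal illustration, not the project's real API: `compact`, `build_effective_prompt`, and `normalize` are hypothetical stand-ins, and the toy compaction (dropping whitespace-only events) is only a placeholder for real summarization.

```python
def compact(events):
    # Toy compaction: drop whitespace-only events.
    # Real compaction would summarize history, not just filter it.
    return [e for e in events if e.strip()]

def build_effective_prompt(events):
    # Render the event stream into the prompt text the model would see.
    return "\n".join(events)

def normalize(prompt):
    # Strip incidental whitespace so the comparison is semantic,
    # not byte-for-byte.
    return "\n".join(ln.strip() for ln in prompt.splitlines() if ln.strip())

raw = ["user: hi", "", "assistant: hello  "]
raw_prompt = normalize(build_effective_prompt(raw))
compacted_prompt = normalize(build_effective_prompt(compact(raw)))
assert raw_prompt == compacted_prompt  # the replay contract
```

Normalizing before comparing is what makes this a semantic check: byte-level differences in whitespace are tolerated, while any change to prompt content fails the assertion.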
Fix approach
The change introduces a targeted replay contract test in compaction coverage that explicitly compares effective prompts produced from compacted versus raw events. The test is scoped to the behavioral guarantee and avoids broad refactors. This keeps the fix minimal while still enforcing deterministic replay semantics as a first-class contract.
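A hedged sketch of the shape such a contract test might take. The names (`compact_history`, `effective_prompt`) and the placeholder compaction are illustrative assumptions, not the actual test added by the change:

```python
def compact_history(events):
    # Placeholder compaction: remove whitespace-only entries.
    return [e for e in events if e.strip()]

def effective_prompt(events):
    # Normalized rendering: compare semantics, not incidental whitespace.
    return "\n".join(e.strip() for e in events if e.strip())

def test_compacted_replay_matches_raw():
    raw = ["user: deploy", "  ", "tool: ok", "assistant: done"]
    # The contract under test: compaction must not change the
    # effective prompt produced on replay.
    assert effective_prompt(compact_history(raw)) == effective_prompt(raw)

test_compacted_replay_matches_raw()
```

Keeping the test scoped to a single equality assertion on the rendered prompt is what makes it a contract guard rather than an implementation test: it survives internal refactors of the compaction code as long as the observable semantics hold.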
Validation evidence
Open follow-up question for maintainers
Would maintainers want a second contract test that validates replay equivalence across multiple compaction cycles, or is a single-cycle equivalence guarantee the intended boundary for now?
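If maintainers do want the multi-cycle guarantee, it could be phrased as an idempotence check: each additional compaction cycle must leave the effective prompt fixed. A hedged sketch, again with hypothetical helpers and a placeholder compaction:

```python
def compact(events):
    # Placeholder compaction; assumed idempotent here.
    return [e for e in events if e.strip()]

def effective_prompt(events):
    return "\n".join(e.strip() for e in events if e.strip())

def test_multi_cycle_equivalence():
    events = ["user: hi", "", "assistant: hello"]
    baseline = effective_prompt(events)
    for _ in range(3):
        events = compact(events)  # repeated compaction cycles
        # Every cycle must preserve the original prompt semantics.
        assert effective_prompt(events) == baseline

test_multi_cycle_equivalence()
```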
This contribution was informed by patterns from Wrkr. Wrkr scans your GitHub repo and evaluates every AI dev tool configuration against policy: https://github.com/Clyra-AI/wrkr