Replay Contract Guard for Compaction Determinism #4669
Replies: 1 comment
This is a clean contract surface to formalize.
Problem observed
Compaction and replay behavior can diverge subtly when compacted and raw event streams produce different effective prompts for the same session state. That risk is hard to detect without explicit contract assertions. We needed a deterministic regression test that proves compacted history preserves the same effective prompt semantics as raw history.
Why it matters operationally
If compaction changes prompt meaning, operators see non-reproducible model behavior across runs and environments, which undermines replayability and incident analysis. Stable replay contracts are required for debugging and confidence in session lifecycle logic. A focused test guard catches semantic drift early and reduces maintenance cost when compaction code evolves.
Minimal repro
1. Run the same scenario twice: once with raw event history and once with compacted history.
2. Capture the effective prompt produced by each path.
3. Compare the normalized outputs for equivalence.

Expected behavior: semantic equivalence between the raw and compacted paths.
Actual risk before this addition: no direct assertion coverage for this contract, so a divergence could land silently.
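The repro above can be sketched as follows. This is a minimal illustration, not the project's real API: `compact`, `build_effective_prompt`, and `normalize` are hypothetical stand-ins, and the toy compaction (dropping whitespace-only events) is only a placeholder for real summarization.

```python
def compact(events):
    # Toy compaction: drop whitespace-only events.
    # Real compaction would summarize history, not just filter it.
    return [e for e in events if e.strip()]

def build_effective_prompt(events):
    # Render the event stream into the prompt text the model would see.
    return "\n".join(events)

def normalize(prompt):
    # Strip incidental whitespace so the comparison is semantic,
    # not byte-for-byte.
    return "\n".join(ln.strip() for ln in prompt.splitlines() if ln.strip())

raw = ["user: hi", "", "assistant: hello  "]
raw_prompt = normalize(build_effective_prompt(raw))
compacted_prompt = normalize(build_effective_prompt(compact(raw)))
assert raw_prompt == compacted_prompt  # the replay contract
```

Normalizing before comparing is what makes this a semantic check: byte-level differences in whitespace are tolerated, while any change to prompt content fails the assertion.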
Fix approach
The change introduces a targeted replay contract test in compaction coverage that explicitly compares effective prompts produced from compacted versus raw events. The test is scoped to the behavioral guarantee and avoids broad refactors. This keeps the fix minimal while still enforcing deterministic replay semantics as a first-class contract.
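A hedged sketch of the shape such a contract test might take. The names (`compact_history`, `effective_prompt`) and the placeholder compaction are illustrative assumptions, not the actual test added by the change:

```python
def compact_history(events):
    # Placeholder compaction: remove whitespace-only entries.
    return [e for e in events if e.strip()]

def effective_prompt(events):
    # Normalized rendering: compare semantics, not incidental whitespace.
    return "\n".join(e.strip() for e in events if e.strip())

def test_compacted_replay_matches_raw():
    raw = ["user: deploy", "  ", "tool: ok", "assistant: done"]
    # The contract under test: compaction must not change the
    # effective prompt produced on replay.
    assert effective_prompt(compact_history(raw)) == effective_prompt(raw)

test_compacted_replay_matches_raw()
```

Keeping the test scoped to a single equality assertion on the rendered prompt is what makes it a contract guard rather than an implementation test: it survives internal refactors of the compaction code as long as the observable semantics hold.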
Validation evidence
Open follow-up question for maintainers
Would maintainers want a second contract test that validates replay equivalence across multiple compaction cycles, or is a single-cycle equivalence guarantee the intended boundary for now?
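If maintainers do want the multi-cycle guarantee, it could be phrased as an idempotence check: each additional compaction cycle must leave the effective prompt fixed. A hedged sketch, again with hypothetical helpers and a placeholder compaction:

```python
def compact(events):
    # Placeholder compaction; assumed idempotent here.
    return [e for e in events if e.strip()]

def effective_prompt(events):
    return "\n".join(e.strip() for e in events if e.strip())

def test_multi_cycle_equivalence():
    events = ["user: hi", "", "assistant: hello"]
    baseline = effective_prompt(events)
    for _ in range(3):
        events = compact(events)  # repeated compaction cycles
        # Every cycle must preserve the original prompt semantics.
        assert effective_prompt(events) == baseline

test_multi_cycle_equivalence()
```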
This contribution was informed by patterns from Wrkr. Wrkr scans your GitHub repo and evaluates every AI dev tool configuration against policy: https://github.com/Clyra-AI/wrkr