fix: reset context when truncation deletes emitted segments by debermudez · Pull Request #1 · SemiAnalysisAI/aiperf

debermudez · 2026-06-05T21:17:11Z

Summary

Make truncate_synth_buf_at_block report disturbance for the three deletion paths that previously returned None (clear-all on target_blocks<=0, boundary cut with segments past the boundary and no in-place strip, and cut landing at the start of a segment).
Update the docstring to describe the new return contract: the helper now reports the earliest disturbed segment index in every case where prior content was removed or modified, and returns None only when the buffer was untouched.
Update two pre-existing tests that codified the buggy contract; add six new tests (helper-level + turn_delta integration) covering the three regression scenarios from the issue.

Without the fix, ConversationReconstructor.turn_delta() took the strict-append path with a stale _emitted_segment_count after a context pull-back, silently emitting an empty delta while the actual prompt context had been deleted. This corrupted Weka trace replay for turns with non-monotonic hash-id prefix relationships (LCP==0 or boundary truncation dropping later segments).

Refs SemiAnalysisAI/InferenceX#1665

Test plan

pytest tests/unit/dataset/loader/test_weka_synth_buf_turn_delta.py tests/unit/dataset/loader/test_weka_synth_buf.py — 60/60 pass
pytest tests/unit/dataset/loader/ -k weka — 275/275 pass (1 unrelated skip)
ruff check and ruff format --check clean on both files
RED-confirmed: the 6 new/updated tests fail on the unpatched code for the expected reasons (None instead of disturbance index)

Follow-up

Once this merges, the utils/aiperf submodule pointer in SemiAnalysisAI/InferenceX should be bumped in a separate PR so the fix lands in InferenceX (currently pinned at 062a5de).

Note

Medium Risk
Changes delta-encoding behavior for Weka trace replay when hash-id prefixes are non-monotonic; incorrect resets could affect KV-cache replay, but the fix aligns with the documented disturbance contract and is heavily tested.

Overview
Fixes Weka synth-buffer truncation so truncate_synth_buf_at_block reports the earliest disturbed segment index whenever truncation removes or changes prior content—not only when a surviving segment is edited in place.

Previously, three deletion paths returned None (full clear on target_blocks <= 0, boundary cut that dropped later segments with no partial-tail strip, cut exactly at a segment start). That let ConversationReconstructor.turn_delta() stay on strict append with a stale _emitted_segment_count after context pull-backs (e.g. LCP=0 or boundary truncation removing already-emitted segments), yielding empty or wrong deltas during trace replay.

The helper now returns 0 when clearing a non-empty buffer, the first deleted index when trailing segments are removed, and the segment index when the cut starts at segment i; None only when nothing was touched. Tests were updated and extended for the helper contract and turn_delta reset vs strict-append behavior.

^{Reviewed by Cursor Bugbot for commit 9bdc145. Bugbot is set up for automated code reviews on this repo. Configure here.}

truncate_synth_buf_at_block returned None for three deletion paths (clear-all on target_blocks<=0, boundary cut with segments past the boundary and no in-place strip, and cut landing at the start of a segment). turn_delta then took the strict-append path with a stale _emitted_segment_count, silently emitting an empty delta while the actual context had been pulled back. Now the helper reports the earliest disturbed index in all three cases; turn_delta resets context whenever the reported index lies within the emitted region. Refs SemiAnalysisAI/InferenceX#1665 Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>

github-actions · 2026-06-05T21:17:19Z

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@9bdc145672126d7a9e9e31aab28d14102b74c574

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@9bdc145672126d7a9e9e31aab28d14102b74c574

Last updated for commit: 9bdc145 • Browse code

ajcasagrande

✅ Approve — addresses InferenceX#1665 as specified

Reviewed against the merge-base of the base branch cjq/agentx-v0.3 (not main). This is a complete, faithful fix for SemiAnalysisAI/InferenceX#1665: truncate_synth_buf_at_block now reports the earliest disturbed segment index on every deletion path, so ConversationReconstructor.turn_delta() correctly takes the reset path (instead of silently strict-appending stale state) when truncation pulls back previously-emitted segments.

Issue coverage — every "Expected behavior" point implemented:

target_blocks <= 0 on a non-empty buffer -> returns 0 ✓
boundary cut that deletes trailing segments (no in-place strip) -> returns the first deleted index ✓
cut at a segment start -> correct earliest-disturbed index ✓ (handled by the boundary path — see inline note)
genuine no-op -> still None ✓

All three regression tests the issue suggested are present and green (lcp==0 reset, boundary-delete reset, unemitted-only stays append-only). Verified locally: pytest tests/unit/dataset/loader/test_weka_synth_buf_turn_delta.py -> 20 passed.

Correctness is solid for every reachable path: the earliest-disturbed semantics line up exactly with turn_delta's _last_disturbance_at < _emitted_segment_count reset condition, and the i+1 boundary return is only ever consumed as a threshold (never as a list index), so it's safe.

One non-blocking nit inline (dead branch + a test passing through a different path than its name implies). It doesn't affect correctness or the issue fix — just a cleanup suggestion.

Nice work — exercising the real ConversationReconstructor API in the tests is exactly the right level here.

ajcasagrande · 2026-06-06T19:47:44Z

+            # all deleted. Earliest disturbed index is i.
            del segments[i:]
-            return None
+            return i


Non-blocking / cleanup. This branch is unreachable: a cut at a segment's start is always caught one iteration earlier by the boundary check (cursor + block_count == target_blocks), which deletes the trailing segments and returns the first-deleted index (== this branch's i). I confirmed with an exhaustive trace — 17,732 segment configs, these lines hit 0 times.

Side effect: test_truncate_returns_segment_index_when_cut_lands_at_segment_start actually passes via the boundary path's deleted_past_boundary -> i+1 logic, not this branch — so its name/docstring are misleading and this branch has no coverage.

The fix is still complete (the boundary path already satisfies issue #1665's bullet ai-dynamo#3). Suggest either dropping these lines and renaming that test to reflect the boundary path it exercises, or adding a comment that this branch is defensively unreachable.

debermudez changed the title ~~fix(weka): reset context when truncation deletes emitted segments~~ fix: reset context when truncation deletes emitted segments Jun 5, 2026

ajcasagrande approved these changes Jun 6, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: reset context when truncation deletes emitted segments#1

fix: reset context when truncation deletes emitted segments#1
debermudez wants to merge 1 commit into
SemiAnalysisAI:cjq/agentx-v0.3from
debermudez:eb/fix-truncate-synth-buf-disturbance

debermudez commented Jun 5, 2026 •

edited by cursor Bot

Loading

Uh oh!

github-actions Bot commented Jun 5, 2026

Uh oh!

ajcasagrande left a comment

Uh oh!

ajcasagrande Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

debermudez commented Jun 5, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Follow-up

Uh oh!

github-actions Bot commented Jun 5, 2026

Try out this PR

Uh oh!

ajcasagrande left a comment

Choose a reason for hiding this comment

✅ Approve — addresses InferenceX#1665 as specified

Uh oh!

ajcasagrande Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

debermudez commented Jun 5, 2026 •

edited by cursor Bot

Loading