fix: implement drift event emission and make evidence optional in record_decision#206
fix: implement drift event emission and make evidence optional in record_decision#206acailic wants to merge 2 commits into
Conversation
…ord_decision - Add EventType.DRIFT and DriftDetectedEvent to the events module - Override record_decision in TraceContext to call DriftDetector.compare() and emit a DriftDetectedEvent when replay diverges from the original execution - Make the `evidence` parameter optional (default None) in RecordingMixin.record_decision, since all existing callers already pass it explicitly as a keyword arg - Fix skipped test: correct mock URL key (events→traces), add post-checkpoint timestamp to mock event so the timestamp filter passes, and check ctx._events instead of a disconnected capture_event callback Closes #205 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR adds first-class drift reporting during checkpoint restore/replay by introducing a new DRIFT event type and emitting a typed drift event when a replayed decision diverges from the recorded baseline. It also unblocks drift-related replay tests by making evidence optional in record_decision and updating the affected test harness.
Changes:
- Make
RecordingMixin.record_decision()accept optionalevidence(defaulting to[]internally) to avoid breaking replay callsites that omit it. - Introduce
EventType.DRIFTandDriftDetectedEvent, register the event type, and emit drift events fromTraceContext.record_decision()during replay. - Fix the previously-skipped drift replay test by aligning mocked restore fetch shapes and asserting drift presence in the context’s stored events.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
tests/test_replay_depth_l3.py |
Updates mocked restore trace fetch payload and drift assertion for replay drift emission test. |
agent_debugger_sdk/core/recorders.py |
Makes evidence optional in record_decision and normalizes None to []. |
agent_debugger_sdk/core/events/drift.py |
Adds new typed DriftDetectedEvent dataclass. |
agent_debugger_sdk/core/events/base.py |
Adds EventType.DRIFT. |
agent_debugger_sdk/core/events/__init__.py |
Exports and registers DriftDetectedEvent in the event type registry. |
agent_debugger_sdk/core/context/trace_context.py |
Adds record_decision override to emit drift events during replay and initializes drift decision index. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| event_id = await super().record_decision( | ||
| reasoning=reasoning, | ||
| confidence=confidence, | ||
| chosen_action=chosen_action, | ||
| evidence=evidence, | ||
| **kwargs, | ||
| ) | ||
| drift_detector = getattr(self, "_drift_detector", None) | ||
| if drift_detector is not None: | ||
| index = getattr(self, "_drift_decision_index", 0) | ||
| self._drift_decision_index = index + 1 | ||
| new_event_dict = { | ||
| "event_type": "decision", | ||
| "data": {"chosen_action": chosen_action, "confidence": confidence}, | ||
| } | ||
| drift = drift_detector.compare(new_event_dict, index) | ||
| if drift is not None: |
There was a problem hiding this comment.
Fixed in 6a095be. Two changes: (1) the drift detector baseline is now seeded with only decision events (filtered from post_events), so _drift_decision_index aligns correctly with original_events positions; (2) confidence is clamped to [0.0, 1.0] before building new_event_dict to match what RecordingMixin.record_decision() persists, eliminating false confidence drift for out-of-range inputs.
| # Drift event should have been emitted | ||
| drift_events = [e for e in emitted_events if getattr(e, "event_type", None) == "drift"] | ||
| # Drift event should have been emitted into the context's event store | ||
| drift_events = [e for e in ctx._events if getattr(e, "event_type", None) == "drift"] |
There was a problem hiding this comment.
Fixed in 6a095be. Switched to await ctx.get_events() instead of accessing ctx._events directly, which returns a safe copy under the async lock.
…mping - Filter drift detector baseline to decision events only so _drift_decision_index aligns with the correct original event at each position (was indexing into the full post-checkpoint stream including tool calls) - Clamp confidence to [0.0, 1.0] before drift comparison to match what RecordingMixin.record_decision() persists, preventing false confidence drift for out-of-range inputs - Use ctx.get_events() in test instead of private ctx._events to avoid bypassing the async lock Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
evidencenow optional inRecordingMixin.record_decision(moved afterchosen_action, defaultNone) — all existing callers pass it explicitly as a keyword argument, so no callsites breakEventType.DRIFTandDriftDetectedEventadded to the events module for representing replay divergenceTraceContext.record_decisionoverride compares each replayed decision against theDriftDetectorbaseline and emits aDriftDetectedEventwhenchosen_action(or confidence) differs from the originaltest_drift_detected_during_replay_emits_event): corrected mock URL key (events→traces), added a post-checkpoint timestamp to the mock event so the timestamp filter passes, and checkedctx._eventsrather than a disconnectedcapture_eventcallbackCloses #205
Test plan
python3 -m pytest tests/test_replay_depth_l3.py -q— previously skipped test now passespython3 -m pytest -q— 2754 passed, 27 skipped (was 2753 passed, 28 skipped), zero regressions