Skip to content

fix: implement drift event emission and make evidence optional in record_decision#206

Open
acailic wants to merge 2 commits into
mainfrom
fix/evidence-optional-record-decision
Open

fix: implement drift event emission and make evidence optional in record_decision#206
acailic wants to merge 2 commits into
mainfrom
fix/evidence-optional-record-decision

Conversation

@acailic
Copy link
Copy Markdown
Owner

@acailic acailic commented Jun 5, 2026

Summary

  • evidence now optional in RecordingMixin.record_decision (moved after chosen_action, default None) — all existing callers pass it explicitly as a keyword argument, so no callsites break
  • New EventType.DRIFT and DriftDetectedEvent added to the events module for representing replay divergence
  • TraceContext.record_decision override compares each replayed decision against the DriftDetector baseline and emits a DriftDetectedEvent when chosen_action (or confidence) differs from the original
  • Skipped test fixed (test_drift_detected_during_replay_emits_event): corrected mock URL key (eventstraces), added a post-checkpoint timestamp to the mock event so the timestamp filter passes, and checked ctx._events rather than a disconnected capture_event callback

Closes #205

Test plan

  • python3 -m pytest tests/test_replay_depth_l3.py -q — previously skipped test now passes
  • python3 -m pytest -q — 2754 passed, 27 skipped (was 2753 passed, 28 skipped), zero regressions

…ord_decision

- Add EventType.DRIFT and DriftDetectedEvent to the events module
- Override record_decision in TraceContext to call DriftDetector.compare()
  and emit a DriftDetectedEvent when replay diverges from the original execution
- Make the `evidence` parameter optional (default None) in RecordingMixin.record_decision,
  since all existing callers already pass it explicitly as a keyword arg
- Fix skipped test: correct mock URL key (events→traces), add post-checkpoint
  timestamp to mock event so the timestamp filter passes, and check ctx._events
  instead of a disconnected capture_event callback

Closes #205

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 5, 2026 00:58
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds first-class drift reporting during checkpoint restore/replay by introducing a new DRIFT event type and emitting a typed drift event when a replayed decision diverges from the recorded baseline. It also unblocks drift-related replay tests by making evidence optional in record_decision and updating the affected test harness.

Changes:

  • Make RecordingMixin.record_decision() accept optional evidence (defaulting to [] internally) to avoid breaking replay callsites that omit it.
  • Introduce EventType.DRIFT and DriftDetectedEvent, register the event type, and emit drift events from TraceContext.record_decision() during replay.
  • Fix the previously-skipped drift replay test by aligning mocked restore fetch shapes and asserting drift presence in the context’s stored events.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/test_replay_depth_l3.py Updates mocked restore trace fetch payload and drift assertion for replay drift emission test.
agent_debugger_sdk/core/recorders.py Makes evidence optional in record_decision and normalizes None to [].
agent_debugger_sdk/core/events/drift.py Adds new typed DriftDetectedEvent dataclass.
agent_debugger_sdk/core/events/base.py Adds EventType.DRIFT.
agent_debugger_sdk/core/events/__init__.py Exports and registers DriftDetectedEvent in the event type registry.
agent_debugger_sdk/core/context/trace_context.py Adds record_decision override to emit drift events during replay and initializes drift decision index.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +584 to +600
event_id = await super().record_decision(
reasoning=reasoning,
confidence=confidence,
chosen_action=chosen_action,
evidence=evidence,
**kwargs,
)
drift_detector = getattr(self, "_drift_detector", None)
if drift_detector is not None:
index = getattr(self, "_drift_decision_index", 0)
self._drift_decision_index = index + 1
new_event_dict = {
"event_type": "decision",
"data": {"chosen_action": chosen_action, "confidence": confidence},
}
drift = drift_detector.compare(new_event_dict, index)
if drift is not None:
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6a095be. Two changes: (1) the drift detector baseline is now seeded with only decision events (filtered from post_events), so _drift_decision_index aligns correctly with original_events positions; (2) confidence is clamped to [0.0, 1.0] before building new_event_dict to match what RecordingMixin.record_decision() persists, eliminating false confidence drift for out-of-range inputs.

Comment thread tests/test_replay_depth_l3.py Outdated
# Drift event should have been emitted
drift_events = [e for e in emitted_events if getattr(e, "event_type", None) == "drift"]
# Drift event should have been emitted into the context's event store
drift_events = [e for e in ctx._events if getattr(e, "event_type", None) == "drift"]
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6a095be. Switched to await ctx.get_events() instead of accessing ctx._events directly, which returns a safe copy under the async lock.

…mping

- Filter drift detector baseline to decision events only so _drift_decision_index
  aligns with the correct original event at each position (was indexing into the
  full post-checkpoint stream including tool calls)
- Clamp confidence to [0.0, 1.0] before drift comparison to match what
  RecordingMixin.record_decision() persists, preventing false confidence drift
  for out-of-range inputs
- Use ctx.get_events() in test instead of private ctx._events to avoid
  bypassing the async lock

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: make evidence optional in record_decision to unblock skipped drift test

2 participants