@aprilk-ms
Member

Description

Please add an informative description that covers the changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.

All SDK Contribution checklist:

  • The pull request does not introduce breaking changes.
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

Contributor

Copilot AI left a comment

Pull request overview

This pull request adds comprehensive test coverage for evaluation samples (set 2), expanding from 9 to 25 tested samples. The PR introduces new test methods for agentic evaluator samples and improves the sample execution framework to handle samples with main() functions and ensure deterministic test playback.

Changes:

  • Added 15 agentic evaluator sample tests and included the previously excluded sample_redteam_evaluations.py sample
  • Enhanced sample executor to conditionally call main() functions in sync samples and patch time functions for deterministic playback
  • Added timestamp sanitization for evaluation names to ensure consistent test recordings
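
The second change above can be sketched roughly as follows. This is a minimal illustration, not the code in sample_executor.py: the function name `run_sync_sample`, the module name `sample_under_test`, and the value given to `PLAYBACK_TIMESTAMP` are all assumptions (the PR names the constant but does not show its value). It assumes the executor runs a sample under a module name other than `__main__`, so an `if __name__ == "__main__":` guard never fires and `main()` has to be called explicitly when it exists.

```python
import types
from unittest import mock

# Hypothetical value: the PR names a PLAYBACK_TIMESTAMP constant but does not show it.
PLAYBACK_TIMESTAMP = 1700000000.0


def run_sync_sample(source: str) -> dict:
    """Execute sample source with time.time() frozen, then call main() if defined."""
    # The module name is not "__main__", so a __main__ guard in the sample is skipped.
    module = types.ModuleType("sample_under_test")
    with mock.patch("time.time", return_value=PLAYBACK_TIMESTAMP):
        exec(compile(source, "<sample>", "exec"), module.__dict__)
        main = module.__dict__.get("main")
        if callable(main):
            main()  # only samples that define main() need this extra call
    return module.__dict__


# A sample that does its work inside main().
SAMPLE_WITH_MAIN = """
import time

names = []

def main():
    names.append(f"eval-{int(time.time())}")

if __name__ == "__main__":
    main()
"""

# A sample that does its work at module level and has no main().
SAMPLE_WITHOUT_MAIN = """
import time

names = [f"eval-{int(time.time())}"]
"""
```

With the clock frozen, both sample shapes produce the same evaluation name on every run, which is what lets a recorded request match during playback.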

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test_samples_evaluations.py | Added two new test methods to cover 15 agentic evaluator samples plus the generic agentic evaluator sample; updated documentation to reflect 25 total samples; included sample_redteam_evaluations.py in tests |
| sample_executor.py | Added time function patching for deterministic playback; added conditional main() execution for sync samples; added PLAYBACK_TIMESTAMP constant for consistent timestamp mocking |
| conftest.py | Added regex sanitizer for Unix timestamps in evaluation names to prevent recording mismatches |
| assets.json | Updated tag reference to include new test recordings |
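
To illustrate the kind of rewrite the conftest.py sanitizer performs, here is a small standalone sketch using only the standard `re` module. It is not the sanitizer registered in the PR: the pattern, the placeholder, and the helper name `sanitize_evaluation_name` are assumptions, chosen on the guess that evaluation names embed a 10-digit Unix timestamp in seconds.

```python
import re

# Assumption: the timestamp is a standalone run of exactly 10 digits (seconds since epoch).
# The lookarounds keep longer digit runs, such as millisecond timestamps, untouched.
UNIX_TIMESTAMP = re.compile(r"(?<!\d)\d{10}(?!\d)")

# Hypothetical placeholder written into recordings in place of the real timestamp.
PLACEHOLDER = "0000000000"


def sanitize_evaluation_name(text: str) -> str:
    """Replace every 10-digit Unix timestamp in text with a fixed placeholder."""
    return UNIX_TIMESTAMP.sub(PLACEHOLDER, text)
```

Applying the same substitution to both the recorded and the replayed request means two runs that differ only in the embedded timestamp compare as equal.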

@aprilk-ms merged commit 84905e8 into main on Jan 23, 2026
20 checks passed
@aprilk-ms deleted the aprilk/more-evals-recording-tests branch on January 23, 2026 18:54
3 participants