Added more evaluation sample tests (set 2) #44812

aprilk-ms · 2026-01-23T00:41:54Z

Description

Please add an informative description that covers the changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.

All SDK Contribution checklist:

The pull request does not introduce breaking changes.
CHANGELOG is updated for new features, bug fixes or other significant changes.
I have read the contribution guidelines.

General Guidelines and Best Practices

Title of the pull request is clear and informative.
There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

Pull request includes test coverage for the included changes.

Copilot

Pull request overview

This pull request adds comprehensive test coverage for evaluation samples (set 2), expanding from 9 to 25 tested samples. The PR introduces new test methods for agentic evaluator samples and improves the sample execution framework to handle samples with main() functions and ensure deterministic test playback.
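As an illustration of the `main()` handling described above, here is a minimal sketch and not the actual code in sample_executor.py. It assumes the executor runs a sample's source under a module name other than `__main__`, so a sample's entry-point guard does not fire and the executor must call `main()` itself when one is defined. The names `run_sync_sample` and `sample_under_test` are hypothetical.

```python
import types


def run_sync_sample(source: str, module_name: str = "sample_under_test") -> types.ModuleType:
    """Execute sample source as a module, then call main() if the sample defines one."""
    module = types.ModuleType(module_name)
    # The module's __name__ is module_name, not "__main__", so an
    # `if __name__ == "__main__":` guard inside the sample will not run.
    exec(compile(source, module_name, "exec"), module.__dict__)
    # Call main() only when the sample defines it; flat scripts have
    # already done their work during exec().
    main = getattr(module, "main", None)
    if callable(main):
        main()
    return module


# A sample that hides its work behind main() and an entry-point guard.
guarded_sample = '''
calls = []

def main():
    calls.append("ran")

if __name__ == "__main__":
    main()
'''

# A sample that does its work at module top level.
flat_sample = "calls = ['top-level']\n"

guarded_calls = run_sync_sample(guarded_sample).calls
flat_calls = run_sync_sample(flat_sample).calls
```

With this approach both sample styles execute exactly once: the guarded sample through the explicit `main()` call, the flat sample during `exec()`.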

Changes:

Added 15 agentic evaluator sample tests and included the previously excluded sample_redteam_evaluations.py sample
Enhanced sample executor to conditionally call main() functions in sync samples and patch time functions for deterministic playback
Added timestamp sanitization for evaluation names to ensure consistent test recordings
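The time patching mentioned in the list above could look roughly like the following sketch. It is an assumption-laden illustration using `unittest.mock`, not the PR's implementation: the value given to `PLAYBACK_TIMESTAMP`, the helper `run_with_frozen_time`, and the name format in `make_eval_name` are all hypothetical.

```python
import time
from unittest import mock

# Assumed value for illustration; the PR defines its own PLAYBACK_TIMESTAMP.
PLAYBACK_TIMESTAMP = 1700000000


def run_with_frozen_time(func, is_playback: bool):
    """Run func, pinning time.time() to a fixed value when replaying recordings."""
    if not is_playback:
        # Live and record runs keep the real clock.
        return func()
    with mock.patch("time.time", return_value=float(PLAYBACK_TIMESTAMP)):
        return func()


def make_eval_name() -> str:
    # Hypothetical sample code that embeds the current Unix time in a name.
    return f"agentic-eval-{int(time.time())}"


first = run_with_frozen_time(make_eval_name, is_playback=True)
second = run_with_frozen_time(make_eval_name, is_playback=True)
```

Patching `time.time` on the module only affects code that looks the function up as `time.time()` at call time. A sample that does `from time import time` before the patch is applied would keep the real clock and would need separate handling.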

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test_samples_evaluations.py | Added two new test methods to cover 15 agentic evaluator samples plus the generic agentic evaluator sample; updated documentation to reflect 25 total samples; included `sample_redteam_evaluations.py` in tests |
| sample_executor.py | Added time function patching for deterministic playback; added conditional `main()` execution for sync samples; added `PLAYBACK_TIMESTAMP` constant for consistent timestamp mocking |
| conftest.py | Added regex sanitizer for Unix timestamps in evaluation names to prevent recording mismatches |
| assets.json | Updated tag reference to include new test recordings |
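To show why sanitizing timestamps prevents recording mismatches, here is a small sketch using only the standard `re` module. It does not reproduce the patterns registered in conftest.py: the name shape (a prefix followed by a hyphen and 10-digit Unix seconds), the placeholder value, and the function name `sanitize_eval_name` are assumptions for illustration.

```python
import re

# Assumed shape: a hyphen followed by exactly 10 digits (Unix seconds).
# The lookbehind anchors on the hyphen; the lookahead rejects longer digit
# runs such as 13-digit millisecond values.
TIMESTAMP_SUFFIX = re.compile(r"(?<=-)\d{10}(?!\d)")
PLACEHOLDER = "0000000000"


def sanitize_eval_name(text: str) -> str:
    """Replace timestamp suffixes so two runs produce identical text."""
    return TIMESTAMP_SUFFIX.sub(PLACEHOLDER, text)


recorded = sanitize_eval_name('{"name": "agentic-eval-1769128914"}')
replayed = sanitize_eval_name('{"name": "agentic-eval-1769215314"}')
untouched = sanitize_eval_name("run-1769128914123")
```

A narrowly scoped pattern like this keeps the sanitizer from rewriting unrelated digit sequences in request or response bodies, which is the usual reason to prefer specific patterns over a broad "any long number" match.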

Added more eval sample tests

05786e6

aprilk-ms requested review from dargilco, glharper, howieleung, kingernupur, nick863, trangevi and trrwilson as code owners January 23, 2026 00:41

Copilot AI review requested due to automatic review settings January 23, 2026 00:41

Copilot started reviewing on behalf of aprilk-ms January 23, 2026 00:42

github-actions bot added the AI Projects label Jan 23, 2026

Copilot AI reviewed Jan 23, 2026

sdk/ai/azure-ai-projects/tests/samples/sample_executor.py (outdated, resolved)

sdk/ai/azure-ai-projects/tests/conftest.py (outdated, resolved)

Address PR review comments: async executor consistency, robust timestamp regex

7afdcbb

howieleung reviewed Jan 23, 2026

sdk/ai/azure-ai-projects/tests/conftest.py (resolved)

aprilk-ms and others added 2 commits January 22, 2026 21:51

Use specific regex patterns for timestamp sanitization

39ba3b3

Merge branch 'main' into aprilk/more-evals-recording-tests

e5f1e7a

howieleung approved these changes Jan 23, 2026

aprilk-ms merged commit 84905e8 into main Jan 23, 2026
20 checks passed

aprilk-ms deleted the aprilk/more-evals-recording-tests branch January 23, 2026 18:54

3 participants
