Add scenario=redteam to RAIServiceScorer eval_input by slister1001 · Pull Request #46701 · Azure/azure-sdk-for-python

slister1001 · 2026-05-04T19:21:26Z

Fix red team scoring for task_adherence via ContextDependent mapping

Problem

The RAIServiceScorer in _rai_scorer.py builds eval_input without scenario=redteam, so the server-side ContextDependent score mapping never selects the Direct path for the red team scorer. As a result, task_adherence uses Inverted mapping (flagged=true maps to 0.0) instead of Direct mapping (flagged=true maps to 1.0). This breaks is_attack_successful(), which expects higher scores for detected defects (score > threshold).
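The effect of the two mappings can be illustrated with a small, self-contained sketch. This is not the service implementation: the names `Mapping`, `resolve_mapping`, and `map_score`, the routing rule, and the 0.5 default threshold are assumptions chosen only to mirror the behavior described above.

```python
from enum import Enum


class Mapping(Enum):
    DIRECT = "direct"      # flagged=True -> 1.0
    INVERTED = "inverted"  # flagged=True -> 0.0


def resolve_mapping(eval_input: dict) -> Mapping:
    """Hypothetical stand-in for the server-side ContextDependent routing."""
    if eval_input.get("scenario") == "redteam":
        return Mapping.DIRECT
    return Mapping.INVERTED


def map_score(flagged: bool, mapping: Mapping) -> float:
    """Turn the service's boolean flag into a numeric score."""
    if mapping is Mapping.DIRECT:
        return 1.0 if flagged else 0.0
    return 0.0 if flagged else 1.0


def is_attack_successful(score: float, threshold: float = 0.5) -> bool:
    """Higher scores mean a detected defect; the threshold value is assumed."""
    return score > threshold


# Same flagged result, with and without the scenario key.
without_scenario = map_score(True, resolve_mapping({"metric": "task_adherence"}))
with_scenario = map_score(
    True, resolve_mapping({"metric": "task_adherence", "scenario": "redteam"})
)
```

Under these assumptions, a flagged response without the scenario key scores below the threshold, so the attack is reported as unsuccessful even though a defect was detected.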

Fix

Add scenario=redteam to the eval_input payload, aligning _rai_scorer.py with the existing pattern in _evaluation_processor.py:127. This ensures the server-side ContextDependent mapping correctly routes to Direct mapping for red team evaluations.
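A minimal sketch of the shape of the change, assuming a payload built as a plain dict. Only the `scenario` key and its value come from this PR; the function name `build_eval_input` and the `query` and `response` fields are hypothetical placeholders, not the actual contents of _rai_scorer.py.

```python
def build_eval_input(query: str, response: str) -> dict:
    """Hypothetical payload builder; only the "scenario" entry is taken from the PR."""
    return {
        "query": query,        # assumed field name
        "response": response,  # assumed field name
        "scenario": "redteam", # the one-line addition described in this PR
    }


payload = build_eval_input("user prompt", "model output")
```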

Changes

_rai_scorer.py: Add scenario=redteam to eval_input dict (1 line)
test_foundry.py: Add regression test test_score_async_sends_redteam_scenario asserting the scenario property is sent
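A regression test of this kind might look roughly like the sketch below. It does not reproduce the test in test_foundry.py: `FakeScorer` is a hypothetical stand-in for RAIServiceScorer, and the injected `evaluate` callable is an assumed seam for the RAI service call. Only the test name and the asserted `scenario` value come from the PR.

```python
import asyncio
from unittest.mock import AsyncMock


class FakeScorer:
    """Hypothetical stand-in for RAIServiceScorer; not the SDK class."""

    def __init__(self, evaluate):
        # Injected async callable that would normally call the RAI service.
        self._evaluate = evaluate

    async def score_async(self, response: str) -> dict:
        eval_input = {"response": response, "scenario": "redteam"}
        return await self._evaluate(eval_input)


def test_score_async_sends_redteam_scenario():
    evaluate = AsyncMock(return_value={"flagged": True})
    scorer = FakeScorer(evaluate)

    asyncio.run(scorer.score_async("some model output"))

    # The service call should happen once, with the scenario in the payload.
    evaluate.assert_awaited_once()
    sent_payload = evaluate.await_args.args[0]
    assert sent_payload["scenario"] == "redteam"
```

Asserting on the outgoing payload rather than on the returned score keeps the test independent of the server-side mapping, which is where the original bug was hidden.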

Testing

Regression unit test passes
tox black passes

Copilot

Pull request overview

This PR fixes Foundry red team scoring requests by adding scenario="redteam" to the eval_input payload built in RAIServiceScorer._score_piece_async(), aligning it with the existing red team evaluation processor behavior so server-side context-dependent score mapping routes correctly.

Changes:

Add "scenario": "redteam" to the RAI service evaluation payload in the Foundry scorer.
Align Foundry red team scoring payload shape with the existing _evaluation_processor.py pattern to ensure correct server-side mapping behavior.

The RAIServiceScorer in _rai_scorer.py builds eval_input without scenario=redteam, so the server-side ContextDependent score mapping never triggers Direct for the red team scorer path. This causes task_adherence to use Inverted mapping (flagged=true -> 0.0) instead of Direct (flagged=true -> 1.0), breaking is_attack_successful() which expects higher scores for detected defects (score > threshold). Adding scenario=redteam aligns _rai_scorer.py with the existing pattern in _evaluation_processor.py:127 and ensures ContextDependent correctly routes to Direct mapping for red team evaluations.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds test_score_async_sends_redteam_scenario to verify the eval_input payload includes scenario=redteam, preventing future regressions where the server-side ContextDependent mapping would silently fall through to the wrong scoring path.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

slister1001 · 2026-05-05T20:43:38Z

@copilot resolve the merge conflicts in this pull request

Copilot AI review requested due to automatic review settings May 4, 2026 19:21

slister1001 requested a review from a team as a code owner May 4, 2026 19:21

github-actions bot added the "Evaluation" label (Issues related to the client library for Azure AI Evaluation) May 4, 2026

Copilot started reviewing on behalf of slister1001 May 4, 2026 19:23

Copilot AI reviewed May 4, 2026

Comment thread sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_foundry/_rai_scorer.py

Comment thread sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_foundry/_rai_scorer.py

BryceByDesign approved these changes May 4, 2026

slister1001 force-pushed the sydneylister/rai-scorer-redteam-scenario branch 2 times, most recently from a5e1db6 to a26f3aa Compare May 5, 2026 19:30

slister1001 and others added 4 commits May 5, 2026 15:40

Update test recordings for scenario=redteam change

3c3d3b2

Re-recorded E2E tests after adding scenario=redteam to RAIServiceScorer eval_input. All 16 tests pass live and in playback.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add changelog entry for task_adherence scoring fix

7121a6d

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

slister1001 force-pushed the sydneylister/rai-scorer-redteam-scenario branch from a26f3aa to 7121a6d Compare May 5, 2026 19:41

slister1001 marked this pull request as draft May 5, 2026 23:54

slister1001 marked this pull request as ready for review May 5, 2026 23:54

slister1001 wants to merge 4 commits into Azure:main from slister1001:sydneylister/rai-scorer-redteam-scenario

3 participants
