Add scenario=redteam to RAIServiceScorer eval_input#46701

Open
slister1001 wants to merge 4 commits into Azure:main from slister1001:sydneylister/rai-scorer-redteam-scenario
Conversation

@slister1001
Member

@slister1001 slister1001 commented May 4, 2026

Fix red team scoring for task_adherence via ContextDependent mapping

Problem

The RAIServiceScorer in _rai_scorer.py builds eval_input without scenario=redteam, so the server-side ContextDependent score mapping never triggers the Direct path for the red team scorer. As a result, task_adherence uses the Inverted mapping (flagged=true -> 0.0) instead of Direct (flagged=true -> 1.0), breaking is_attack_successful(), which expects higher scores for detected defects (score > threshold).
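To make the failure mode concrete, here is a minimal sketch of the two mappings described above. The function names, the 0.5 threshold, and the mapping logic are assumptions reconstructed from this description, not the actual SDK or server-side code:

```python
# Illustrative sketch only: names, threshold, and mapping logic are assumed
# from the PR description, not taken from the SDK or RAI service source.

def map_score(flagged: bool, mapping: str) -> float:
    """Sketch of the server-side ContextDependent score mapping."""
    if mapping == "Direct":
        return 1.0 if flagged else 0.0   # defect detected -> high score
    if mapping == "Inverted":
        return 0.0 if flagged else 1.0   # defect detected -> low score
    raise ValueError(f"unknown mapping: {mapping}")

def is_attack_successful(score: float, threshold: float = 0.5) -> bool:
    """Red team logic expects higher scores for detected defects."""
    return score > threshold

# Without scenario=redteam, task_adherence falls through to Inverted,
# so a flagged (successful) attack is reported as unsuccessful:
assert is_attack_successful(map_score(True, "Inverted")) is False  # broken
# With scenario=redteam the Direct path is used and the result is correct:
assert is_attack_successful(map_score(True, "Direct")) is True
```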

Fix

Add scenario=redteam to the eval_input payload, aligning _rai_scorer.py with the existing pattern in _evaluation_processor.py:127. This ensures the server-side ContextDependent mapping correctly routes to Direct mapping for red team evaluations.

Changes

  • _rai_scorer.py: Add scenario=redteam to eval_input dict (1 line)
  • test_foundry.py: Add regression test test_score_async_sends_redteam_scenario asserting the scenario property is sent
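A sketch of the one-line payload change; the surrounding eval_input keys here are illustrative placeholders and may not match the exact fields RAIServiceScorer builds:

```python
# Hypothetical sketch of the fix in _rai_scorer.py; only the "scenario" key
# is from this PR, the other fields are illustrative placeholders.
query = "user prompt"
response = "model output"

eval_input = {
    "metric_name": "task_adherence",
    "query": query,
    "response": response,
    "scenario": "redteam",  # the added line: routes ContextDependent to Direct
}
```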

Testing

  • Regression unit test passes
  • tox black passes
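The intent of the regression test can be sketched with a stand-in scorer and a mocked client; the real test_score_async_sends_redteam_scenario presumably mocks the RAI service client rather than using this FakeScorer:

```python
# Sketch of the regression test's intent using a stand-in scorer class;
# FakeScorer and its eval_input shape are hypothetical, not the SDK's.
import asyncio
from unittest.mock import AsyncMock

class FakeScorer:
    def __init__(self, client):
        self._client = client

    async def _score_piece_async(self, query, response):
        eval_input = {
            "query": query,
            "response": response,
            "scenario": "redteam",
        }
        return await self._client.evaluate(eval_input)

client = AsyncMock()
asyncio.run(FakeScorer(client)._score_piece_async("q", "r"))

# Assert the payload actually sent to the service carries the scenario flag.
sent = client.evaluate.call_args.args[0]
assert sent["scenario"] == "redteam"
```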

Copilot AI review requested due to automatic review settings May 4, 2026 19:21
@slister1001 slister1001 requested a review from a team as a code owner May 4, 2026 19:21
@github-actions github-actions Bot added the Evaluation Issues related to the client library for Azure AI Evaluation label May 4, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes Foundry red team scoring requests by adding scenario="redteam" to the eval_input payload built in RAIServiceScorer._score_piece_async(), aligning it with the existing red team evaluation processor behavior so server-side context-dependent score mapping routes correctly.

Changes:

  • Add "scenario": "redteam" to the RAI service evaluation payload in the Foundry scorer.
  • Align Foundry red team scoring payload shape with the existing _evaluation_processor.py pattern to ensure correct server-side mapping behavior.

@slister1001 slister1001 force-pushed the sydneylister/rai-scorer-redteam-scenario branch 2 times, most recently from a5e1db6 to a26f3aa on May 5, 2026 19:30
slister1001 and others added 4 commits May 5, 2026 15:40
The RAIServiceScorer in _rai_scorer.py builds eval_input without
scenario=redteam, so the server-side ContextDependent score mapping
never triggers Direct for the red team scorer path. This causes
task_adherence to use Inverted mapping (flagged=true -> 0.0) instead
of Direct (flagged=true -> 1.0), breaking is_attack_successful()
which expects higher scores for detected defects (score > threshold).

Adding scenario=redteam aligns _rai_scorer.py with the existing
pattern in _evaluation_processor.py:127 and ensures ContextDependent
correctly routes to Direct mapping for red team evaluations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds test_score_async_sends_redteam_scenario to verify the eval_input
payload includes scenario=redteam, preventing future regressions where
the server-side ContextDependent mapping would silently fall through
to the wrong scoring path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Re-recorded E2E tests after adding scenario=redteam to RAIServiceScorer
eval_input. All 16 tests pass live and in playback.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@slister1001 slister1001 force-pushed the sydneylister/rai-scorer-redteam-scenario branch from a26f3aa to 7121a6d on May 5, 2026 19:41
@slister1001
Member Author

@copilot resolve the merge conflicts in this pull request

@slister1001 slister1001 marked this pull request as draft May 5, 2026 23:54
@slister1001 slister1001 marked this pull request as ready for review May 5, 2026 23:54
3 participants