FEAT: Adversarial Benchmark Scenario Refactor by ValbuenaVC · Pull Request #1765 · microsoft/PyRIT

ValbuenaVC · 2026-05-20T20:24:30Z

Depends on #1758 — cross-run caching reads AttackResultEntry.attribution_data.parent_eval_hash and AtomicAttack.technique_eval_hash, both introduced there.

Substantial refactor of the adversarial benchmark scenario (AdversarialBenchmark). The goal is to make AdversarialBenchmark more consistent with other scenarios, more performant, and better integrated with research workflows. Key changes below.

TL;DR

BenchmarkInitializer handles adversarial model registration and technique fanout instead of AdversarialBenchmark.
New adversarial models prefixed with ADVERSARIAL* added to TargetRegistry and the .env_example file.
New tests for consistency.
Shrunk AdversarialBenchmark so it looks more like RapidResponse.
Added benchmark caching, which in a later PR should be expanded to the entire Scenario class since it's not truly benchmark-specific.

In Detail

New BenchmarkInitializer bootstraps a benchmarking trial and absorbs the model-fanout logic that used to live inside AdversarialBenchmark. It queries TargetRegistry for adversarial models via the new TagQuery.all("adversarial") selector, then fans the adversarial-capable scenario techniques out across each discovered model and registers the resulting variants into AttackTechniqueRegistry tagged ["benchmark_fanout", f"model:{target_name}"]. The two-stage flow is: TargetInitializer auto-registers the env-driven adversarial endpoints (including the new variants below); BenchmarkInitializer reads them via TargetRegistry.get_by_tag_query(...) and registers the per-model fanned specs. .pyrit_conf's scenario→initializer map gains a benchmark entry alongside the existing defaults. Placed at pyrit/setup/initializers/benchmark.py (top level, alongside AIRTInitializer/SimpleInitializer) — workflow profile, not a registry-populating building block.
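
As a rough illustration of the fan-out described above, the sketch below crosses technique names with target names and produces the name and tags this PR describes. `FannedSpec` and `fan_out` are hypothetical stand-ins, not PyRIT types, and the technique names are placeholders; only the `f"{source}__{target_name}"` naming and the tag list are taken from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FannedSpec:
    """Hypothetical stand-in for a fanned technique spec (not a PyRIT type)."""

    name: str
    tags: tuple


def fan_out(technique_names, target_names):
    """Cross every adversarial-capable technique with every discovered target."""
    specs = []
    for source in technique_names:
        for target_name in target_names:
            specs.append(
                FannedSpec(
                    # Naming and tags follow the convention stated in the PR.
                    name=f"{source}__{target_name}",
                    tags=("benchmark_fanout", f"model:{target_name}"),
                )
            )
    return specs


# Placeholder technique names; target names mirror the new env-driven variants.
specs = fan_out(
    ["crescendo", "red_teaming"],
    ["adversarial_chat_singleturn", "adversarial_chat_reasoning"],
)
```

Two techniques across two targets yield four variants, each carrying the target name in both its own name and its `model:` tag.
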
AdversarialBenchmark is collapsed to roughly the shape of RapidResponse. Drops from 297 to 264 lines. With adversarial targets now discoverable from the registry, the scenario no longer takes an adversarial_models constructor parameter, no longer overrides _get_atomic_attacks_async to construct factories or inject targets, no longer infers model names from atomic-attack strings (the entire _infer_labels dedupe/collision-suffix loop is gone), and inherits the base implementation's atomic-attack construction loop. Display grouping now extracts the target label from the fanned f"src__target" name so per-model ASR rolls up naturally. The only remaining override is a thin caching wrapper (see next bullet).
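
The display-grouping rule can be sketched in a few lines. This is an assumption-laden illustration, not the actual `_build_display_group` override: the function name `display_group` is hypothetical, and splitting on the first `__` is a guess about how names containing more than one separator are handled.

```python
def display_group(technique_name: str) -> str:
    """Return the target label from a fanned 'src__target' name.

    Falls back to the full name when no '__' separator is present.
    Assumption: the split happens on the first '__'.
    """
    _source, separator, target = technique_name.partition("__")
    return target if separator else technique_name


fanned_label = display_group("crescendo__adversarial_chat_reasoning")
plain_label = display_group("prompt_sending")
```
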
Cross-run caching for scenario resumption. New skip_cached: bool = False constructor parameter. The cache key is (atomic_attack_name, technique_eval_hash), where technique_eval_hash comes from AtomicAttackEvaluationIdentifier, per the contract introduced in FEAT: Better Scenario Tracking #1758. Only SUCCESS and FAILURE outcomes are cached; ERROR and UNDETERMINED always retry. AdversarialBenchmark.VERSION is bumped from 1 → 2 because the new atomic_attack_name format ({technique}__{model}_{dataset}) invalidates prior runs for cache-matching purposes. Old results remain queryable via memory.get_scenario_results(scenario_version=1) but won't suppress fresh runs.
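
A minimal sketch of the cache-key filter follows, assuming candidates and prior results are available as plain tuples. `Outcome` and `filter_uncached` are illustrative names; the real code works with PyRIT's persisted results rather than in-memory lists.

```python
from enum import Enum


class Outcome(Enum):
    """Stand-in for the attack outcome values named in this PR."""

    SUCCESS = "success"
    FAILURE = "failure"
    ERROR = "error"
    UNDETERMINED = "undetermined"


# Only conclusive outcomes suppress a re-run; ERROR and UNDETERMINED retry.
CACHEABLE_OUTCOMES = frozenset({Outcome.SUCCESS, Outcome.FAILURE})


def filter_uncached(candidates, prior_results):
    """Drop candidates whose (name, eval_hash) key already has a conclusive result.

    candidates: list of (atomic_attack_name, technique_eval_hash) tuples.
    prior_results: list of (atomic_attack_name, technique_eval_hash, Outcome).
    """
    cached = {
        (name, eval_hash)
        for name, eval_hash, outcome in prior_results
        if outcome in CACHEABLE_OUTCOMES
    }
    return [key for key in candidates if key not in cached]


prior = [
    ("crescendo__model_a_harms", "hash1", Outcome.SUCCESS),
    ("crescendo__model_b_harms", "hash1", Outcome.ERROR),
]
candidates = [
    ("crescendo__model_a_harms", "hash1"),  # cached, so skipped
    ("crescendo__model_a_harms", "hash2"),  # same name, different config: runs
    ("crescendo__model_b_harms", "hash1"),  # prior ERROR: retried
]
remaining = filter_uncached(candidates, prior)
```

Keying on the hash as well as the name is what keeps two differently configured techniques with the same name from suppressing each other.
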
Scorer-type flexibility (stage 1). The objective_scorer parameter annotation is widened to Scorer with an early isinstance(TrueFalseScorer) guard that raises a clear TypeError pointing at the new-scoring follow-up. Full non-TF scorer support (widening the underlying AttackScoringConfig and atomic-attack types) is tracked separately.
TargetInitializer propagates config.tags to registry entries. This is a one-line fix for a separate, pre-existing bug: today config.tags are silently dropped during _register_target, which means the entire TargetInitializerTags enum (ADVERSARIAL, SCORER, etc.) is inert. After this fix, all env-driven targets carry their declared tags, which is what makes TargetRegistry.get_by_tag_query(...) viable end-to-end. This change affects every env-driven target, not just adversarial ones, so it may be breaking. As a paired hardening, _register_target now also skips with a warning when a TargetConfig declares model_var but the env var is unset. Without this guard, OpenAIChatTarget silently falls back to the global OPENAI_CHAT_MODEL and sends requests to the wrong model.
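
The two behaviors in this item (tags carried into the registry, and skip-with-warning when the per-target model variable is unset) might look roughly like the sketch below. `TargetConfigSketch`, `register_target`, the dict-based registry, and the env var name are all assumptions made for illustration; the real `_register_target` constructs actual target objects.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class TargetConfigSketch:
    """Hypothetical, reduced stand-in for TargetConfig."""

    registry_name: str
    model_var: Optional[str] = None
    tags: tuple = ()


def register_target(config, env, registry) -> bool:
    """Register a target, or skip with a warning if its model env var is unset."""
    if config.model_var and not env.get(config.model_var):
        # Skipping avoids a silent fallback to a global default model.
        logger.warning(
            "Skipping %s: %s is not set", config.registry_name, config.model_var
        )
        return False
    registry[config.registry_name] = {
        "model": env.get(config.model_var) if config.model_var else None,
        "tags": list(config.tags),  # tags are propagated, not dropped
    }
    return True


config = TargetConfigSketch(
    registry_name="adversarial_chat_reasoning",
    model_var="ADVERSARIAL_CHAT_REASONING_MODEL",  # illustrative env var name
    tags=("default", "adversarial"),
)
registry = {}
skipped = register_target(config, env={}, registry=registry)
registered = register_target(
    config, env={"ADVERSARIAL_CHAT_REASONING_MODEL": "some-model"}, registry=registry
)
```
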
New adversarial chat variants have been added in .env_example — ADVERSARIAL_CHAT_SINGLETURN_*, ADVERSARIAL_CHAT_MULTITURN_*, ADVERSARIAL_CHAT_REASONING_*. They are auto-registered by TargetInitializer with tags=[DEFAULT, ADVERSARIAL] so they appear in the registry out of the box and are picked up by BenchmarkInitializer's default tag query. Missing env vars skip gracefully, same as today's adversarial_chat. Note that the tag pattern is "ADVERSARIAL".
Supporting changes.
- New TargetRegistry.get_by_tag_query(*, query: TagQuery) accessor on the registry (consumed by BenchmarkInitializer; documented as keys-only matching, ignoring tag values).
- The benchmark scenario entry in tests/end_to_end/test_scenarios.py runs the new initializer via a per-scenario initializer override map (SCENARIO_INITIALIZERS). Key is the dotted ScenarioRegistry name benchmark.adversarial, not adversarial_benchmark.
- doc/scanner/benchmark.{py,ipynb} is rewritten around the env-driven quickstart, registry-based multi-model fan-out, cross-run caching example, target_names narrowing, .pyrit_conf bootstrap snippet, and scorer-flex forward-looking section. Both .py and .ipynb committed pre-execution; the published doc page at https://microsoft.github.io/PyRIT/scanner/benchmark/ will update automatically when this merges into main. Maintainers running pct_to_ipynb.py before release will re-execute against real endpoints.
- The tests/unit/scenario/ tree was reorganized to mirror source layout (benchmark/, airt/, foundry/, garak/, core/) in two prep commits before the main refactor.
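
To make "keys-only matching" concrete, here is a toy model of a tag query. `TagQuerySketch` is not PyRIT's `TagQuery`; it only borrows the `all(...)` constructor name and the `&`/`|` composition mentioned in the test table, and it assumes registry tags are stored as a key-to-value mapping.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class TagQuerySketch:
    """Toy tag query; matches on tag keys and ignores tag values."""

    predicate: Callable[[frozenset], bool]

    @classmethod
    def all(cls, *keys: str) -> "TagQuerySketch":
        wanted = frozenset(keys)
        return cls(lambda present: wanted <= present)

    def __and__(self, other: "TagQuerySketch") -> "TagQuerySketch":
        return TagQuerySketch(lambda p: self.predicate(p) and other.predicate(p))

    def __or__(self, other: "TagQuerySketch") -> "TagQuerySketch":
        return TagQuerySketch(lambda p: self.predicate(p) or other.predicate(p))

    def matches(self, tags: Mapping[str, str]) -> bool:
        # Only the keys are inspected; values never influence the match.
        return self.predicate(frozenset(tags.keys()))


def get_by_tag_query(entries, *, query):
    """Return the names of entries whose tag keys satisfy the query."""
    return [name for name, tags in entries.items() if query.matches(tags)]


entries = {
    "adversarial_chat": {"adversarial": "", "default": ""},
    "refusal_scorer": {"scorer": "adversarial"},  # value must not match
}
adversarial_only = get_by_tag_query(entries, query=TagQuerySketch.all("adversarial"))
either = get_by_tag_query(
    entries, query=TagQuerySketch.all("adversarial") | TagQuerySketch.all("scorer")
)
```
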

Tests

~40 net new unit tests across 4 files. No e2e run included (e2e tests require live API credentials and are not in CI).

| File | Tests | Coverage |
| --- | --- | --- |
| `tests/unit/registry/test_target_registry.py` | 13 → 18 | `get_by_tag_query` (4: matching, empty, composite `&`/`\|`, keys-not-values); duplicate-name characterization (1) |
| `tests/unit/setup/test_targets_initializer.py` | 19 → 40 | `config.tags` propagation (3); unique `registry_name` guardrail (1); double-init idempotency (1); ADVERSARIAL_CHAT variants register-with-tags / skip-when-missing / skip-when-MODEL-missing (9 parameterized); end-to-end tag-query discovery (1); existing tests retained |
| `tests/unit/setup/test_benchmark_initializer.py` | 0 → 12 | New file. Basic metadata (4); fan-out behavior (4); `target_names` narrowing (3); empty-discovery error message (1) |
| `tests/unit/scenario/benchmark/test_adversarial.py` | rewrite | Metadata (5); strategy enum built from registry (8); collapsed init (3); display grouping (3); `skip_cached` filter behavior (9 covering filter, ERROR/UNDETERMINED retry, eval-hash disambiguation, scenario-state filter, query-arg shape, missing-attribution defense, memory-error defense); scorer-flex (3) |

Wider regression: 1652/1652 pass across tests/unit/{scenario,setup,registry,backend}.

Failure modes flagged for follow-up

Audited during this PR but deliberately not fixed here — each is either pre-existing in non-PR scope, a separable enhancement, or doc work that should land in its own PR.

| Severity | Title | Surfaced in | Notes |
| --- | --- | --- | --- |
| High | Missing `_MODEL` env var → silent fallback to global `OPENAI_CHAT_MODEL` | Commit 3 | Fixed for env-driven targets in Commit 3 via skip-with-warning; affects every `OpenAIChatTarget` config (12+ entries). A broader pattern review may be warranted. |
| Medium | `BaseInstanceRegistry.register` silently overwrites on duplicate name | Commit 3 | Existing footgun; new env-driven adversarial variants increase exposure surface. Proposed fix: warn-on-collision at base, or behind `strict=True`. |
| Medium | `BenchmarkInitializer` fanout holds stale `TargetRegistry` references after later mutations | Commit 4 | Discovery is a snapshot at init time. Re-running `TargetInitializer` with a narrower filter leaves orphan factories holding dead target instances; surfaces at API-call time, not init. P1f territory. |
| Low | `AttackTechniqueRegistry.register_from_specs` is first-write-wins with no log entry | Commit 4 | Disjoint name spaces between `ScenarioTechniqueInitializer` and `BenchmarkInitializer` mean this is inert today; future extensions producing colliding names would be silently no-op'd. Proposed: debug-level log on skip. |
| Low | `registry_name` has no format validation | Commit 3 | Env-driven entries are clean snake_case; risk grows when per-user `TargetConfig` support lands in P1. Proposed: snake_case validator or CONTRIBUTING note. |
| Low | Initializer placement convention (top-level vs `components/`) is implicit | Commit 4 | This PR picks top-level for `BenchmarkInitializer` matching `AIRTInitializer` / `SimpleInitializer` (workflow-profile pattern); `components/` reserved for auto-bundled building blocks. Worth a CONTRIBUTING note. |
| Low (enhancement) | `skip_cached` override probably belongs on base `Scenario`, duck-typed per subclass | Commit 6 | The `_get_atomic_attacks_async` + `_collect_cached_completion_pairs` shape is scenario-agnostic. Sketch: classmethod hook like `cls.cache_scope_name()` on base `Scenario` so other scenarios (`RapidResponse`, `Scam`, …) can opt in without copy-pasting the wrapper. |

Follow-up items (separate PRs)

Analytics integration. The goal is to make benchmarking results more easily queryable, and pyrit.analytics seems like the natural injection point. Proposed new surface: pyrit.analytics.load_scenario_results, analyze_scenario_results, ScenarioResult.get_asr_grouped, ScenarioResult.technique_map, AtomicAttack.technique_name, and the objective_achieved_rate int → float fix + pretty.py denominator/color corrections. Tracked separately so this PR stays focused on the architecture and caching path.
New scoring logic. The adversarial benchmarking scenario currently just measures ASR. This is a limited metric and makes implicit assumptions about the SelfAskRefusalScorer that the user should be able to override depending on what aspect of the adversarial models they wish to benchmark. TBD as it may involve adding new scorers and scoring composition logic that would be out of scope. Commit 7 widens the objective_scorer annotation ahead of this work so the follow-up doesn't require a signature change.
Caching integration test (tests/integration/scenario/test_adversarial_benchmark_caching.py) — full round-trip with MockPromptTarget + sqlite memory fixture; covers the persistence path that unit tests can't. Deferred per plan F6.3.
Scanner notebook integration test — smoke-run the rewritten doc/scanner/benchmark.{py,ipynb} with mocked targets. Deferred per plan F6.2.

Downstream notes

Published doc page https://microsoft.github.io/PyRIT/scanner/benchmark/ will update automatically when this PR merges into main (Jupyter-Book builds from doc/scanner/benchmark.py).
doc/scanner/benchmark.{py,ipynb} is committed pre-execution per the contributing guide — maintainers regenerate with pct_to_ipynb.py against real endpoints before release.
AdversarialBenchmark.VERSION bump (1 → 2) is documented in the PR description rather than a CHANGELOG file (none exists in the repo). Existing callers of memory.get_scenario_results() filtering on scenario_version will need to update.

Adds three TargetConfig entries (singleturn, multiturn, reasoning), each tagged [DEFAULT, ADVERSARIAL], for the env-driven variants already declared in .env_example.

Tightens _register_target to skip with a warning when a TargetConfig declares model_var but the env var is unset; without this guard the target silently falls back to the global OPENAI_CHAT_MODEL default and sends requests to the wrong model.

New tests covering naming-related failure modes flagged for the eventual PR review:
- test_register_instance_with_duplicate_name_silently_overwrites pins the current "second write wins" behavior so future hardening (warn / raise / idempotent skip) is intentional.
- test_target_configs_have_unique_registry_names guards against typos in ENV_TARGET_CONFIGS that would otherwise silently drop a target.
- test_double_initialize_async_is_idempotent regression-guards the re-init path that depends on the silent-overwrite semantics above.
- test_variant_skips_when_model_env_var_missing parameterizes the missing-_MODEL skip+warning for all three new variants.

Failure modes surfaced during this change but not addressed here (tracked for the PR description batch):
- Duplicate registry_name silently overwrites in BaseInstanceRegistry.
- registry_name has no format validation; risk grows with per-user TargetConfig support in P1.
- No-adversarial-models-found error message UX is owned by the upcoming BenchmarkInitializer commit and needs a clear, actionable message.

Registers one AttackTechniqueSpec variant per (adversarial-capable technique, adversarial-tagged target) pair into AttackTechniqueRegistry with the live target bound onto adversarial_chat. Variants are named f"{source}__{target_name}" and tagged ["benchmark_fanout", f"model:{target_name}"] so the benchmark scenario can discover them via tag query in a later commit.

Adversarial-capability is determined by reusing _spec_needs_adversarial from scenario_techniques (multi-turn attacks + crescendo-style simulated conversations). Single-turn techniques without an adversarial chat target (prompt_sending, role_play, many_shot, context_compliance) are not fanned: the benchmark holds the objective target constant and varies the adversarial chat helper across runs.

Placed at pyrit/setup/initializers/benchmark.py (top level) alongside AIRTInitializer and SimpleInitializer, not under components/. The components/ initializers (TargetInitializer, ScorerInitializer, ScenarioTechniqueInitializer) are auto-bundled building blocks that populate their registries during every PyRIT setup. BenchmarkInitializer is the opposite shape: a user-opted workflow profile named after the use case, listed in .pyrit_conf when the user wants a benchmarking trial. The placement convention is itself underspecified in the codebase and is tracked for follow-up.

Parameter contract (target_names: list[str] | None):

The optional target_names parameter is declared via PyRITInitializer.supported_parameters, which is the single source of truth shared across three consumer sites:

1. .pyrit_conf YAML:

       initializers:
         - name: benchmark
           args:
             target_names:
               - adversarial_chat_singleturn
               - adversarial_chat_reasoning

   ConfigurationLoader._resolve_initializers calls instance.set_params_from_args(args=config.args) and then _validate_params against supported_parameters, so unknown keys fail fast at config-load time. Omitting the args block uses the default (fan over every adversarial-tagged target).
2. CLI (--list-initializers via frontend_core._print_initializer_meta): reads metadata.supported_parameters and prints name + description + default for each declared parameter, so users discover what they can put in .pyrit_conf without reading source.
3. GUI backend (InitializerService): wraps each declared parameter as an InitializerParameterSummary({name, description, default}) on the RegisteredInitializer Pydantic model. The GUI renders form fields from this metadata.

All three paths terminate at the same self.params dict that initialize_async reads via self.params.get("target_names"), so adding, renaming, or retyping the parameter is a single-site change.

target_names narrows fan-out to a subset of adversarial targets by registry name; unknown names raise ValueError listing both the unknowns and the discovered set. Empty discovery raises ValueError naming the ADVERSARIAL_CHAT_* env vars and the TargetInitializer ordering dependency (closes one of the failure-mode follow-ups surfaced in the previous commit).

Failure modes audited during this change but not addressed here (tracked for the PR description batch):
- AttackTechniqueRegistry.register_from_specs is first-write-wins on name collision with no log entry. Disjoint name spaces between ScenarioTechniqueInitializer and BenchmarkInitializer mean this is inert today; future extensions that produce colliding names would be silently no-op'd.
- BenchmarkInitializer's TargetRegistry walk is a snapshot at init time; later mutations to TargetRegistry leave the fanned specs holding stale references. Failure surfaces at API-call time, not at registration.
- Top-level vs components/ initializer placement convention is implicit; this commit picks "top-level for workflow profiles", matching AIRT/Simple. Worth a CONTRIBUTING note when convention is formalized.
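
The target_names narrowing and its two error paths could be sketched as follows. `select_targets` is a hypothetical helper and the error wording is invented for illustration; the commit only states which facts the messages contain (the unknown names, the discovered set, the ADVERSARIAL_CHAT_* env vars, and the TargetInitializer ordering dependency).

```python
from typing import List, Optional


def select_targets(
    discovered: List[str], target_names: Optional[List[str]] = None
) -> List[str]:
    """Narrow discovered adversarial targets to an optional subset by name."""
    if not discovered:
        # Empty discovery: point at the env vars and the ordering dependency.
        raise ValueError(
            "No adversarial-tagged targets found. Set the ADVERSARIAL_CHAT_* "
            "env vars and run TargetInitializer before BenchmarkInitializer."
        )
    if target_names is None:
        return list(discovered)  # default: fan over every discovered target
    unknown = sorted(set(target_names) - set(discovered))
    if unknown:
        raise ValueError(
            f"Unknown target_names {unknown}; discovered: {sorted(discovered)}"
        )
    wanted = set(target_names)
    return [name for name in discovered if name in wanted]


discovered = ["adversarial_chat_singleturn", "adversarial_chat_reasoning"]
narrowed = select_targets(discovered, ["adversarial_chat_reasoning"])
everything = select_targets(discovered)
```
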

…_refactor Updating branch off fork to include latest commits.

@staticmethods

Removes the local factory-construction override and the adversarial_models constructor parameter. AdversarialBenchmark now inherits the base Scenario._get_atomic_attacks_async loop and reads its strategy enum from AttackTechniqueRegistry entries tagged benchmark_fanout (registered by BenchmarkInitializer in the previous commit).

What's removed:
- adversarial_models: list[PromptTarget] constructor param + validation.
- _adversarial_configs dict construction in __init__.
- _get_atomic_attacks_async override that built local factories, iterated models x techniques x datasets, and injected attack_adversarial_config_override at create-time.
- _infer_labels static method + the entire dedupe/collision-suffix loop that inferred model labels from target identifiers; replaced by TargetConfig.registry_name as the canonical label (set explicitly in ENV_TARGET_CONFIGS, no inference needed).
- _get_benchmarkable_specs and _build_benchmark_strategy as @staticmethods on the class; replaced by a module-level _build_benchmark_strategy function. Strategy-class construction never reads scenario instance state, so the function does not belong to the class; module-level placement makes the dependency (only the registry) explicit and the unit-test surface flat.

What's added:
- BENCHMARK_FANOUT_TAG module constant (= "benchmark_fanout") as the shared contract between BenchmarkInitializer (writes the tag) and AdversarialBenchmark (reads it).
- _StrategyOnlyMarker sentinel class to satisfy the required AttackTechniqueSpec.attack_class field when reconstructing minimal specs for strategy-enum construction. build_strategy_class_from_specs reads only name + strategy_tags, so the sentinel never reaches a runtime construction site; the real factory is fetched by name from the registry at attack-execution time.
- _build_display_group override: extracts the target label from the fanned f"src__target" technique name so display rolls up per-model. Falls back to the full name when no __ separator is present.

Where the (technique x target x dataset) permutation now happens:

The pre-collapse override did all three dimensions at scenario runtime in one nested loop. Post-collapse the permutation is split across two stages, owned by different layers:

1. Initializer time: BenchmarkInitializer.initialize_async runs the (technique x adversarial-target) cross-product and registers one fanned AttackTechniqueFactory per pair into AttackTechniqueRegistry, tagged benchmark_fanout. Target binding lives on the factory.
2. Scenario runtime: Scenario._get_atomic_attacks_async (inherited, base class) runs the (fanned-variant x dataset) cross-product, building one AtomicAttack per pair. The target dimension is already resolved on the factory at this point.

Net atomic-attack count is unchanged for the same inputs; the change is which layer owns which dimension. See the AdversarialBenchmark class docstring for the full explanation.

VERSION bump 1 -> 2:

The atomic_attack_name format changes from f"{technique}__{model}__{dataset}" (triple-segment, old override-driven) to f"{technique}__{model}_{dataset}" (double-then-single-underscore, base-inherited). Cached results from VERSION=1 remain queryable via memory.get_scenario_results(scenario_version=1) but won't suppress fresh runs with skip_cached=True (the param itself lands in the next commit). No CHANGELOG file in this repo; this note will land in the PR description.

Doc notebook (doc/scanner/benchmark.{py,ipynb}) still references the removed adversarial_models API and will fail at runtime until Commit 9 rewrites it; deferred per plan F7. Not gated by any unit test.

Tests rewritten end-to-end (619 -> 268 lines): see test_adversarial.py for the four test classes covering metadata, strategy construction, collapsed init surface, and display grouping.

Wider regression: 1091/1091 pass across scenario+setup+registry; 547/547 pass in backend.
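
A small worked example of the two-stage split, using only string names, shows why the atomic-attack count is unchanged while the names (and therefore the cache keys) differ. The technique, model, and dataset names are placeholders; the two name formats are the ones quoted in this commit message.

```python
from itertools import product

techniques = ["crescendo", "red_teaming"]  # placeholder technique names
models = ["model_a", "model_b"]            # placeholder target names
datasets = ["harms", "fairness"]           # placeholder dataset names

# Pre-collapse: one nested loop over all three dimensions at scenario runtime.
old_names = [f"{t}__{m}__{d}" for t, m, d in product(techniques, models, datasets)]

# Stage 1 (initializer time): technique x adversarial-target.
fanned_variants = [f"{t}__{m}" for t, m in product(techniques, models)]

# Stage 2 (scenario runtime): fanned-variant x dataset.
new_names = [f"{v}_{d}" for v, d in product(fanned_variants, datasets)]

same_count = len(old_names) == len(new_names)
no_name_overlap = set(old_names).isdisjoint(new_names)
```

Because no VERSION=1 name coincides with a VERSION=2 name, prior results cannot match the new cache key, which is the reason given for the version bump.
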

Adds a skip_cached: bool = False constructor parameter and a thin _get_atomic_attacks_async override on AdversarialBenchmark that, when enabled, filters out atomic-attack candidates whose (atomic_attack_name, technique_eval_hash) tuple appears in any prior COMPLETED ScenarioResult for the same scenario name + VERSION with outcome SUCCESS or FAILURE. ERROR and UNDETERMINED outcomes always retry. Caching is off by default to preserve existing behavior.

Built on the AttackResultAttribution primitives introduced in microsoft#1758:
- AtomicAttack.technique_eval_hash provides the candidate side of the cache key (content-derived via AtomicAttackEvaluationIdentifier).
- AttackResultEntry.attribution_data['parent_collection' + 'parent_eval_hash'] provides the persisted side; the executor stamps these per AttackResult, so two atomic attacks sharing a name but using different technique configurations don't cross-pollinate.

Defensive behavior:
- Missing attribution_data or missing parent_collection -> skip the row silently (treat as not-cached).
- Memory exceptions from get_scenario_results / get_attack_results -> log a warning and fall back to no filtering. Caching becomes a no-op rather than blocking the run.
- Scenarios in IN_PROGRESS / FAILED / CANCELLED state contribute nothing (no get_attack_results query made for them at all).
- Scenario name is matched on type(self).__name__ (PascalCase "AdversarialBenchmark"), aligned with how ScenarioIdentifier stores it; VERSION filter ensures the VERSION bump in the previous commit invalidates old VERSION=1 results for cache purposes (they remain queryable; they just don't suppress fresh runs).

Tests: 11 new unit tests (TestAdversarialBenchmarkSkipCachedFilter + TestAdversarialBenchmarkSkipCachedInit) covering filtering semantics, outcome filters, eval-hash disambiguation, scenario-state filter, query-arg shape, missing-attribution defense, memory-error defense, and constructor defaults. Integration test with full persistence round-trip is a separate follow-up commit (F6.3 per plan).

Wider regression: 1649/1649 pass across scenario+setup+registry+backend.

Failure mode flagged for the PR description batch:
- The override + helper are scenario-agnostic in shape and should probably live on base Scenario behind a duck-typed identity hook (e.g. cls.cache_scope_name() classmethod) so other scenarios (RapidResponse, Scam, etc.) can opt into skip_cached without copy-pasting the wrapper. Enhancement, not a bug; tracked as lift-skip-cached-to-base-scenario.
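
The defensive behavior listed above can be illustrated with a sketch that collects cached pairs from persisted rows. The row shape (plain dicts with `outcome` and `attribution_data` keys), the `collect_cached_pairs` name, and the `load_rows` callable are assumptions for illustration; skipping rows with a missing `parent_eval_hash` is also an assumption, since the commit only names missing attribution_data and missing parent_collection.

```python
import logging

logger = logging.getLogger(__name__)

CACHEABLE = {"SUCCESS", "FAILURE"}


def collect_cached_pairs(load_rows):
    """Build the set of (parent_collection, parent_eval_hash) pairs to skip.

    load_rows: zero-argument callable returning an iterable of dict rows
    (hypothetical shape standing in for persisted attack results).
    """
    try:
        rows = list(load_rows())
    except Exception as exc:  # memory failure: caching degrades to a no-op
        logger.warning("Cache lookup failed; running without filtering: %s", exc)
        return set()

    pairs = set()
    for row in rows:
        attribution = row.get("attribution_data") or {}
        name = attribution.get("parent_collection")
        eval_hash = attribution.get("parent_eval_hash")
        if not name or not eval_hash:
            continue  # missing attribution: treat as not cached
        if row.get("outcome") not in CACHEABLE:
            continue  # ERROR / UNDETERMINED always retry
        pairs.add((name, eval_hash))
    return pairs


rows = [
    {"outcome": "SUCCESS", "attribution_data": {"parent_collection": "a", "parent_eval_hash": "h1"}},
    {"outcome": "SUCCESS", "attribution_data": None},
    {"outcome": "FAILURE", "attribution_data": {"parent_eval_hash": "h2"}},
    {"outcome": "ERROR", "attribution_data": {"parent_collection": "b", "parent_eval_hash": "h3"}},
]
pairs = collect_cached_pairs(lambda: rows)
```
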

Stage 1 of the scorer-flexibility refactor: widens the parameter annotation on AdversarialBenchmark.__init__ from TrueFalseScorer | None to Scorer | None, while preserving the existing runtime contract via an isinstance(resolved, TrueFalseScorer) guard that raises TypeError with a pointer at the new-scoring follow-up. Forward-compatible: when stage 2 lands and AttackScoringConfig + atomic-attack types are widened to Scorer, removing the guard is the only change needed here.

Why widen the annotation now rather than wait for stage 2:
- Lets the follow-up PR be a behavior change (drop the guard, wire the new scorer path through AttackScoringConfig) without a parameter signature change. Users coding to AdversarialBenchmark.__init__'s signature see the eventual contract today.
- Self-documents the planned direction in IDE tooling and --list-scenarios output.
- TypeError message names the constraint AND points readers at the follow-up so the broken case isn't silent.

Out of scope (stage 2, separate follow-up):
- AttackScoringConfig.objective_scorer widening
- Atomic attack types' objective_scorer widening
- pyrit.scenario.core.scenario casts at lines :778, :990, :1034
- Removing this guard

Tests: 3 new (test_objective_scorer_annotation_is_scorer, test_construct_accepts_truefalse_scorer_subclass, test_non_truefalse_scorer_raises_typeerror_with_pointer). Existing default-scorer / explicit-scorer init tests already cover the happy TrueFalseScorer path.

Wider regression: 1652/1652 pass across scenario+setup+registry+backend. Pre-commit clean.
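
The "wide annotation, narrow runtime guard" pattern might look like the sketch below. The classes are empty stand-ins that mirror the PyRIT names, `FloatScorerSketch` and `resolve_objective_scorer` are hypothetical, and the error wording is invented; only the isinstance check against TrueFalseScorer and the TypeError come from the commit message.

```python
from typing import Optional


class Scorer:
    """Empty stand-in for the base scorer type."""


class TrueFalseScorer(Scorer):
    """Empty stand-in for the true/false scorer type."""


class FloatScorerSketch(Scorer):
    """Hypothetical non-true/false scorer used to trigger the guard."""


def resolve_objective_scorer(
    objective_scorer: Optional[Scorer], default: TrueFalseScorer
) -> TrueFalseScorer:
    """Accept any Scorer by annotation, but enforce TrueFalseScorer at runtime."""
    resolved = objective_scorer if objective_scorer is not None else default
    if not isinstance(resolved, TrueFalseScorer):
        raise TypeError(
            f"objective_scorer must be a TrueFalseScorer for now, got "
            f"{type(resolved).__name__}; other scorer types are tracked in "
            "the new-scoring follow-up."
        )
    return resolved


default_scorer = TrueFalseScorer()
used_default = resolve_objective_scorer(None, default_scorer)
```

Once downstream types accept any Scorer, deleting the isinstance branch is the only change this function would need.
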

Adds DEFAULT_INITIALIZERS + SCENARIO_INITIALIZERS to tests/end_to_end/test_scenarios.py so scenarios that need scenario-specific initialization (post-collapse benchmark.adversarial needs BenchmarkInitializer to fan adversarial techniques across registry-discovered targets) can opt into a longer initializer list without forcing every other scenario to load the same extras.

Default for every scenario: ["target", "load_default_datasets"] (unchanged from prior behavior).

Override for benchmark.adversarial: defaults + ["benchmark"], so BenchmarkInitializer runs after TargetInitializer has populated TargetRegistry with the ADVERSARIAL-tagged env-driven targets.

Plan-vs-reality fix caught during implementation: the plan referred to the scenario key as "adversarial_benchmark", but the actual ScenarioRegistry name (used by pyrit_scan) is the dotted module path "benchmark.adversarial", mirroring "airt.cyber" / "garak.encoding". The override map uses the dotted form. Comment in the file pins the convention so future overrides don't hit the same gotcha.

E2e tests are not part of CI; they run via make end-to-end-test on developer machines that have ADVERSARIAL_CHAT_* env vars set. When the env vars are absent, BenchmarkInitializer surfaces the actionable error message added in Commit 4 (closes failure_mode_followup no-adversarial-model-clear-error, also referenced in Commit 4 body).

No regression run included: e2e tests require live API credentials. Smoke-tested with pytest --collect-only (9 scenarios, including benchmark.adversarial) and a manual resolution check that _initializers_for("benchmark.adversarial") returns the override list while _initializers_for("airt.cyber") falls back to defaults.

Rewrites doc/scanner/benchmark.{py,ipynb} end-to-end around the new registry-driven flow. The previous notebook constructed AdversarialBenchmark with adversarial_models=[OpenAIChatTarget()], which no longer exists after the collapse.

New notebook content:
- Prerequisites: ADVERSARIAL_CHAT_* env vars (plus optional _SINGLETURN / _MULTITURN / _REASONING variants).
- CLI quickstart: pyrit_scan benchmark.adversarial --initializers target load_default_datasets benchmark --target openai_chat ...
- Setup cell: initialize_pyrit_async with TargetInitializer + ScorerInitializer + LoadDefaultDatasets + BenchmarkInitializer (in that order, since BenchmarkInitializer reads TargetRegistry).
- Run cell: AdversarialBenchmark() with no model args; default "light" strategy.
- Cross-run caching cell: AdversarialBenchmark(skip_cached=True); documents (atomic_attack_name, technique_eval_hash) cache key, SUCCESS/FAILURE-only caching, ERROR/UNDETERMINED retry semantics, and the "add new adversarial targets incrementally" use case.
- Narrowing the fan-out: BenchmarkInitializer.set_params_from_args with target_names = [...].
- .pyrit_conf bootstrap: full YAML with initializer ordering.
- Scorer flexibility: documents the widened Scorer | None annotation and the TrueFalseScorer-only runtime contract for stage 1.

Per microsoft.github.io/PyRIT/contributing/notebooks/ the .ipynb is generated from the .py via jupytext; this commit regenerates the .ipynb to match the new .py source. Both committed pre-execution (no real output cells). Maintainers running pct_to_ipynb.py before the next release will re-execute against real endpoints; doing so now would require ADVERSARIAL_CHAT_* env vars set in this dev env.

Downstream: the published doc page at https://microsoft.github.io/PyRIT/scanner/benchmark/ is built from doc/scanner/benchmark.py and will update automatically when this PR merges into main.

Companion test (smoke-run the notebook with mocked targets) is deferred to its own follow-up commit (F6.2 / scanner-notebook-test per plan).

Upstream 23e2aa6 (DOC strict build) tightened ruff TC rules. RegistryEntry and TagQuery are only used in type annotations on TargetRegistry, so they belong in the if TYPE_CHECKING: block. Pre-commit clean across all PR-touched files after this fix. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
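
For readers unfamiliar with the pattern, here is a generic sketch of an annotation-only import guarded by TYPE_CHECKING. It uses `decimal.Decimal` from the standard library as a stand-in for RegistryEntry and TagQuery, and it is not the actual diff to target_registry.py.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported for type checkers only; never executed at runtime.
    from decimal import Decimal


def describe(value: "Decimal") -> str:
    """Format a value; the quoted annotation avoids a runtime import."""
    return f"value={value}"


description = describe(3)  # works at runtime even though Decimal was never imported
```
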

Copilot

Pull request overview

This PR refactors the adversarial benchmarking workflow to be registry-driven and more resumable, primarily by introducing a benchmark initializer that fans adversarial-capable techniques across ADVERSARIAL-tagged targets and by improving target tagging/discovery to support that flow end-to-end.

Changes:

Add BenchmarkInitializer to discover ADVERSARIAL-tagged targets and register per-model fanned AttackTechniqueSpecs into AttackTechniqueRegistry.
Fix TargetInitializer to propagate TargetConfig.tags into TargetRegistry and add a guard to skip model-configured targets when the per-target model env var is missing.
Extend TargetRegistry with get_by_tag_query (TagQuery-based key matching) and add/expand unit & e2e tests plus updated benchmark documentation and env examples.

Reviewed changes

Copilot reviewed 14 out of 33 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `pyrit/setup/initializers/components/targets.py` | Registers new ADVERSARIAL chat variants; propagates `config.tags` into the registry; adds a skip-with-warning guard for missing per-target model env vars. |
| `pyrit/setup/initializers/benchmark.py` | New initializer to discover adversarial targets via tag query and fan out adversarial-capable technique specs across them. |
| `pyrit/setup/initializers/__init__.py` | Exports `BenchmarkInitializer` from the initializers package. |
| `pyrit/registry/object_registries/target_registry.py` | Adds `get_by_tag_query` to enable TagQuery-based discovery of tagged targets. |
| `tests/unit/setup/test_targets_initializer.py` | Adds coverage for tag propagation, idempotency, unique registry names, and new ADVERSARIAL_CHAT variants. |
| `tests/unit/setup/test_benchmark_initializer.py` | New unit tests for benchmark initializer discovery, fan-out naming/tags, idempotency, and narrowing behavior. |
| `tests/unit/registry/test_target_registry.py` | Adds characterization for duplicate-name overwrites; adds `get_by_tag_query` test coverage including composite queries and key-only semantics. |
| `tests/unit/scenario/benchmark/test_adversarial.py` | Rewritten tests for the collapsed `AdversarialBenchmark` shape, registry-driven strategies, display grouping, caching behavior, and scorer-typing guard. |
| `tests/end_to_end/test_scenarios.py` | Adds per-scenario initializer override map so `benchmark.adversarial` runs with `benchmark` initializer in e2e. |
| `doc/scanner/benchmark.py` | Updates benchmark documentation to the new env-driven registry + initializer workflow, including caching and fan-out narrowing. |
| `doc/scanner/benchmark.ipynb` | Synchronized notebook update matching the rewritten `benchmark.py` content. |
| `.env_example` | Adds ADVERSARIAL chat variant env var groups (SINGLETURN/MULTITURN/REASONING) for discovery and fan-out. |
| `tests/unit/scenario/core/test_strategy_validation.py` | Adds tests around composite strategy naming and ScenarioCompositeStrategy deprecation warnings. |
| `tests/unit/scenario/core/test_scenario_strategy_invariants.py` | Adds shared invariant tests for dynamically generated strategy enums. |
| `tests/unit/scenario/core/test_scenario_partial_results.py` | Adds tests for scenario retry/resume behavior when atomic attacks return partial results. |
| `tests/unit/scenario/core/test_dataset_configuration.py` | Adds comprehensive unit tests for DatasetConfiguration behaviors (data sources, sampling, error cases). |
| `tests/unit/scenario/core/test_baseline_deprecation.py` | Adds tests for deprecated baseline constructor shims and their runtime behavior. |
| `tests/unit/scenario/core/test_attack_technique.py` | Adds unit tests for AttackTechnique initialization and identifier behavior. |
| `tests/unit/scenario/core/test_attack_technique_factory.py` | Adds extensive unit tests for AttackTechniqueFactory validation, creation, identifier hashing, and scorer override policies. |
| `tests/unit/scenario/garak/test_encoding.py` | Adds/rewrites Encoding scenario tests and a baseline-uniformity regression test under `max_dataset_size`. |
| `tests/unit/scenario/airt/test_cyber.py` | Updates Cyber scenario tests for technique registry pattern and dynamic strategy behavior. |
| `tests/unit/scenario/airt/test_jailbreak.py` | Updates Jailbreak scenario tests around baseline behavior, many-shot patching, and strategy execution. |
| `tests/unit/scenario/airt/test_leakage.py` | Adds/updates Leakage scenario tests for dynamic strategies and baseline policy expectations. |
| `tests/unit/scenario/airt/test_scam.py` | Updates Scam scenario tests, including supported parameter plumbing and baseline-uniformity regression test. |
| `tests/unit/scenario/airt/test_psychosocial.py` | Adds/updates Psychosocial scenario tests, including capability requirement validation and baseline-uniformity regression test. |

rlundeen2 · 2026-05-22T21:26:38Z

+# ### CLI quickstart
+#
+# ```bash
+# pyrit_scan benchmark.adversarial \


I think to be consistent, adversarial should be passed in as a required flag/argument here. e.g. --adversarial-target <target1> <target2>

I think this is supported in existing CLI, but definitely in my CLI refactor PR
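
A minimal sketch of what the suggested flag could look like with the standard-library `argparse`, assuming a required, multi-value `--adversarial-target` option. The parser layout, the `adversarial_targets` destination, and the target names are illustrative assumptions and are not taken from the `pyrit_scan` implementation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real pyrit_scan CLI may be structured differently.
    parser = argparse.ArgumentParser(prog="pyrit_scan")
    parser.add_argument("scenario", help="Scenario name, e.g. benchmark.adversarial")
    parser.add_argument(
        "--adversarial-target",
        dest="adversarial_targets",
        nargs="+",      # accept one or more target names
        required=True,  # a benchmark run needs at least one adversarial target
        metavar="TARGET",
        help="Names of the adversarial targets to benchmark.",
    )
    return parser


args = build_parser().parse_args(
    ["benchmark.adversarial", "--adversarial-target", "target_a", "target_b"]
)
print(args.adversarial_targets)  # ['target_a', 'target_b']
```

With `required=True`, omitting the flag makes `argparse` exit with a usage error instead of silently running with no adversarial targets.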

rlundeen2 · 2026-05-22T21:28:46Z

 await output_scenario_async(baseline_result)
+
+# %% [markdown]
+# ## Cross-run caching


I want this to mirror our docs for other scanners. In this section, I'd love only basic usage. I think I'd stop the docs after output, and then add this information to the help since it's an argument

rlundeen2 · 2026-05-22T21:31:02Z

        """
        return self.get(name)
+
+    def get_by_tag_query(self, *, query: TagQuery) -> list[RegistryEntry[PromptTarget]]:


you should be able to use get_by_tag from the base class

rlundeen2 · 2026-05-22T21:32:57Z

    "## Adversarial Benchmark\n",
-    "The adversarial benchmarking scenario (`AdversarialBenchmark`) compares the effectiveness of different adversarial models in successfully executing attacks against a target model."
+    "\n",
+    "`AdversarialBenchmark` holds the objective target and dataset constant and varies the adversarial\n",


This will be tough for new users to understand (e.g. it takes several steps to update the configuration, etc., to run a benchmark). I would just ask them to pass them in. They can list targets and then pass those in. This is what Adrian's add_parameter work can be used for.

rlundeen2 · 2026-05-22T21:39:47Z

+    on the generated enum (e.g.
+    ``BenchmarkStrategy("red_teaming__adversarial_chat_singleturn")``).
+
+    Returns:


This would be tough to understand as a user. If you do --help and see "this strategy has light for fanned variants" it's a bit confusing to understand what fanned out means. Also, the registry entries are still there; we're adding and removing things to the registry.

To simplify this, I might just include every registered strategy that needs an adversarial chat and doesn't have its adversarial chat set.

This way the users don't need to understand anything about the algorithm, and devs don't need to remember to specially tag techniques. We can safely remove the concept of fanning or tagging things to determine what to run.
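
The selection rule proposed here can be sketched as a plain filter. `TechniqueSpec`, its `accepts_adversarial` field, and the sample entries are hypothetical stand-ins, not PyRIT types; the predicate mirrors the `_accepts_adversarial(...) and spec.adversarial_chat is None` check visible in the removed lines quoted later in this review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TechniqueSpec:
    """Hypothetical stand-in for a registered attack technique spec."""

    name: str
    accepts_adversarial: bool               # technique takes an adversarial chat
    adversarial_chat: Optional[str] = None  # already-bound target, if any


def benchmarkable(specs: list[TechniqueSpec]) -> list[TechniqueSpec]:
    # Keep techniques that need an adversarial chat and have none bound yet.
    return [s for s in specs if s.accepts_adversarial and s.adversarial_chat is None]


specs = [
    TechniqueSpec("red_teaming", accepts_adversarial=True),
    TechniqueSpec("prompt_sending", accepts_adversarial=False),
    TechniqueSpec("tap", accepts_adversarial=True, adversarial_chat="preset_target"),
]
print([s.name for s in benchmarkable(specs)])  # ['red_teaming']
```

Under this rule no technique needs a special tag: eligibility is derived from the spec itself.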

rlundeen2 · 2026-05-22T21:43:29Z

+``TagQuery.all("adversarial")``), then for every adversarial-capable
+technique in ``SCENARIO_TECHNIQUES`` builds one fanned
+``AttackTechniqueSpec`` per discovered target. Each fanned spec binds the
+live target onto ``adversarial_chat`` and is registered into


We probably don't need an initializer here I think
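
For readers following the thread, the fan-out that the quoted docstring describes amounts to a cross product of techniques and discovered targets. A minimal sketch, with `FannedSpec`, `fan_out`, and the sample names as hypothetical illustrations; the `src__target` naming and the `benchmark_fanout` / `model:<target>` tags follow the PR description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FannedSpec:
    """Hypothetical stand-in for one fanned technique spec."""

    name: str
    adversarial_chat: str   # target name stands in for the live target object
    tags: tuple[str, ...]


def fan_out(technique_names: list[str], target_names: list[str]) -> list[FannedSpec]:
    # One spec per (technique, target) pair.
    return [
        FannedSpec(
            name=f"{technique}__{target}",
            adversarial_chat=target,
            tags=("benchmark_fanout", f"model:{target}"),
        )
        for technique in technique_names
        for target in target_names
    ]


specs = fan_out(
    ["red_teaming", "crescendo"],
    ["adversarial_chat_singleturn", "adversarial_chat_multiturn"],
)
print(len(specs))     # 4
print(specs[0].name)  # red_teaming__adversarial_chat_singleturn
```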

rlundeen2 · 2026-05-22T21:51:52Z

-        self._objective_scorer: TrueFalseScorer = (
-            objective_scorer if objective_scorer else self._get_default_objective_scorer()
-        )
+        resolved_scorer: Scorer = objective_scorer if objective_scorer else self._get_default_objective_scorer()


nit: this should be enough, no need to raise and validate; other classes rely on the base returning a scorer

rlundeen2 · 2026-05-22T21:58:22Z

-        if self._objective_target is None:
-            raise ValueError(
-                "Scenario not properly initialized. Call await scenario.initialize_async() before running."
+        candidates = await super()._get_atomic_attacks_async()


We need to (probably using the analytics module) query for exact results for this model, including the objective target. Things that need to be the same (the objective target eval hash is currently missing):

Technique eval hash + objective target eval hash.

Things we should leave out of our query

scenario result id or scenario info

We should likely take the most recent result if there are multiple
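
A minimal sketch of the lookup described in this comment, assuming prior results are available as simple records. `CachedResult`, its field names, and `find_cached` are hypothetical; the real query would presumably go through PyRIT's memory or analytics layer. The key is the pair of technique eval hash and objective target eval hash, the scenario result id is deliberately left out of the match, and the most recent hit wins.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class CachedResult:
    """Hypothetical record of one stored attack result."""

    technique_eval_hash: str
    objective_target_eval_hash: str
    scenario_result_id: str  # stored on the record, but not part of the key
    timestamp: datetime


def find_cached(
    results: list[CachedResult],
    *,
    technique_eval_hash: str,
    objective_target_eval_hash: str,
) -> Optional[CachedResult]:
    matches = [
        r
        for r in results
        if r.technique_eval_hash == technique_eval_hash
        and r.objective_target_eval_hash == objective_target_eval_hash
    ]
    # Take the most recent result when several runs match; None when nothing does.
    return max(matches, key=lambda r: r.timestamp, default=None)


history = [
    CachedResult("tech1", "objA", "run-1", datetime(2026, 5, 1)),
    CachedResult("tech1", "objA", "run-2", datetime(2026, 5, 20)),
    CachedResult("tech1", "objB", "run-3", datetime(2026, 5, 21)),
]
hit = find_cached(history, technique_eval_hash="tech1", objective_target_eval_hash="objA")
print(hit.scenario_result_id if hit else None)  # run-2
```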

rlundeen2 · 2026-05-22T22:00:03Z

+    #: VERSION=1 remain queryable but won't suppress fresh runs.
+    VERSION: int = 2
+
    _cached_strategy_class: ClassVar[type[ScenarioStrategy] | None] = None


I think _cached_strategy_class has some bugs and is also likely not needed

rlundeen2 · 2026-05-22T22:01:39Z

-            for spec in SCENARIO_TECHNIQUES
-            if AttackTechniqueRegistry._accepts_adversarial(spec.attack_class) and spec.adversarial_chat is None
-        ]
+        return technique_name.split("__", 1)[1] if "__" in technique_name else technique_name


can we just use the model name? Or even the target registry name?
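
For reference, the added line splits on the first `__` only, so the label is everything after the source technique name. A small illustration, where the function name `target_label` and the second and third inputs are assumptions (the first input follows the fanned name quoted in the docstring earlier in this review):

```python
def target_label(technique_name: str) -> str:
    # Split once on the first "__"; fall back to the full name if it is absent.
    return technique_name.split("__", 1)[1] if "__" in technique_name else technique_name


print(target_label("red_teaming__adversarial_chat_singleturn"))  # adversarial_chat_singleturn
print(target_label("prompt_sending"))                            # prompt_sending
print(target_label("a__b__c"))                                   # b__c
```

Because only the first separator is consumed, a source name that itself contains `__` would leak into the label, which is one argument for keying on the model name or registry name instead.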

Victor Valbuena and others added 14 commits May 20, 2026 13:19

new .env_example with adversrial target models.

0acd48a

.env_example formatting

6881d19

More .env_example formatting

1f1a07b

Reorganized unit test directory

81d529d

Renaming unit tests for consistency

6290bbe

FIX: TargetInitializer propagates config.tags to registry entries

42887b9

FEAT: Add TargetRegistry.get_by_tag_query for TagQuery-based lookup

b7af0de

Merge remote-tracking branch 'origin/main' into adversarial_benchmark…

9ec8234

…_refactor Updating branch off fork to include latest commits.

Merge branch 'main' into adversarial_benchmark_refactor

8b5bcc9

ValbuenaVC changed the title ~~[DRAFT] FEAT: Adversarial Benchmark Scenario Refactor~~ FEAT: Adversarial Benchmark Scenario Refactor May 21, 2026

ValbuenaVC marked this pull request as ready for review May 21, 2026 20:11

Victor Valbuena added 2 commits May 21, 2026 13:16

ValbuenaVC requested review from Copilot, rlundeen2 and romanlutz and removed request for Copilot May 21, 2026 20:23

Copilot started reviewing on behalf of ValbuenaVC May 21, 2026 20:24

Copilot AI reviewed May 21, 2026

Comment thread .env_example

Victor Valbuena and others added 3 commits May 21, 2026 13:51

FIX: TargetInitializer initialized adversarial chats as OpenAIChatTar…

34b86f0

…get. Changed to AzureMLChatTarget.

Merge branch 'main' into adversarial_benchmark_refactor

2f2c209

Merge branch 'main' into adversarial_benchmark_refactor

a3e1ffa

rlundeen2 reviewed May 22, 2026

FEAT: Adversarial Benchmark Scenario Refactor #1765
ValbuenaVC wants to merge 20 commits into microsoft:main from ValbuenaVC:adversarial_benchmark_refactor

3 participants
