FEAT: Add JailBreakV-28K dataset loader by diamond8658 · Pull Request #1548 · microsoft/PyRIT

diamond8658 · 2026-03-27T20:50:38Z

FEAT: Add JailBreakV-28K remote dataset loader

Description

This PR introduces a new remote dataset loader for the JailBreakV-28K (V0.2) benchmark. It ingests over 28,000 jailbreak prompts directly from the source CSV hosted in the SaFo-Lab repository. Resolves issue #1007.

Key Changes:

Asynchronous Ingestion: Implements _JailBreakV28KDataset, which inherits from _RemoteDatasetLoader and fetches the remote CSV asynchronously.
CSV Version 0.2 Support: Specifically targets the versioned CSV release to ensure reproducibility and stability.
Policy Mapping: Maps raw SaFo-Lab codes (P1-P5) to human-readable harm categories (e.g., Somatic Safety, Public Interest) to maintain consistency with existing PyRIT datasets.
Defensive Filtering: Skips prompts containing Jinja2 syntax ({{, {%) so that prompt text is not unintentionally rendered as a template downstream (for example, by an orchestrator).
Metadata Enrichment: Preserves raw policy codes and categories within the SeedPrompt metadata for granular downstream analysis.
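
The policy mapping, Jinja2 filtering, and metadata steps listed above can be illustrated with a minimal standalone sketch. This is not the PR's implementation and does not use any PyRIT classes. The column names (`prompt`, `policy`), the assignment of specific codes to category names, and the record layout are all assumptions made for illustration. The PR only states that codes P1-P5 map to categories such as Somatic Safety and Public Interest, and that unknown codes fall back to "Unknown Policy".

```python
import csv
import io

# Hypothetical mapping: the PR names "Somatic Safety" and "Public Interest"
# as example categories but does not say which code maps to which.
POLICY_TO_CATEGORY = {
    "P1": "Somatic Safety",
    "P2": "Public Interest",
}
UNKNOWN_POLICY = "Unknown Policy"  # fallback named in the PR description
JINJA_MARKERS = ("{{", "{%")


def rows_to_records(csv_text: str) -> list[dict]:
    """Convert CSV text into prompt records, skipping template-like prompts."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # "prompt" and "policy" are assumed column names.
        prompt = (row.get("prompt") or "").strip()
        if not prompt:
            continue
        # Defensive filter: drop prompts that contain Jinja2 delimiters.
        if any(marker in prompt for marker in JINJA_MARKERS):
            continue
        code = (row.get("policy") or "").strip()
        records.append({
            "value": prompt,
            "harm_categories": [POLICY_TO_CATEGORY.get(code, UNKNOWN_POLICY)],
            # Keep the raw code so downstream analysis can group by it.
            "metadata": {"policy_code": code},
        })
    return records
```

In the actual loader, each record would presumably become a SeedPrompt rather than a plain dictionary. The dictionary here only shows which fields are derived from which CSV values.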

Tests and Documentation

Unit Testing:

Added comprehensive unit tests in tests/unit/datasets/test_jailbreakv_28k_dataset.py.
Verified 100% test coverage for the new loader class.
Handled edge cases including:
- Malformed/missing CSV columns.
- Jinja2 safety filtering logic.
- Unknown policy code fallbacks (defaulting to "Unknown Policy").
- Empty remote responses (raising ValueError).
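
As a rough illustration of the edge cases above, the sketch below tests a stand-in parser. It is not the test file from this PR. The `parse_rows` helper, the column names, and the decision to raise ValueError on missing columns are assumptions. The PR states only that empty remote responses raise ValueError and that malformed or missing columns are handled.

```python
import csv
import io

REQUIRED_COLUMNS = {"prompt", "policy"}  # hypothetical column names


def parse_rows(csv_text: str) -> list[dict]:
    """Stand-in parser: validates the header, then returns non-empty rows."""
    if not csv_text.strip():
        raise ValueError("remote response was empty")
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        # Assumption: the real loader may skip or default instead of raising.
        raise ValueError(f"missing CSV columns: {sorted(missing)}")
    return [row for row in reader if (row["prompt"] or "").strip()]


def raises_value_error(csv_text: str) -> bool:
    """Return True if parsing the given text raises ValueError."""
    try:
        parse_rows(csv_text)
    except ValueError:
        return True
    return False


def test_empty_response_raises():
    assert raises_value_error("")


def test_missing_column_raises():
    assert raises_value_error("prompt\nhello\n")


def test_valid_rows_are_returned():
    rows = parse_rows("prompt,policy\nhello,P1\n")
    assert rows == [{"prompt": "hello", "policy": "P1"}]
```

The test functions follow pytest naming conventions, so they can be collected by pytest or called directly.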

Linting:

Verified that the code passes all flake8 and ruff checks.
Follows the PyRIT coding standards and naming conventions.

JupyText & Documentation:

This PR implements a backend data loader and does not modify existing .ipynb or .md documentation files.
No JupyText synchronization was required as no new notebook-based tutorials were introduced in this PR.
Verified the loader's output format is compatible with the SeedDataset model used throughout the library's existing documentation and orchestrators.

feat(datasets): add JailBreakV-28K remote CSV loader

29f6e19

Implements V0.2 CSV ingestion with policy mapping and Jinja2 safety filtering.

diamond8658 marked this pull request as ready for review March 27, 2026 20:50

diamond8658 wants to merge 1 commit into microsoft:main from diamond8658:feature/jailbreakv-28k-loader
