
LLM Safety Benchmark Processing

Data processing utilities for the AgentHarm benchmark dataset.

About AgentHarm

AgentHarm evaluates agentic AI capabilities and safety boundaries through adversarial testing. The benchmark measures how LLM agents handle requests across harmful, benign, and conversational scenarios.

Source

Based on "AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents" (Andriushchenko et al., 2024).

Original dataset: https://huggingface.co/datasets/ai-safety-institute/AgentHarm

Processing Approach

The scripts extract and normalize benchmark behaviors from JSON into structured CSV for analysis. They separate harmful, benign, and chat test cases while preserving each behavior's grading metadata.
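
For orientation, a minimal sketch of this kind of standard-library JSON-to-CSV conversion is below. The input layout (data/<split>.json), the "behaviors" key, and the column names are illustrative assumptions, not taken from the repository's actual scripts.

import csv
import json
from pathlib import Path

SPLITS = ["harmful", "benign", "chat"]
# Assumed columns; the real scripts may emit different ones.
FIELDS = ["split", "id", "name", "category", "prompt", "grading_function"]

def convert(json_dir: Path, out_csv: Path) -> None:
    """Flatten per-split JSON behavior lists into a single CSV."""
    with out_csv.open("w", newline="", encoding="utf-8") as fh:
        # extrasaction="ignore" drops any JSON fields not in FIELDS;
        # missing fields are written as empty cells.
        writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for split in SPLITS:
            path = json_dir / f"{split}.json"
            if not path.exists():
                continue  # tolerate a missing split instead of failing
            data = json.loads(path.read_text(encoding="utf-8"))
            for record in data.get("behaviors", []):
                record["split"] = split  # tag each row with its split
                writer.writerow(record)

if __name__ == "__main__":
    convert(Path("data"), Path("agentharm_dataset.csv"))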

Usage

python convert_to_csv.py
# Outputs: agentharm_dataset.csv

python make_csv.py
# Outputs: agentharm_full.csv

Contents

  • convert_to_csv.py - JSON to CSV conversion with error handling
  • make_csv.py - Streamlined CSV generation
  • agentharm_dataset.csv - Processed benchmark behaviors
  • agentharm_full.csv - Complete dataset export

Requirements

Python 3.8+ (standard library only)

License

Original benchmark: MIT License with additional safety research clause (see LICENSE)
