Data processing utilities for the AgentHarm benchmark dataset.
AgentHarm evaluates agentic AI capabilities and safety boundaries through adversarial testing. The benchmark measures how LLM agents handle requests across harmful, benign, and conversational scenarios.
Based on "AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents" (Andriushchenko et al., 2024).
Original dataset: https://huggingface.co/datasets/ai-safety-institute/AgentHarm
The scripts extract and normalize benchmark behaviors from JSON into structured CSV for analysis, separating harmful, benign, and chat test cases while preserving grading metadata.
python convert_to_csv.py
# Outputs: agentharm_dataset.csv
python make_csv.py
# Outputs: agentharm_full.csv

convert_to_csv.py - JSON to CSV conversion with error handling
make_csv.py - Streamlined CSV generation
agentharm_dataset.csv - Processed benchmark behaviors
agentharm_full.csv - Complete dataset export
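The conversion internals are not shown here, but a minimal sketch of the JSON-to-CSV normalization described above could look like the following. The input filename, the top-level keys, and the field names (id, name, category, prompt, target_functions, grading_function) are assumptions for illustration, not the dataset's confirmed schema:

```python
import csv
import json
from pathlib import Path

# Hypothetical field names; the actual AgentHarm JSON schema may differ.
FIELDS = ["id", "name", "category", "prompt", "target_functions", "grading_function"]

def convert(json_path: str, csv_path: str, split: str) -> None:
    """Flatten one split (harmful/benign/chat) of behavior records into CSV rows."""
    records = json.loads(Path(json_path).read_text(encoding="utf-8"))
    # Some dumps nest the records under a top-level key; unwrap if present.
    if isinstance(records, dict):
        records = records.get("behaviors", records.get("data", []))
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["split", *FIELDS])
        writer.writeheader()
        for rec in records:
            row = {"split": split}
            for field in FIELDS:
                value = rec.get(field, "")
                # Serialize nested structures (e.g. lists of tool names) as JSON strings
                # so grading metadata survives the round trip to CSV.
                row[field] = json.dumps(value) if isinstance(value, (list, dict)) else value
            writer.writerow(row)

if __name__ == "__main__":
    convert("harmful_behaviors.json", "agentharm_dataset.csv", split="harmful")
```

Keeping nested values as JSON strings inside CSV cells is one way to preserve the grading metadata without losing structure; downstream analysis can parse those cells back with json.loads.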
Python 3.8+ (standard library only)
Original benchmark: MIT License with additional safety research clause (see LICENSE)