DeepSource Benchmarks

Benchmark dataset evaluating code review and security analysis tools on the OpenSSF CVE Benchmark.

Benchmarked Tools

Last updated: April 12, 2026

Data Format

Judged Results (`benchmarks/judged-results/`)

Final evaluation results in JSONL format with fields:

cve_id: CVE identifier
variant: Which version of the code was analyzed, fixed or unfixed
detected_issues: Issues found by the tool
TP, FP, TN, FN: Classification outcomes (true positive, false positive, true negative, false negative)
judge_reasoning: Explanation of the judgment
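
The fields above are enough to aggregate a tool's results. The following is a minimal sketch, not part of the benchmark tooling: it assumes each of `TP`, `FP`, `TN`, and `FN` is stored as a count or a 0/1 flag on every record, and the helper names `load_records` and `summarize` are hypothetical.

```python
import json
from pathlib import Path


def load_records(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize(records):
    """Sum the TP/FP/TN/FN fields and derive precision and recall."""
    totals = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for record in records:
        for key in totals:
            # Assumption: each field is a count or a 0/1 flag; a missing field counts as 0.
            totals[key] += int(record.get(key, 0))
    tp, fp, fn = totals["TP"], totals["FP"], totals["FN"]
    # Report None instead of dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return {**totals, "precision": precision, "recall": recall}
```

To use it, pass the path of one results file under `benchmarks/judged-results/` to `load_records` and feed the result to `summarize`. The file naming inside that directory is not described here, so check the directory contents before relying on any particular name.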

Processed Results (`benchmarks/processed/`)

Intermediate formatted results from each tool, normalized for comparison.

Raw Output (`benchmarks/raw-output/`)

Original tool outputs per CVE, preserving the exact response from each tool.

Archive

The `archive/` directory contains prompts and data from earlier benchmark runs.

References

OpenSSF CVE Benchmark