DeepSource Benchmarks

Benchmark dataset evaluating code review and security analysis tools on the OpenSSF CVE Benchmark.

Benchmarked Tools

Last updated: April 12, 2026

Data Format

Judged Results (benchmarks/judged-results/)

Final evaluation results in JSONL format with fields:

  • cve_id: CVE identifier
  • variant: fixed or unfixed
  • detected_issues: Issues found by the tool
  • TP, FP, TN, FN: Classification outcomes (true positive, false positive, true negative, false negative)
  • judge_reasoning: Explanation of the judgment
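
As a rough illustration of how these fields might be consumed, the sketch below sums the classification fields across JSONL records and derives precision and recall. It assumes that TP, FP, TN, and FN are stored as numbers or booleans on each record, which the field list above does not confirm. The function names and the sample records (including the placeholder CVE IDs) are hypothetical and are not taken from the dataset.

```python
import json
from collections import Counter
from typing import Iterable, Tuple

METRIC_KEYS = ("TP", "FP", "TN", "FN")


def aggregate(lines: Iterable[str]) -> Counter:
    """Sum TP/FP/TN/FN over JSONL lines (one JSON object per line)."""
    totals: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        for key in METRIC_KEYS:
            # Assumption: values are ints or booleans; missing/null counts as 0.
            totals[key] += int(record.get(key) or 0)
    return totals


def precision_recall(totals: Counter) -> Tuple[float, float]:
    """Return (precision, recall), using 0.0 when a denominator is zero."""
    tp, fp, fn = totals["TP"], totals["FP"], totals["FN"]
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up placeholder records that only mirror the documented field names.
sample = [
    '{"cve_id": "CVE-0000-0001", "variant": "unfixed", "detected_issues": [], "TP": 1, "FP": 0, "TN": 0, "FN": 0, "judge_reasoning": "placeholder"}',
    '{"cve_id": "CVE-0000-0001", "variant": "fixed", "detected_issues": [], "TP": 0, "FP": 0, "TN": 1, "FN": 0, "judge_reasoning": "placeholder"}',
    '{"cve_id": "CVE-0000-0002", "variant": "unfixed", "detected_issues": [], "TP": 0, "FP": 0, "TN": 0, "FN": 1, "judge_reasoning": "placeholder"}',
    '{"cve_id": "CVE-0000-0002", "variant": "fixed", "detected_issues": [], "TP": 0, "FP": 1, "TN": 0, "FN": 0, "judge_reasoning": "placeholder"}',
]

if __name__ == "__main__":
    totals = aggregate(sample)
    print(dict(totals), precision_recall(totals))
```

To run this against a real results file, pass an open file handle to `aggregate` in place of `sample`; the actual file names under benchmarks/judged-results/ are not listed here, so check the directory first.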

Processed Results (benchmarks/processed/)

Intermediate formatted results from each tool, normalized for comparison.

Raw Output (benchmarks/raw-output/)

Original tool outputs per CVE, preserving the exact response from each tool.

Archive

The archive/ directory contains prompts and data from earlier benchmark runs.

References