This repository contains a reproducible analysis pipeline for CodeLLMExp, a multi-language dataset for studying automated vulnerability localization and explanation in AI-generated code.
The project focuses on three connected secure-programming tasks:
- CWE classification: predict the vulnerability category from source code.
- Vulnerable line localization: identify lines that are likely to contain the vulnerability.
- Faithfulness-to-fix evaluation: compare predicted vulnerable lines with lines changed by the secure fix.
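The faithfulness-to-fix comparison can be made concrete with Python's standard `difflib`. The sketch below assumes each sample provides the vulnerable and fixed snippets as strings, and that predictions are 1-based line numbers in the vulnerable snippet. The function names and the precision/recall/hit definitions are illustrative assumptions, not the project's exact metric; PROJECT_SUMMARY.md documents the metrics actually reported.

```python
import difflib


def fix_changed_lines(vulnerable: str, fixed: str) -> set[int]:
    """Return 1-based line numbers in `vulnerable` that the fix replaced or deleted."""
    old = vulnerable.splitlines()
    new = fixed.splitlines()
    changed: set[int] = set()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # Pure "insert" opcodes add new lines without touching the vulnerable
        # snippet, so only "replace" and "delete" mark vulnerable lines.
        if tag in ("replace", "delete"):
            changed.update(range(i1 + 1, i2 + 1))
    return changed


def faithfulness(predicted: list[int], changed: set[int]) -> dict[str, float]:
    """Overlap between predicted vulnerable lines and lines changed by the fix (hypothetical metric)."""
    pred = set(predicted)
    hits = pred & changed
    precision = len(hits) / len(pred) if pred else 0.0
    recall = len(hits) / len(changed) if changed else 0.0
    return {"precision": precision, "recall": recall, "hit": float(bool(hits))}
```

Lines that exist only in the fixed version (for example, a newly added guard) have no counterpart in the vulnerable snippet, so this sketch ignores them when scoring predictions.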
The pipeline intentionally uses lightweight and interpretable methods, including TF-IDF with classical machine learning models, rule-based fix-concept extraction, diff analysis, and line-level scoring heuristics.
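Rule-based fix-concept extraction can be illustrated by matching regular expressions against the lines a fix adds. The sketch below is hypothetical: the concept names and patterns in `FIX_CONCEPT_RULES` are assumptions chosen for illustration and are not the rules used in `notebooks/02_secure_fix_concept_extraction.ipynb`.

```python
import difflib
import re

# Hypothetical rule table: concept name -> pattern searched in lines ADDED by the fix.
# These patterns are illustrative assumptions, not the project's actual rules.
FIX_CONCEPT_RULES: dict[str, re.Pattern] = {
    "parameterized_query": re.compile(r"execute\([^,]+,\s*[\(\[]"),
    "bounds_check": re.compile(r"\bif\s*\(?.*(<|<=|>|>=).*\b(len|size|sizeof|length)"),
    "safe_string_copy": re.compile(r"\b(strncpy|snprintf|strlcpy)\s*\("),
    "input_validation": re.compile(r"\b(isdigit|isalnum|fullmatch|matches)\s*\("),
    "output_escaping": re.compile(r"\b(escape|escapeHtml|quote)\s*\("),
}


def added_lines(vulnerable: str, fixed: str) -> list[str]:
    """Lines present in the fixed code but not in the vulnerable code."""
    diff = difflib.ndiff(vulnerable.splitlines(), fixed.splitlines())
    # ndiff prefixes added lines with "+ "; "? " hint lines are skipped.
    return [line[2:] for line in diff if line.startswith("+ ")]


def extract_fix_concepts(vulnerable: str, fixed: str) -> list[str]:
    """Return the sorted names of all rules that match at least one added line."""
    added = added_lines(vulnerable, fixed)
    return sorted(
        name
        for name, pattern in FIX_CONCEPT_RULES.items()
        if any(pattern.search(line) for line in added)
    )
```

Restricting the rules to added lines keeps the extraction focused on what the fix introduced rather than on code shared by both versions.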
Repository layout:

```text
.
+-- CodeLLMExp.jsonl # Full dataset in JSON Lines format
+-- README.md
+-- requirements.txt
+-- PROJECT_SUMMARY.md # Detailed experiment summary
+-- IMPLEMENTATION_PLAN (1).md # Implementation notes
+-- Explainable_Secure_Programming_CodeLLMExp.pdf
+-- Secure_programming (7).pdf
+-- cwe_seeds/ # Canonical vulnerable seed examples
+-- seed/ # Generated/augmented seed examples
+-- source/ # Source snippets organized by language and CWE
+-- src/ # Reusable Python utilities
+-- notebooks/ # End-to-end experiment notebooks
```
Important generated outputs such as processed CSV files, trained models, figures, metrics, and storyboard images are ignored by Git and can be regenerated by running the notebooks.
The dataset contains vulnerable code snippets, fixed code, CWE labels, vulnerable-line annotations, and natural-language security explanations.
Summary after cleaning:
| Metric | Value |
|---|---|
| Total samples | 10,403 |
| Languages | Python, Java, C |
| Unique CWE labels | 29 |
| Rows with vulnerable-line annotations | 97.76% |
Language distribution:
| Language | Samples |
|---|---|
| Python | 4,610 |
| Java | 3,088 |
| C | 2,705 |
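Statistics like those in the tables above can be recomputed from the JSON Lines file with the standard library. The sketch assumes hypothetical field names (`language`, `cwe`, `vulnerable_lines`); check the actual keys in `CodeLLMExp.jsonl` before using it. It also skips the cleaning performed in `notebooks/01_data_loading_cleaning.ipynb`, so counts on the raw file may differ from the post-cleaning figures reported here.

```python
import json
from collections import Counter


def load_jsonl(path: str) -> list[dict]:
    """Read one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def summarize(
    records: list[dict],
    lang_key: str = "language",        # hypothetical field name
    cwe_key: str = "cwe",              # hypothetical field name
    lines_key: str = "vulnerable_lines",  # hypothetical field name
) -> dict:
    """Total samples, per-language counts, unique CWEs, and annotation coverage (%)."""
    total = len(records)
    languages = Counter(r.get(lang_key) for r in records)
    unique_cwes = len({r.get(cwe_key) for r in records if r.get(cwe_key)})
    # A row counts as annotated if it has a non-empty vulnerable-line field.
    annotated = sum(1 for r in records if r.get(lines_key))
    return {
        "total": total,
        "languages": dict(languages.most_common()),
        "unique_cwes": unique_cwes,
        "annotated_pct": round(100 * annotated / total, 2) if total else 0.0,
    }
```

A typical call would be `summarize(load_jsonl("CodeLLMExp.jsonl"))`, run from the repository root.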
Create and activate a Python environment (the activation command below is for Windows PowerShell; on macOS/Linux, run `source .venv/bin/activate` instead):

```
python -m venv .venv
.\.venv\Scripts\Activate.ps1
```

Install dependencies:

```
pip install -r requirements.txt
```

Register the environment as a Jupyter kernel:

```
python -m ipykernel install --user --name codellmexp --display-name "Python (CodeLLMExp)"
```

Run the notebooks in order:

- notebooks/01_data_loading_cleaning.ipynb
- notebooks/02_secure_fix_concept_extraction.ipynb
- notebooks/03_cwe_classification_baseline.ipynb
- notebooks/04_vulnerable_line_localization.ipynb
- notebooks/05_faithfulness_to_fix_evaluation.ipynb
- notebooks/06_storyboard_and_report_figures.ipynb
The notebooks write intermediate and final artifacts under data/processed/, data/splits/, and outputs/.
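As an alternative to running each notebook interactively, the sequence can be executed headlessly with `jupyter nbconvert --execute`. The sketch assumes `nbconvert` is installed in the environment and that the kernel was registered as `codellmexp`. The helper names are hypothetical, and the notebooks may rely on relative paths, so verify the outputs under `data/` and `outputs/` after a run.

```python
import subprocess
from pathlib import Path

NOTEBOOK_DIR = Path("notebooks")
KERNEL = "codellmexp"  # name used in the ipykernel registration step


def find_notebooks(notebook_dir: Path = NOTEBOOK_DIR) -> list[Path]:
    """Numbered notebooks such as 01_*.ipynb ... 06_*.ipynb."""
    return sorted(notebook_dir.glob("[0-9][0-9]_*.ipynb"))


def build_commands(notebooks: list[Path], kernel: str = KERNEL) -> list[list[str]]:
    """One nbconvert command per notebook, ordered by its numeric filename prefix."""
    return [
        [
            "jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
            f"--ExecutePreprocessor.kernel_name={kernel}",
            str(nb),
        ]
        for nb in sorted(notebooks, key=lambda p: p.name)
    ]


def run_all(notebook_dir: Path = NOTEBOOK_DIR) -> None:
    """Execute every notebook in order; stop at the first failure."""
    for cmd in build_commands(find_notebooks(notebook_dir)):
        print("Running:", cmd[-1])
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all()
```

`--inplace` overwrites each notebook with its executed version, and `check=True` stops the sequence early so later notebooks never run on missing intermediate files.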
The current experiment summary reports the following key results:

| Task | Metric | Result |
|---|---|---|
| CWE classification (Linear SVM) | Accuracy | 99.87% |
| CWE classification (Linear SVM) | Weighted F1 | 99.87% |
| Vulnerable line localization | Top-1 accuracy | 21.66% |
| Vulnerable line localization | Top-5 accuracy | 41.69% |
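Top-k localization accuracy is commonly defined as the fraction of samples where at least one annotated vulnerable line appears among the k highest-ranked lines. The sketch below pairs that definition with a toy keyword-based line scorer. Both the weights in `RISKY_PATTERNS` and the exact metric definition are assumptions for illustration; they are not the heuristics or evaluation code in `notebooks/04_vulnerable_line_localization.ipynb`.

```python
import re

# Hypothetical risk patterns and weights; NOT the project's actual heuristics.
RISKY_PATTERNS: list[tuple[re.Pattern, float]] = [
    (re.compile(r"\b(strcpy|strcat|gets|sprintf)\s*\("), 3.0),
    (re.compile(r"\b(eval|exec|system|popen)\s*\("), 3.0),
    (re.compile(r'execute\(.*(\+|%|\.format\(|f")'), 2.5),
    (re.compile(r"\b(malloc|free|memcpy)\s*\("), 1.5),
    (re.compile(r"\[[^\]]+\]\s*="), 1.0),
]


def score_lines(code: str) -> list[tuple[int, float]]:
    """Rank (line_number, score) pairs, highest score first, ties by line number."""
    scored = []
    for lineno, text in enumerate(code.splitlines(), start=1):
        score = sum(weight for pattern, weight in RISKY_PATTERNS if pattern.search(text))
        scored.append((lineno, score))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


def top_k_hit(ranked: list[tuple[int, float]], gold: list[int], k: int) -> bool:
    """True if any annotated vulnerable line is among the top-k ranked lines."""
    top = {lineno for lineno, _ in ranked[:k]}
    return bool(top & set(gold))


def top_k_accuracy(samples: list[tuple[str, list[int]]], k: int) -> float:
    """Fraction of (code, gold_lines) samples with a top-k hit."""
    if not samples:
        return 0.0
    hits = sum(top_k_hit(score_lines(code), gold, k) for code, gold in samples)
    return hits / len(samples)
```

Because many lines score zero under simple keyword rules, ties are broken by line order here, which can noticeably affect Top-1 results. Any real scorer should document its tie-breaking rule.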
See PROJECT_SUMMARY.md for the full analysis, including fix-concept distributions, localization results by language/CWE, faithfulness-to-fix metrics, and case studies.
The repository tracks source code, notebooks, seed/source examples, reports, and the main CodeLLMExp.jsonl dataset file.
Generated experiment artifacts are excluded through .gitignore, including:
- outputs/
- data/raw/
- data/processed/
- data/splits/
- Python caches and notebook checkpoints
- trained model files such as `*.pkl`
The dataset is released under the Creative Commons Attribution 4.0 International License.
If you use this dataset or analysis pipeline, please cite the accompanying report/paper and acknowledge the CodeLLMExp dataset.