dApY3112/CodeLLMExp-RepairAware-Security

Secure Programming CodeLLMExp

This repository contains a reproducible analysis pipeline for CodeLLMExp, a multi-language dataset for studying automated vulnerability localization and explanation in AI-generated code.

The project focuses on three connected secure-programming tasks:

  • CWE classification: predict the vulnerability category from source code.
  • Vulnerable line localization: identify lines that are likely to contain the vulnerability.
  • Faithfulness-to-fix evaluation: compare predicted vulnerable lines with lines changed by the secure fix.
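As a rough illustration of the third task, the sketch below uses Python's standard `difflib` to find which lines of a vulnerable snippet are replaced or deleted by the fix, then scores a set of predicted lines against them. The function names, the example snippets, and the choice of precision/recall as the overlap measure are assumptions for illustration, not the repository's actual implementation.

```python
from __future__ import annotations

import difflib


def changed_lines(vulnerable: str, fixed: str) -> set[int]:
    """Return 1-based line numbers of the vulnerable snippet that the fix replaces or deletes."""
    before = vulnerable.splitlines()
    after = fixed.splitlines()
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    changed: set[int] = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # Pure insertions have no line in the vulnerable snippet, so they are skipped here.
        if tag in ("replace", "delete"):
            changed.update(range(i1 + 1, i2 + 1))
    return changed


def overlap_scores(predicted: set[int], changed: set[int]) -> tuple[float, float]:
    """Precision and recall of predicted lines against fix-changed lines."""
    hits = len(predicted & changed)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(changed) if changed else 0.0
    return precision, recall


# Hypothetical SQL-injection example (not taken from the dataset).
vulnerable = (
    'query = "SELECT * FROM users WHERE id = " + user_id\n'
    "cursor.execute(query)\n"
    "row = cursor.fetchone()"
)
fixed = (
    'query = "SELECT * FROM users WHERE id = ?"\n'
    "cursor.execute(query, (user_id,))\n"
    "row = cursor.fetchone()"
)

fix_lines = changed_lines(vulnerable, fixed)
precision, recall = overlap_scores({2, 3}, fix_lines)
print(sorted(fix_lines), precision, recall)
```

A predictor that flags lines 2 and 3 here agrees with the fix on one of its two predictions and covers one of the two changed lines.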

The pipeline intentionally uses lightweight and interpretable methods, including TF-IDF with classical machine learning models, rule-based fix-concept extraction, diff analysis, and line-level scoring heuristics.
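A minimal sketch of what a TF-IDF plus linear SVM baseline for CWE classification can look like with scikit-learn. The toy snippets, labels, and the identifier-style token pattern are assumptions chosen for illustration; the notebooks may use different features and hyperparameters.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data (hypothetical, not taken from CodeLLMExp).
snippets = [
    'cursor.execute("SELECT * FROM users WHERE id = " + user_id)',
    'cursor.execute("DELETE FROM orders WHERE id = " + order_id)',
    'os.system("ping " + host)',
    'os.system("ls " + path)',
]
labels = ["CWE-89", "CWE-89", "CWE-78", "CWE-78"]

# Tokenize on identifier-like words and keep case, since case can matter in code.
model = make_pipeline(
    TfidfVectorizer(token_pattern=r"[A-Za-z_]\w*", lowercase=False),
    LinearSVC(),
)
model.fit(snippets, labels)

prediction = model.predict(['os.system("cat " + filename)'])[0]
print(prediction)
```

Because the model is linear over token weights, the learned coefficients can be inspected per token, which is one reason this kind of baseline counts as interpretable.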

Repository Structure

.
+-- CodeLLMExp.jsonl                         # Full dataset in JSON Lines format
+-- README.md
+-- requirements.txt
+-- PROJECT_SUMMARY.md                       # Detailed experiment summary
+-- IMPLEMENTATION_PLAN (1).md               # Implementation notes
+-- Explainable_Secure_Programming_CodeLLMExp.pdf
+-- Secure_programming (7).pdf
+-- cwe_seeds/                               # Canonical vulnerable seed examples
+-- seed/                                    # Generated/augmented seed examples
+-- source/                                  # Source snippets organized by language and CWE
+-- src/                                     # Reusable Python utilities
+-- notebooks/                               # End-to-end experiment notebooks

Generated outputs (processed CSV files, trained models, figures, metrics, and storyboard images) are ignored by Git and can be regenerated by running the notebooks.

Dataset

The dataset contains vulnerable code snippets, fixed code, CWE labels, vulnerable-line annotations, and natural-language security explanations.
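For orientation, a JSON Lines file can be read with the standard library alone, one JSON object per line. The sketch below writes a tiny temporary file so it runs without the real dataset; the field names `language` and `cwe` are hypothetical placeholders, so check the actual keys in CodeLLMExp.jsonl before relying on them.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Demo on a temporary file; the keys below are assumed, not the dataset's real schema.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.jsonl"
    sample_path.write_text(
        '{"language": "Python", "cwe": "CWE-89"}\n'
        "\n"
        '{"language": "C", "cwe": "CWE-787"}\n',
        encoding="utf-8",
    )
    records = load_jsonl(sample_path)

language_counts = Counter(record["language"] for record in records)
print(len(records), dict(language_counts))
```

Pointing `load_jsonl` at the real dataset file would work the same way, with the grouping key adjusted to whatever the dataset actually calls its language field.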

Summary after cleaning:

| Metric | Value |
| --- | --- |
| Total samples | 10,403 |
| Languages | Python, Java, C |
| Unique CWE labels | 29 |
| Rows with vulnerable-line annotations | 97.76% |

Language distribution:

| Language | Samples |
| --- | --- |
| Python | 4,610 |
| Java | 3,088 |
| C | 2,705 |

Setup

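Create and activate a Python environment (the activation command shown is for Windows PowerShell; on macOS or Linux, run source .venv/bin/activate instead):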

python -m venv .venv
.\.venv\Scripts\Activate.ps1

Install dependencies:

pip install -r requirements.txt

Register the environment as a Jupyter kernel:

python -m ipykernel install --user --name codellmexp --display-name "Python (CodeLLMExp)"

Running the Pipeline

Run the notebooks in order:

  1. notebooks/01_data_loading_cleaning.ipynb
  2. notebooks/02_secure_fix_concept_extraction.ipynb
  3. notebooks/03_cwe_classification_baseline.ipynb
  4. notebooks/04_vulnerable_line_localization.ipynb
  5. notebooks/05_faithfulness_to_fix_evaluation.ipynb
  6. notebooks/06_storyboard_and_report_figures.ipynb

The notebooks write intermediate and final artifacts under data/processed/, data/splits/, and outputs/.
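If you prefer to run the notebooks non-interactively, one option is to execute them in order with `jupyter nbconvert`. The sketch below only builds and runs the commands; the helper names are hypothetical, the kernel name assumes the `codellmexp` kernel registered during setup, and whether the notebooks run cleanly headless has not been verified here.

```python
import subprocess

# Notebook paths in the order listed above.
NOTEBOOKS = [
    "notebooks/01_data_loading_cleaning.ipynb",
    "notebooks/02_secure_fix_concept_extraction.ipynb",
    "notebooks/03_cwe_classification_baseline.ipynb",
    "notebooks/04_vulnerable_line_localization.ipynb",
    "notebooks/05_faithfulness_to_fix_evaluation.ipynb",
    "notebooks/06_storyboard_and_report_figures.ipynb",
]


def build_command(notebook, kernel="codellmexp"):
    """Build an nbconvert command that executes one notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        f"--ExecutePreprocessor.kernel_name={kernel}",
        notebook,
    ]


def run_all(notebooks=NOTEBOOKS):
    """Execute each notebook in order, stopping at the first failure."""
    for notebook in notebooks:
        subprocess.run(build_command(notebook), check=True)


# Call run_all() from the repository root to execute the full pipeline.
```

Stopping at the first failure matters because later notebooks read artifacts written by earlier ones.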

Main Results

The current experiment summary reports:

| Task | Best/Key Result |
| --- | --- |
| CWE classification, Linear SVM | 99.87% accuracy |
| CWE classification, Linear SVM | 99.87% weighted F1 |
| Vulnerable line localization | 21.66% Top-1 accuracy |
| Vulnerable line localization | 41.69% Top-5 accuracy |

See PROJECT_SUMMARY.md for the full analysis, including fix-concept distributions, localization results by language/CWE, faithfulness-to-fix metrics, and case studies.
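To make the Top-1 and Top-5 localization figures concrete, here is one common way to compute Top-k accuracy: a sample counts as a hit if any of its k highest-scored lines is annotated as vulnerable. This definition, the function names, and the toy scores are assumptions for illustration; PROJECT_SUMMARY.md is the authority on how the reported numbers were computed.

```python
def top_k_hit(scores, vulnerable_lines, k):
    """True if any of the k highest-scored lines is annotated as vulnerable."""
    # Sort by descending score; break ties by the lower line number.
    ranked = sorted(scores, key=lambda line_no: (-scores[line_no], line_no))
    return any(line_no in vulnerable_lines for line_no in ranked[:k])


def top_k_accuracy(samples, k):
    """Fraction of samples with at least one vulnerable line in the top k."""
    if not samples:
        return 0.0
    hits = sum(top_k_hit(scores, gold, k) for scores, gold in samples)
    return hits / len(samples)


# Each sample: ({line number: heuristic score}, {annotated vulnerable lines}).
samples = [
    ({1: 0.1, 2: 0.9, 3: 0.3}, {2}),  # hit at k=1
    ({1: 0.8, 2: 0.2, 3: 0.5}, {3}),  # hit at k=2
    ({1: 0.7, 2: 0.6, 3: 0.1}, {3}),  # hit only at k=3
    ({1: 0.5, 2: 0.4, 3: 0.3}, {3}),  # hit only at k=3
]

for k in (1, 2, 3):
    print(k, top_k_accuracy(samples, k))
```

Under this definition Top-k accuracy can only stay the same or increase as k grows, which matches the ordering of the Top-1 and Top-5 results.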

Notes on Version Control

The repository tracks source code, notebooks, seed/source examples, reports, and the main CodeLLMExp.jsonl dataset file.

Generated experiment artifacts are excluded through .gitignore, including:

  • outputs/
  • data/raw/
  • data/processed/
  • data/splits/
  • Python caches and notebook checkpoints
  • trained model files such as *.pkl

License

The dataset is released under the Creative Commons Attribution 4.0 International License.

Citation

If you use this dataset or analysis pipeline, please cite the accompanying report/paper and acknowledge the CodeLLMExp dataset.

About

Interpretable secure programming analysis of AI-generated vulnerable code using CodeLLMExp, CWE classification, line localization, and repair-aware evaluation.
