Code Detector using Deep Learning

Follow the steps below to implement a binary classifier that detects dead or rarely used code in the SQLite call graph.

Project Overview

Call Graph Parsing: Extracts function relationships from LLVM-generated call graphs
Multi-Modal Features: Combines structural (graph-based) and semantic (name-based) features
Multiple Architectures: Supports MLP, Transformer, and Graph Neural Network (GNN) models
Comprehensive Evaluation: Includes accuracy, precision, recall, F1-score, and ROC-AUC metrics

Dataset: SQLite callgraph with 2,621 functions

Distribution:
- ~1,648 rarely used functions (uses ≤ 2)
- ~973 frequently used functions (uses > 2)

Basic Installation Setup

# Install dependencies
pip install -r requirements.txt --break-system-packages

# Or install individually
pip install torch numpy pandas scikit-learn networkx matplotlib seaborn --break-system-packages

Requirements

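Install dependencies without the --break-system-packages flag (for example, inside a virtual environment, where the flag is not needed):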

pip install torch numpy pandas scikit-learn networkx matplotlib seaborn

For GNN support (optional):

pip install torch-geometric

Usage

Quick Start

Run the interactive demo to explore the data and train a quick model:

python quick_start.py

Train MLP Model (Baseline)

python -u dead_code_detector.py 2>&1 | tee results_mlp.txt

Train Transformer Model

python -u run_for_transformer.py --callgraph callgraph.txt --threshold 2 --seq_mode groups 2>&1 | tee results_for_transformer.txt

Train GNN Model

python -u run_for_gnn.py --callgraph callgraph.txt --threshold 2 2>&1 | tee results_for_gnn.txt

Step-by-Step Implementation

Step 1: Data Parsing and Feature Extraction

The callgraph.txt format:

Call graph node for function: 'functionName'<<address>>  #uses=X
  CS<None> calls function 'calledFunction1'
  CS<None> calls function 'calledFunction2'
  ...
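
The format above can be read with two regular expressions, one per line type. The following is a minimal sketch, not the parser used in dead_code_detector.py: the function name `parse_callgraph` and the sample function names are hypothetical, and the address and call-site fields are matched loosely on the assumption that their contents are not needed.

```python
import re
from collections import defaultdict

# One pattern per line type shown above. The address and call-site fields
# are matched loosely because their exact contents are not used here.
NODE_RE = re.compile(r"Call graph node for function: '([^']+)'<<[^>]*>>\s+#uses=(\d+)")
CALL_RE = re.compile(r"CS<[^>]*> calls function '([^']+)'")


def parse_callgraph(text):
    """Return (uses, calls): use counts and callee lists keyed by function name."""
    uses = {}
    calls = defaultdict(list)
    current = None
    for line in text.splitlines():
        node = NODE_RE.search(line)
        if node:
            current = node.group(1)
            uses[current] = int(node.group(2))
            calls[current]  # make sure functions with no callees still get an entry
            continue
        if line.startswith("Call graph node"):
            current = None  # unnamed node: do not attribute its calls to the previous function
            continue
        call = CALL_RE.search(line)
        if call and current is not None:
            calls[current].append(call.group(1))
    return uses, dict(calls)


# Hypothetical two-node sample in the format described above.
sample = """\
Call graph node for function: 'sqlite3_open'<<0x1>>  #uses=3
  CS<None> calls function 'openDatabase'
Call graph node for function: 'openDatabase'<<0x2>>  #uses=1
"""
uses, calls = parse_callgraph(sample)
print(uses)
print(calls)
```

Lines that match neither pattern (for example, calls to external nodes) are skipped in this sketch.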

Extracted Features:

Feature	Description	Type
`uses`	Number of times function is called	Numeric
`num_calls`	Number of functions this calls	Numeric
`in_degree`	How many functions call this (graph)	Numeric
`out_degree`	How many functions this calls (graph)	Numeric
`pagerank`	PageRank centrality score	Numeric
`func_length`	Length of function name	Numeric
`has_number`	Contains digits (0/1)	Binary
`has_underscore`	Contains underscore (0/1)	Binary
`is_internal`	Internal function (0/1)	Binary
`has_sqlite_prefix`	Starts with 'sqlite3' (0/1)	Binary
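
As an illustration of how most of these features could be derived from parsed data, here is a sketch using only the standard library. It assumes two dictionaries as input (`uses`: name to use count, `calls`: name to list of callees), both hypothetical names. It also assumes that `num_calls` counts call sites while `out_degree` counts distinct callees; the repository may define them differently. `pagerank` and `is_internal` are left out: the first needs a graph library such as networkx, and the source does not say how the second is defined.

```python
def extract_features(uses, calls):
    """Build one feature dict per function from parsed call-graph data."""
    # in_degree: number of distinct functions that call each function
    in_degree = {name: 0 for name in uses}
    for caller, callees in calls.items():
        for callee in set(callees):
            if callee in in_degree:
                in_degree[callee] += 1

    features = {}
    for name, n_uses in uses.items():
        callees = calls.get(name, [])
        features[name] = {
            "uses": n_uses,
            "num_calls": len(callees),        # call sites (assumption)
            "in_degree": in_degree[name],
            "out_degree": len(set(callees)),  # distinct callees (assumption)
            "func_length": len(name),
            "has_number": int(any(ch.isdigit() for ch in name)),
            "has_underscore": int("_" in name),
            "has_sqlite_prefix": int(name.startswith("sqlite3")),
        }
    return features


# Hypothetical toy input
toy_uses = {"sqlite3_open": 3, "openDatabase": 1, "helper": 0}
toy_calls = {
    "sqlite3_open": ["openDatabase", "openDatabase"],
    "openDatabase": [],
    "helper": ["openDatabase"],
}
features = extract_features(toy_uses, toy_calls)
print(features["sqlite3_open"])
```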

Step 2: Label Assignment

Threshold-based labeling:

threshold = 2  # Configurable
label = 1 if uses <= threshold else 0

Justification:

Functions with 0-2 uses → Potentially dead/rarely used
Functions with 3+ uses → Active/frequently used
You can adjust the threshold based on your needs
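
Because the threshold directly controls the class balance, it can help to check the balance before training. The sketch below applies the labeling rule above to a toy dictionary of use counts; the helper names `assign_labels` and `class_balance` and the toy data are hypothetical.

```python
from collections import Counter


def assign_labels(uses, threshold=2):
    """Label 1 = rarely used (uses <= threshold), 0 = frequently used."""
    return {name: int(count <= threshold) for name, count in uses.items()}


def class_balance(labels):
    """Return the fraction of functions in each class."""
    counts = Counter(labels.values())
    total = sum(counts.values())
    return {label: counts[label] / total for label in (0, 1)}


# Hypothetical use counts for four functions
toy_uses = {"a": 0, "b": 2, "c": 3, "d": 10}
for threshold in (0, 2, 5):
    labels = assign_labels(toy_uses, threshold)
    print(threshold, class_balance(labels))
```

Raising the threshold moves more functions into the "rarely used" class, so the evaluation baselines shift with it.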

Step 3: Data Preprocessing

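from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Split: 60% train, 20% val, 20% test
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5)

# Normalize features (important for neural networks!)
# Fit the scaler on the training split only, then reuse it for val and test,
# so that validation and test statistics do not leak into training.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)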

Step 4: Model Architecture

Default Model (MLP):

Input (10 features)
    ↓
Linear(10 → 64) + BatchNorm + ReLU + Dropout(0.3)
    ↓
Linear(64 → 32) + BatchNorm + ReLU + Dropout(0.3)
    ↓
Linear(32 → 16) + BatchNorm + ReLU + Dropout(0.3)
    ↓
Linear(16 → 2)  [Output logits]

Total Parameters: ~3,600 for the layers shown (lightweight!)
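
The parameter count for the layer stack above can be checked by hand. The sketch below assumes each Linear layer has a bias and each BatchNorm layer has a learnable scale and shift (two parameters per feature); the exact figure for dead_code_detector.py depends on its implementation details. The function name `count_mlp_parameters` is hypothetical.

```python
def count_mlp_parameters(input_dim, hidden_dims, num_classes):
    """Count trainable parameters for Linear + BatchNorm blocks and a final Linear."""
    total = 0
    prev = input_dim
    for width in hidden_dims:
        total += prev * width + width  # Linear weights + biases
        total += 2 * width             # BatchNorm scale and shift
        prev = width
    total += prev * num_classes + num_classes  # output Linear
    return total


print(count_mlp_parameters(10, [64, 32, 16], 2))  # 3570
```

ReLU and Dropout add no trainable parameters, so they do not appear in the count.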

Step 5: Training

Loss Function: CrossEntropyLoss (standard for classification)

Optimizer: Adam with:

Learning rate: 0.001
Weight decay: 1e-5 (L2 regularization)

Learning Rate Scheduler: ReduceLROnPlateau

Reduces LR when validation loss plateaus
Factor: 0.5
Patience: 10 epochs

Early Stopping:

Patience: 20 epochs
Monitors validation accuracy
Saves best model
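
The early-stopping rule can be sketched in a few lines of plain Python. This is an illustration under one assumption, not the repository's code: training stops once validation accuracy has failed to improve for `patience` consecutive epochs. The class name `EarlyStopping` and the toy accuracy history are hypothetical.

```python
class EarlyStopping:
    """Stop training when validation accuracy has not improved for `patience` epochs."""

    def __init__(self, patience=20):
        self.patience = patience
        self.best_accuracy = float("-inf")
        self.best_epoch = None
        self.epochs_without_improvement = 0

    def step(self, epoch, val_accuracy):
        """Record one epoch; return True when training should stop."""
        if val_accuracy > self.best_accuracy:
            self.best_accuracy = val_accuracy
            self.best_epoch = epoch  # a real loop would save the model weights here
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Toy run with a short patience: accuracy peaks at epoch 2 and never exceeds it.
stopper = EarlyStopping(patience=3)
history = [0.70, 0.75, 0.80, 0.79, 0.80, 0.78, 0.77]
stopped_at = None
for epoch, acc in enumerate(history):
    if stopper.step(epoch, acc):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_epoch, stopper.best_accuracy)
```

Note that a tie with the best accuracy does not reset the counter in this sketch; only a strict improvement does.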

Step 6: Evaluation Metrics

Primary Metrics:

Accuracy: Overall correctness
Precision: Of predicted rarely-used, how many are correct?
Recall: Of actual rarely-used, how many did we find?
F1-Score: Harmonic mean of precision and recall
ROC-AUC: Area under ROC curve (discrimination ability)

Confusion Matrix:

                 Predicted
                 0    1
Actual    0    [TN] [FP]
          1    [FN] [TP]
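
The threshold-free metrics follow directly from the four cells of the confusion matrix. The sketch below computes them for the positive class (label 1, rarely used); the function name `metrics_from_confusion` and the counts are hypothetical, chosen only to make the arithmetic easy to follow. ROC-AUC is omitted because it needs predicted scores rather than counts.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Compute accuracy, precision, recall, and F1 for the positive class (label 1)."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts
result = metrics_from_confusion(tn=40, fp=10, fn=20, tp=30)
print(result)
```

With these counts, accuracy is 0.70, precision 0.75, recall 0.60, and F1 about 0.667.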

Model Architectures

1. Multi-Layer Perceptron (MLP) - Default

Best for: Tabular features
Pros: Fast, simple, interpretable
Cons: Doesn't use graph structure

from dead_code_detector import DeadCodeClassifier

model = DeadCodeClassifier(input_dim=10, hidden_dims=[64, 32, 16], dropout=0.3)

2. Transformer

Best for: Complex feature interactions
Pros: Self-attention mechanism
Cons: More parameters, slower training

from alternative_models import TransformerClassifier

model = TransformerClassifier(input_dim=10, d_model=64, nhead=4)

3. Graph Neural Network (GNN)

Best for: Leveraging callgraph structure
Pros: Uses actual graph topology
Cons: Requires torch-geometric, more complex setup

from alternative_models import GNNClassifier

model = GNNClassifier(input_dim=10, hidden_dim=64, num_layers=3)
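
To use graph topology, a GNN needs the call graph as integer node ids plus an edge list. torch-geometric conventionally stores edges as a 2 x E array named `edge_index` (source ids in the first row, target ids in the second). The sketch below builds that structure as plain Python lists from a hypothetical `calls` dictionary; how run_for_gnn.py actually prepares its input is not shown in this document, so treat this as one possible approach.

```python
def build_edge_index(calls):
    """Map function names to integer ids and return (name_to_id, edge_index)."""
    # Include callees that never appear as callers so every edge endpoint has an id.
    names = sorted(set(calls) | {c for callees in calls.values() for c in callees})
    name_to_id = {name: i for i, name in enumerate(names)}
    sources, targets = [], []
    for caller, callees in calls.items():
        for callee in sorted(set(callees)):  # one edge per distinct caller/callee pair
            sources.append(name_to_id[caller])
            targets.append(name_to_id[callee])
    return name_to_id, [sources, targets]


# Hypothetical three-function call graph
toy_calls = {"main": ["parse", "run"], "run": ["parse"], "parse": []}
name_to_id, edge_index = build_edge_index(toy_calls)
print(name_to_id)
print(edge_index)
```

The two inner lists can then be converted to an integer tensor of shape [2, E] before being passed to a graph model.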

Recommendation: Start with MLP (default), then try GNN if you want to leverage graph structure.

Results

Expected Performance

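Baseline (Random Classifier): ~50% accuracy for uniform random guessing; always predicting the majority class (rarely used, ~1,648 of 2,621 functions) gives ~63%
Target Performance: 75-85% accuracy
Excellent Performance: >90% accuracy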

What Good Results Look Like

Classification Report:
                    precision  recall  f1-score  support
Frequently Used        0.85     0.88     0.86      195
Rarely Used            0.83     0.79     0.81      330

accuracy                                 0.84      525
macro avg              0.84     0.84     0.84      525
weighted avg           0.84     0.84     0.84      525

ROC-AUC Score: 0.91

Interpreting Metrics

High Precision for "Rarely Used":

When model says code is rarely used, it's usually correct
Good for automated code cleanup

High Recall for "Rarely Used":

Model finds most of the rarely-used code
Minimizes false negatives

High F1-Score:

Balanced performance
No extreme precision-recall tradeoff
