roopk27/DeepLearning

Code Detector using Deep Learning

Follow the steps below to implement a binary classifier that detects dead or rarely used code in the SQLite call graph.

Table of Contents

  1. Project Overview
  2. Quick Start
  3. Step-by-Step Implementation
  4. Understanding the Features
  5. Model Architectures
  6. Evaluation Metrics

This project uses neural networks to classify functions as either frequently used or dead/rarely used based on call graph topology and function characteristics. The system is demonstrated on the SQLite database codebase.

Project Overview

  • Call Graph Parsing: Extracts function relationships from LLVM-generated call graphs
  • Multi-Modal Features: Combines structural (graph-based) and semantic (name-based) features
  • Multiple Architectures: Supports MLP, Transformer, and Graph Neural Network (GNN) models
  • Comprehensive Evaluation: Includes accuracy, precision, recall, F1-score, and ROC-AUC metrics

Dataset: SQLite callgraph with 2,621 functions

  • Distribution:
    • ~1,648 rarely used functions (uses ≤ 2)
    • ~973 frequently used functions (uses > 2)

Installation

# Install dependencies
pip install -r requirements.txt --break-system-packages

# Or install individually
pip install torch numpy pandas scikit-learn networkx matplotlib seaborn --break-system-packages

The --break-system-packages flag is only needed when pip refuses to modify an externally managed system Python; inside a virtual environment it can be omitted.

For GNN support (optional):

pip install torch-geometric

Usage

Quick Start

Run the interactive demo to explore the data and train a quick model:

python quick_start.py

Train MLP Model (Baseline)

python -u dead_code_detector.py 2>&1 | tee results_mlp.txt  

Train Transformer Model

python -u run_for_transformer.py --callgraph callgraph.txt --threshold 2 --seq_mode groups 2>&1 | tee results_for_transformer.txt

Train GNN Model

python -u run_for_gnn.py --callgraph callgraph.txt --threshold 2 2>&1 | tee results_for_gnn.txt

Step-by-Step Implementation

Step 1: Data Parsing and Feature Extraction

The callgraph.txt format:

Call graph node for function: 'functionName'<<address>>  #uses=X
  CS<None> calls function 'calledFunction1'
  CS<None> calls function 'calledFunction2'
  ...
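A file in this format could be parsed with two regular expressions, one for the node header and one for each call line. This is a minimal sketch that assumes only the two line shapes shown above; `parse_callgraph`, the sample function names, and the addresses are illustrative and not taken from the repository.

```python
import re

# Patterns for the two line shapes shown in the format description.
NODE_RE = re.compile(r"Call graph node for function: '([^']+)'<<[^>]*>>\s*#uses=(\d+)")
CALL_RE = re.compile(r"CS<[^>]*> calls function '([^']+)'")


def parse_callgraph(text):
    """Return {function_name: {"uses": int, "calls": [callee, ...]}}."""
    functions = {}
    current = None
    for line in text.splitlines():
        if line.startswith("Call graph node"):
            node = NODE_RE.search(line)
            # Headers that do not name a function reset the current node.
            current = node.group(1) if node else None
            if node:
                functions[current] = {"uses": int(node.group(2)), "calls": []}
            continue
        call = CALL_RE.search(line)
        if call and current is not None:
            functions[current]["calls"].append(call.group(1))
    return functions


# Made-up sample input in the documented format.
sample = """\
Call graph node for function: 'sqlite3_open'<<0x1234>>  #uses=3
  CS<None> calls function 'openDatabase'
  CS<None> calls function 'sqlite3_free'
Call graph node for function: 'openDatabase'<<0x5678>>  #uses=1
"""
graph = parse_callgraph(sample)
print(graph)
```

Lines that match neither pattern are skipped, so other entries a real dump might contain do not break the parse.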

Extracted Features:

Feature            Description                              Type
uses               Number of times function is called       Numeric
num_calls          Number of functions this calls           Numeric
in_degree          How many functions call this (graph)     Numeric
out_degree         How many functions this calls (graph)    Numeric
pagerank           PageRank centrality score                Numeric
func_length        Length of function name                  Numeric
has_number         Contains digits (0/1)                    Binary
has_underscore     Contains underscore (0/1)                Binary
is_internal        Internal function (0/1)                  Binary
has_sqlite_prefix  Starts with 'sqlite3' (0/1)              Binary
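Several of these features can be derived from nothing more than the function name and a mapping of callers to callees. The sketch below is one possible reading of the table: it assumes `num_calls` counts call lines while `out_degree` counts distinct callees, and it leaves out `pagerank` and `is_internal` because the text does not say how they are derived. The helper names are hypothetical.

```python
def name_features(name):
    """Name-based features from the table above."""
    return {
        "func_length": len(name),
        "has_number": int(any(ch.isdigit() for ch in name)),
        "has_underscore": int("_" in name),
        "has_sqlite_prefix": int(name.startswith("sqlite3")),
    }


def degree_features(calls_by_function):
    """calls_by_function maps each function name to the list of names it calls."""
    callers = {name: set() for name in calls_by_function}
    for caller, callees in calls_by_function.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    rows = {}
    for name in callers:
        callees = calls_by_function.get(name, [])
        rows[name] = {
            "num_calls": len(callees),        # assumption: one per call line
            "out_degree": len(set(callees)),  # assumption: distinct callees
            "in_degree": len(callers[name]),  # distinct callers
        }
    return rows


# Made-up example: one function calling another twice.
calls = {"sqlite3_open": ["openDatabase", "openDatabase"], "openDatabase": []}
degrees = degree_features(calls)
names = name_features("sqlite3_open")
print(names, degrees)
```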

Step 2: Label Assignment

Threshold-based labeling:

threshold = 2  # Configurable
label = 1 if uses <= threshold else 0

Justification:

  • Functions with 0-2 uses → Potentially dead/rarely used
  • Functions with 3+ uses → Active/frequently used
  • You can adjust the threshold based on your needs
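To see how the threshold shifts the class balance, the labeling rule can be applied to a list of `uses` counts. This is a minimal sketch; `assign_labels` and the sample counts are hypothetical. Because the label is computed directly from `uses`, it may be worth checking whether `uses` is also passed to the model as an input feature, since the label could then be read straight from it.

```python
from collections import Counter


def assign_labels(uses_counts, threshold=2):
    """Label 1 = rarely used (uses <= threshold), 0 = frequently used."""
    return [1 if uses <= threshold else 0 for uses in uses_counts]


uses_counts = [0, 1, 2, 3, 7, 12]  # made-up counts for illustration
balance = {}
for threshold in (1, 2, 5):
    labels = assign_labels(uses_counts, threshold)
    balance[threshold] = Counter(labels)
    print(f"threshold={threshold}: {dict(balance[threshold])}")
```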

Step 3: Data Preprocessing

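from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Split first: 60% train, 20% val, 20% test (stratified to keep the class ratio)
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42)

# Normalize features (important for neural networks!)
# Fit the scaler on the training split only, then reuse it for val and test
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)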

Step 4: Model Architecture

Default Model (MLP):

Input (10 features)
    ↓
Linear(10 → 64) + BatchNorm + ReLU + Dropout(0.3)
    ↓
Linear(64 → 32) + BatchNorm + ReLU + Dropout(0.3)
    ↓
Linear(32 → 16) + BatchNorm + ReLU + Dropout(0.3)
    ↓
Linear(16 → 2)  [Output logits]

Total Parameters: ~3,600 for the layers shown above, counting the BatchNorm weights (lightweight!)
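The layer stack above could be expressed in PyTorch roughly as follows. This is a sketch built only from the diagram; `build_mlp` is a hypothetical helper and may differ from the repository's `DeadCodeClassifier`. Counting the trainable parameters is a quick way to check the model size.

```python
import torch
from torch import nn


def build_mlp(input_dim=10, hidden_dims=(64, 32, 16), num_classes=2, dropout=0.3):
    """Stack Linear + BatchNorm + ReLU + Dropout blocks, then a logits layer."""
    layers = []
    prev = input_dim
    for width in hidden_dims:
        layers += [
            nn.Linear(prev, width),
            nn.BatchNorm1d(width),
            nn.ReLU(),
            nn.Dropout(dropout),
        ]
        prev = width
    layers.append(nn.Linear(prev, num_classes))  # raw logits, no softmax
    return nn.Sequential(*layers)


model = build_mlp()
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
logits = model(torch.randn(8, 10))  # random batch of 8 rows with 10 features
print(num_params, tuple(logits.shape))
```

The output layer returns raw logits because `CrossEntropyLoss` applies the softmax internally.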

Step 5: Training

Loss Function: CrossEntropyLoss (standard for classification)

Optimizer: Adam with:

  • Learning rate: 0.001
  • Weight decay: 1e-5 (L2 regularization)

Learning Rate Scheduler: ReduceLROnPlateau

  • Reduces LR when validation loss plateaus
  • Factor: 0.5
  • Patience: 10 epochs

Early Stopping:

  • Patience: 20 epochs
  • Monitors validation accuracy
  • Saves best model
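Put together, these settings might look like the loop below. It is a sketch under stated assumptions: the data is random stand-in data, the model is a small placeholder, and each epoch does a single full-batch step for brevity. Only the optimizer, scheduler, and early-stopping settings come from the text above.

```python
import copy

import torch
from torch import nn

torch.manual_seed(0)

# Random stand-in data: 10 features, binary labels (not the SQLite dataset).
X_train, y_train = torch.randn(200, 10), torch.randint(0, 2, (200,))
X_val, y_val = torch.randn(50, 10), torch.randint(0, 2, (50,))

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))  # placeholder model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=10)

best_acc, best_state = -1.0, None
epochs_without_improvement = 0
early_stop_patience = 20

for epoch in range(200):
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(X_train), y_train)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_logits = model(X_val)
        val_loss = criterion(val_logits, y_val).item()
        val_acc = (val_logits.argmax(dim=1) == y_val).float().mean().item()

    scheduler.step(val_loss)  # scheduler watches validation loss

    if val_acc > best_acc:  # early stopping watches validation accuracy
        best_acc = val_acc
        best_state = copy.deepcopy(model.state_dict())
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= early_stop_patience:
            break

model.load_state_dict(best_state)  # restore the best checkpoint
print(f"stopped after epoch {epoch}, best val accuracy {best_acc:.2f}")
```

Copying the state dict with `copy.deepcopy` matters here, because `state_dict()` returns references to tensors that keep changing as training continues.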

Step 6: Evaluation Metrics

Primary Metrics:

  • Accuracy: Overall correctness
  • Precision: Of predicted rarely-used, how many are correct?
  • Recall: Of actual rarely-used, how many did we find?
  • F1-Score: Harmonic mean of precision and recall
  • ROC-AUC: Area under ROC curve (discrimination ability)

Confusion Matrix:

                 Predicted
                 0    1
Actual    0    [TN] [FP]
          1    [FN] [TP]
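The metrics listed above follow directly from the four confusion-matrix cells. The sketch below computes them in plain Python, treating label 1 (rarely used) as the positive class; the function names and sample labels are illustrative. In practice `sklearn.metrics` provides equivalents such as `confusion_matrix` and `roc_auc_score`.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) with label 1 as the positive class."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def binary_metrics(y_true, y_pred):
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and scores for illustration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
metrics = binary_metrics(y_true, y_pred)
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(counts, metrics, auc)
```

Note that ROC-AUC is computed from the model's scores for class 1, not from the hard 0/1 predictions.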

Model Architectures

1. Multi-Layer Perceptron (MLP) - Default

Best for: Tabular features
Pros: Fast, simple, interpretable
Cons: Doesn't use graph structure

from dead_code_detector import DeadCodeClassifier

model = DeadCodeClassifier(input_dim=10, hidden_dims=[64, 32, 16], dropout=0.3)

2. Transformer

Best for: Complex feature interactions
Pros: Self-attention mechanism
Cons: More parameters, slower training

from alternative_models import TransformerClassifier

model = TransformerClassifier(input_dim=10, d_model=64, nhead=4)

3. Graph Neural Network (GNN)

Best for: Leveraging callgraph structure
Pros: Uses actual graph topology
Cons: Requires torch-geometric, more complex setup

from alternative_models import GNNClassifier

model = GNNClassifier(input_dim=10, hidden_dim=64, num_layers=3)

Recommendation: Start with MLP (default), then try GNN if you want to leverage graph structure.


Results

Expected Performance

Baseline (Random Classifier): ~50% accuracy
Baseline (Always Predict Rarely Used): ~63% accuracy (1,648 of 2,621 functions)
Target Performance: 75-85% accuracy
Excellent Performance: >90% accuracy

What Good Results Look Like

Classification Report:
                    precision  recall  f1-score  support
Frequently Used        0.85     0.88     0.86      195
Rarely Used            0.83     0.79     0.81      330

accuracy                                 0.84      525
macro avg              0.84     0.84     0.84      525
weighted avg           0.84     0.84     0.84      525

ROC-AUC Score: 0.91

Interpreting Metrics

High Precision for "Rarely Used":

  • When model says code is rarely used, it's usually correct
  • Good for automated code cleanup

High Recall for "Rarely Used":

  • Model finds most of the rarely-used code
  • Minimizes false negatives

High F1-Score:

  • Balanced performance
  • No extreme precision-recall tradeoff

About

This repository was created for small project ideas that explore deep learning integration.
