Deepfake Detection System using ML & DL

Overview

This project implements a deepfake detection system using both traditional machine learning and deep learning approaches. Facial images are classified as real or fake using either handcrafted visual features (for the ML models) or learned deep neural representations. Alongside the classical ML models, the system draws on Meso4, a compact CNN architecture designed specifically for deepfake detection. The implemented ML models are compared with each other and with benchmark results reported in the literature.

Dataset & Preprocessing

The dataset used in this project is a large, real-world deepfake dataset collected from various internet sources. It contains compressed .tar.gz archives hosting thousands of facial images extracted from videos. The dataset is publicly available on Hugging Face:
🔗 WildDeepfake on Hugging Face

A custom preprocessing pipeline was implemented to:

Extract .png images from the .tar.gz archives hosted on Hugging Face.
Organize and clean the images into labeled directories.
Compute six image-based features (entropy, blur, noise, keypoints, blobs, and phase unwrapping).
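
The extraction step can be sketched with the standard library alone. This is a minimal illustration, not the code from Data_collection.ipynb: the function name `extract_pngs`, the flattened file naming, and the paths in the usage note are hypothetical, and the archive layout is assumed.

```python
import tarfile
from pathlib import Path


def extract_pngs(archive_path, out_dir):
    """Extract only the .png members of a .tar.gz archive into out_dir.

    Returns the list of written file paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            # Skip directories, links, and anything that is not a PNG
            if not (member.isfile() and member.name.lower().endswith(".png")):
                continue
            # Flatten the in-archive path into one file name so frames from
            # different sequences cannot overwrite each other (assumed scheme)
            flat_name = member.name.lstrip("./").replace("/", "_")
            source = tar.extractfile(member)
            if source is None:
                continue
            target = out_dir / flat_name
            with source, open(target, "wb") as fh:
                fh.write(source.read())
            written.append(target)
    return written
```

A call such as `extract_pngs("archive.tar.gz", "data/real")` would write every PNG in the archive into one labeled directory. Writing members by hand, rather than calling `extractall`, keeps unexpected paths inside the archive from escaping the output directory.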

The resulting structured dataset was used to train and evaluate machine learning models.
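
To make the feature step concrete, here is a hedged sketch of three of the six features using NumPy only. The README does not state the exact definitions used in Feature_Extraction.ipynb, so these are common choices taken as assumptions: Shannon entropy of the gray-level histogram, variance of a Laplacian response as a blur score, and the standard deviation of the residual after a 3x3 mean filter as a noise estimate. Keypoints, blobs, and phase unwrapping usually come from a vision library and are left out here.

```python
import numpy as np


def shannon_entropy(gray):
    """Shannon entropy (in bits) of an 8-bit grayscale image's histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]  # empty bins contribute nothing to the sum
    return float(-(p * np.log2(p)).sum())


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian; low values suggest a blurry image."""
    g = gray.astype(float)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def noise_estimate(gray):
    """Std. dev. of the residual after a 3x3 mean filter (interior pixels only)."""
    g = gray.astype(float)
    h, w = g.shape
    # Average the nine shifted views that make up each 3x3 neighbourhood
    smooth = sum(g[i:h - 2 + i, j:w - 2 + j]
                 for i in range(3) for j in range(3)) / 9.0
    return float((g[1:-1, 1:-1] - smooth).std())


# Toy input: left half black, right half white
image = np.zeros((8, 8), dtype=np.uint8)
image[:, 4:] = 255
features = {
    "entropy": shannon_entropy(image),
    "blur": laplacian_variance(image),
    "noise": noise_estimate(image),
}
print(features)
```

Each function takes a 2-D `uint8` array, so a color image would need to be converted to grayscale first.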

Methodology

Traditional Machine Learning Models (Scikit-learn)

Random Forest
Support Vector Machine (SVM)
Logistic Regression
K-Nearest Neighbors (KNN)
XGBoost

These models were trained using the extracted features and evaluated using standard performance metrics.
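
A training and evaluation loop of this kind might look like the sketch below. It is illustrative only: synthetic data from `make_classification` stands in for the six-feature table, the label convention (1 = fake), the 80/20 split, and all hyperparameters are assumptions, and XGBoost is omitted because it lives in the separate `xgboost` package (its `XGBClassifier` follows the same fit/predict interface).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the six-feature table (assumption: 1 = fake, 0 = real)
X, y = make_classification(n_samples=600, n_features=6, n_informative=4,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Distance- and margin-based models get feature scaling; the forest does not need it
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }

for name, scores in results.items():
    print(name, {k: round(v, 4) for k, v in scores.items()})
```

Swapping the synthetic `X` and `y` for the feature columns and labels of Processed_Dataset.csv would be the natural next step; the column names in that file are not given here, so they are left out.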

Deep Learning Model

Meso4 CNN (TensorFlow/Keras)
A compact convolutional neural network that processes raw images and learns spatial representations without manual feature engineering.
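
For orientation, a Keras sketch of the Meso-4 layout as published in the MesoNet paper (Afchar et al., 2018) follows. The layer sizes follow that published description; the 256x256 input size, the loss, and the optimizer are assumptions here, and the configuration actually used in Deeplearning_Models.ipynb may differ.

```python
from tensorflow.keras import layers, models


def build_meso4(input_shape=(256, 256, 3)):
    """Four conv/batch-norm/pool blocks followed by a small dense head."""
    inputs = layers.Input(shape=input_shape)

    x = layers.Conv2D(8, (3, 3), padding="same", activation="relu")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(pool_size=(2, 2), padding="same")(x)

    x = layers.Conv2D(8, (5, 5), padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(pool_size=(2, 2), padding="same")(x)

    x = layers.Conv2D(16, (5, 5), padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(pool_size=(2, 2), padding="same")(x)

    x = layers.Conv2D(16, (5, 5), padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(pool_size=(4, 4), padding="same")(x)

    # Classification head: heavy dropout around a 16-unit hidden layer
    x = layers.Flatten()(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(16)(x)
    x = layers.LeakyReLU(0.1)(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # probability of one class

    return models.Model(inputs, outputs, name="meso4")


model = build_meso4()
# Loss and optimizer are illustrative choices, not taken from the notebook
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

With these settings the network has roughly 28 thousand parameters, which is what makes it practical on modest hardware.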

Results

Model	Accuracy	Precision	Recall	F1 Score
Meso4 (CNN)	0.9400	0.9380	0.9420	0.9400
Random Forest	0.9162	0.9155	0.9250	0.9202
KNN	0.8884	0.8935	0.8929	0.8932
Logistic Regression	0.8884	0.8935	0.8929	0.8932
XGBoost	0.8065	0.7817	0.8737	0.8251
SVM	0.7246	0.7060	0.8106	0.7547
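
As a quick sanity check on the table, the F1 column can be recomputed from the precision and recall columns, since F1 is their harmonic mean. The snippet below copies the reported values; the helper name `f1_from` is only illustrative.

```python
# (precision, recall, reported F1) copied from the results table
rows = {
    "Meso4 (CNN)": (0.9380, 0.9420, 0.9400),
    "Random Forest": (0.9155, 0.9250, 0.9202),
    "KNN": (0.8935, 0.8929, 0.8932),
    "Logistic Regression": (0.8935, 0.8929, 0.8932),
    "XGBoost": (0.7817, 0.8737, 0.8251),
    "SVM": (0.7060, 0.8106, 0.7547),
}


def f1_from(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {f1_from(p, r):.4f}, reported {reported:.4f}")
```

Every recomputed value agrees with the reported F1 to four decimal places, so the three columns are internally consistent.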

Among the classical ML models, Random Forest performed best (accuracy 0.9162, F1 0.9202), second only to Meso4, the deep learning model (accuracy 0.9400). This indicates that classical machine learning, given carefully engineered features and tuned hyperparameters, can serve as a viable alternative to deep learning models, especially under computational constraints.
