TRahulsingh/DeepfakeDetector

🧠 Deepfake Detection System

A state-of-the-art deepfake detection system built with PyTorch and EfficientNet-B0, featuring a user-friendly web interface for real-time image and video analysis.

⚙️ Created By

TRahulsingh
🌟 Features

  • Deep Learning Model: EfficientNet-B0 architecture fine-tuned for deepfake detection
  • Multi-format Support: Analyze both images (.jpg, .jpeg, .png) and videos (.mp4, .mov)
  • Web Interface: Interactive Gradio-based web application for easy testing
  • Real-time Analysis: Processes the first frame of uploaded videos for quick deepfake detection
  • Training Pipeline: Complete PyTorch Lightning training infrastructure
  • Model Export: Support for PyTorch (.pt) and ONNX format exports

πŸ“ System Architecture

For detailed system architecture diagrams, data flow, and component interactions, see ARCHITECTURE.md.

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • CUDA-compatible GPU (optional, but recommended for training)

Installation

  1. Clone the repository:

    git clone https://github.com/TRahulsingh/DeepfakeDetector.git
    cd DeepfakeDetector

  2. Install dependencies:

    pip install -r requirements.txt

  3. Download a pre-trained model (or train your own) and place it at models/best_model-v3.pt

Usage

🖥️ Web Application

Launch the interactive web interface:

python web-app.py

The web app will open in your browser where you can:

  • Drag and drop images or videos
  • View real-time predictions with confidence scores
  • See preview of analyzed content

🔍 Command Line Classification

Classify individual images:

python classify.py path/to/your/image.jpg

🎥 Video Analysis

Process videos from a folder:

# Place videos in 'videos_to_predict' folder, then run:
python inference/video_inference.py

📂 Supported Datasets

This deepfake detection system supports various popular deepfake datasets. Below are the recommended datasets for training and evaluation:

🎬 Video-based Datasets

FaceForensics++

  • Description: One of the most comprehensive deepfake datasets with 4 manipulation methods
  • Size: ~1,000 original videos, ~4,000 manipulated videos
  • Manipulations: Deepfakes, Face2Face, FaceSwap, NeuralTextures
  • Quality: Raw, c23 (light compression), c40 (heavy compression)
  • Download: GitHub Repository
  • Usage: Excellent for training robust models across different manipulation types

Celeb-DF (v2)

  • Description: High-quality celebrity deepfake dataset
  • Size: 590 real videos, 5,639 deepfake videos
  • Quality: High-resolution with improved visual quality
  • Download: Official Website
  • Usage: Great for testing model performance on high-quality deepfakes

DFDC (Deepfake Detection Challenge)

  • Description: Facebook's large-scale deepfake detection dataset
  • Size: ~100,000 videos (real and fake)
  • Diversity: Multiple actors, ethnicities, and ages
  • Download: Kaggle Competition
  • Usage: Large-scale training and benchmarking

DFD (Google's Deepfake Detection Dataset)

  • Description: Google/Jigsaw deepfake dataset
  • Size: ~3,000 deepfake videos
  • Quality: High-quality with various compression levels
  • Download: FaceForensics++ repository
  • Usage: Additional training data for model robustness

🖼️ Image-based Datasets

140k Real and Fake Faces

  • Description: Large collection of real and AI-generated face images
  • Size: ~140,000 images
  • Source: StyleGAN-generated faces vs real faces
  • Download: Kaggle Dataset
  • Usage: Perfect for image-based deepfake detection training

CelebA-HQ

  • Description: High-quality celebrity face dataset
  • Size: 30,000 high-resolution images
  • Quality: 1024×1024 resolution
  • Download: GitHub Repository
  • Usage: Real face examples for training

🔧 Dataset Preparation

Option 1: Download Pre-processed Datasets

  1. Download your chosen dataset from the links above
  2. Extract to the data/ folder
  3. Organize as shown in the training section below

Option 2: Use Dataset Preparation Tools

Use our built-in tools to prepare datasets. Edit the source/destination paths inside each script before running:

# Extract frames from videos (every 15th frame) and split into train/val
# Edit source & dest paths in the script, then run:
python tools/split_video_dataset.py

# Split an existing image dataset into 80/20 train/validation
# Edit source_dataset & destination paths in the script, then run:
python tools/split_train_val.py

# Extract frames from a single video directory
# Edit video_dir & output_dir in the script, then run:
python tools/split_dataset.py
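
The split performed by tools/split_train_val.py and the every-15th-frame extraction can be sketched in pure Python. The 80/20 ratio and stride of 15 come from the descriptions above; the function names and the fixed seed are illustrative, not the scripts' actual code:

```python
import random

def split_train_val(filenames, val_ratio=0.2, seed=42):
    """Shuffle deterministically, then carve off val_ratio for validation."""
    files = sorted(filenames)            # sort first so the split is reproducible
    random.Random(seed).shuffle(files)
    n_val = int(len(files) * val_ratio)
    return files[n_val:], files[:n_val]  # (train, val)

def frame_indices(total_frames, stride=15):
    """Every-15th-frame selection used when extracting frames from a video."""
    return list(range(0, total_frames, stride))

train, val = split_train_val(["img%03d.jpg" % i for i in range(100)])
```

Sorting before the seeded shuffle keeps the split stable across runs even if the filesystem lists files in a different order.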

📋 Dataset Recommendations

  • For Beginners: Start with 140k Real and Fake Faces (image-based, easy to work with)
  • For Research: Use FaceForensics++ (comprehensive, multiple manipulation types)
  • For Production: Combine DFDC + Celeb-DF (large scale, diverse)
  • For High-Quality Testing: Use Celeb-DF v2 (challenging, high-quality deepfakes)

⚠️ Dataset Usage Notes

  • Ethical Use: These datasets are for research purposes only
  • Legal Compliance: Ensure compliance with dataset licenses and terms of use
  • Privacy: Respect privacy rights of individuals in the datasets
  • Citation: Properly cite the original dataset papers when publishing research

πŸ‹οΈ Training

Dataset Structure

Organize your training data in the data folder as follows:

data/
├── train/
│   ├── real/
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   └── fake/
│       ├── fake1.jpg
│       └── fake2.jpg
└── validation/
    ├── real/
    └── fake/
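
A loader like datasets/hybrid_loader.py would enumerate this layout roughly as follows. This is a stdlib-only sketch; list_samples, the label mapping, and the extension set are assumptions, not the repository's actual API:

```python
from pathlib import Path

# Assumed label convention for this sketch: real=0, fake=1.
LABELS = {"real": 0, "fake": 1}
EXTENSIONS = {".jpg", ".jpeg", ".png"}

def list_samples(split_dir):
    """Return (path, label) pairs for a data/train- or data/validation-style folder."""
    samples = []
    for class_name, label in LABELS.items():
        class_dir = Path(split_dir) / class_name
        if not class_dir.is_dir():
            continue  # tolerate a missing class folder
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in EXTENSIONS:
                samples.append((str(path), label))
    return samples
```

A real dataset class would wrap these pairs in a torch.utils.data.Dataset that loads and transforms each image on access.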

Configuration

Update config.yaml with your dataset paths:

train_paths:
  - data/train

val_paths:
  - data/validation

lr: 0.0001
batch_size: 4
num_epochs: 10

Start Training

python main_trainer.py

The training will:

  • Use PyTorch Lightning for efficient training
  • Save best model based on validation loss
  • Log metrics to TensorBoard
  • Apply early stopping to prevent overfitting

Monitor Training

View training progress with TensorBoard:

tensorboard --logdir lightning_logs

πŸ“ Project Structure

├── web-app.py                    # Main web application
├── main_trainer.py               # Primary training script
├── classify.py                   # Image classification utility
├── realeval.py                   # Real-world evaluation script
├── config.yaml                   # Training configuration
├── requirements.txt              # Python dependencies
├── README.md                     # Project documentation
├── ARCHITECTURE.md               # System architecture & design
├── LICENSE                       # MIT License
├── .gitignore                    # Git ignore rules
├── data/                         # Dataset storage (not tracked by git)
│   ├── train/                    # Training data
│   └── validation/               # Validation data
├── datasets/
│   └── hybrid_loader.py          # Custom dataset loader
├── lightning_modules/
│   └── detector.py               # PyTorch Lightning module
├── models/
│   └── best_model-v3.pt          # Trained model weights
├── tools/                        # Dataset preparation utilities
│   ├── export_to_pt.py           # .ckpt → .pt model converter
│   ├── split_dataset.py          # Video frame extractor
│   ├── split_train_val.py        # 80/20 train/val splitter
│   └── split_video_dataset.py    # Video-aware dataset splitter
└── inference/
    ├── export_onnx.py            # ONNX export
    └── video_inference.py        # Multi-frame video inference

🛠️ Model Architecture

  • Backbone: EfficientNet-B0 (pre-trained on ImageNet)
  • Classifier: Custom 2-class classifier with dropout (0.4)
  • Input Size: 224x224 RGB images
  • Output: Binary classification (Real/Fake) with confidence scores

📊 Performance

  • Inference Speed: Real-time on GPU, ~200ms per image on CPU
  • Input Support: Images (.jpg, .png) and videos (.mp4, .mov)
  • Video Analysis: 10-frame uniform sampling with probability averaging
  • Robustness: Tested with Gaussian blur and JPEG compression noise simulation (realeval.py)

Note: Accuracy metrics depend on your training dataset. Monitor val_loss and val_acc via TensorBoard during training.
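
The 10-frame sampling and probability averaging described above can be illustrated in plain Python; uniform_frame_indices and average_probabilities are illustrative helpers, not functions from video_inference.py:

```python
def uniform_frame_indices(total_frames, k=10):
    """Pick k frame indices spread uniformly across the video."""
    k = min(k, max(total_frames, 0))
    if k == 0:
        return []
    step = total_frames / k
    return [int(i * step) for i in range(k)]

def average_probabilities(per_frame_probs):
    """Average per-class probabilities across the sampled frames."""
    n = len(per_frame_probs)
    num_classes = len(per_frame_probs[0])
    return [sum(p[c] for p in per_frame_probs) / n for c in range(num_classes)]

indices = uniform_frame_indices(300)  # e.g. a 300-frame clip
verdict = average_probabilities([[0.9, 0.1], [0.7, 0.3]])
```

Averaging across frames smooths out single-frame misclassifications, which is why the video path is more robust than a first-frame-only check.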

🔧 Advanced Usage

Export to ONNX

Convert PyTorch model to ONNX format:

python inference/export_onnx.py

Batch Evaluation

Evaluate a folder of real-world samples with optional noise simulation:

# Place test images/videos in realworld_samples/ folder, then run:
python realeval.py

Export Checkpoint to PyTorch

Convert a Lightning .ckpt to a standalone .pt file:

# Edit ckpt_path and pt_output in the script, then run:
python tools/export_to_pt.py
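
Lightning checkpoints nest the weights in a state_dict whose keys carry the LightningModule attribute prefix (assumed here to be model.), so the conversion boils down to stripping that prefix before saving with torch.save. A minimal, framework-free sketch of the key rewrite:

```python
def strip_prefix(state_dict, prefix="model."):
    """Drop a module prefix (e.g. 'model.') from checkpoint state_dict keys."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }

# Toy state_dict standing in for ckpt["state_dict"] loaded via torch.load:
cleaned = strip_prefix({"model.features.0.weight": 1, "model.classifier.1.bias": 2})
```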

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

🙏 Acknowledgments

  • EfficientNet architecture by Google Research
  • PyTorch Lightning for training infrastructure
  • Gradio for web interface framework
  • The research community for deepfake detection advances

📄 License

This project is licensed under the MIT License.


⭐ Star this repository if you found it helpful!
