Understanding Adversarial Attacks Through MNIST

Hands-On Tutorial: Building CNNs and Testing Adversarial Robustness with PyTorch

Overview

This repository demonstrates the fragility of deep learning models through hands-on experimentation with adversarial attacks. You'll build a CNN from scratch, train it to roughly 98% accuracy on MNIST, then watch it drop to 82% under perturbations that remain invisible to humans and to 41% once the perturbations become only slightly noticeable.

What makes neural networks so vulnerable? Decision boundaries learned during training are often extremely close to legitimate examples. A tiny push in the right direction—changing pixels by amounts invisible to humans—can flip predictions completely. This isn't a theoretical curiosity. It's a critical security issue affecting autonomous vehicles, medical imaging systems, facial recognition, and fraud detection.

What Makes This Tutorial Unique

This tutorial focuses on how Convolutional Neural Networks (CNNs) behave under adversarial attack, rather than only applying FGSM to pre-trained models:

Build CNNs from scratch - Learn how conv layers, pooling, and architecture design work rather than only using them
Understand spatial hierarchies - See how feature maps at each layer respond to adversarial perturbations
Monitor CNN internals - Use PyTorch forward hooks to inspect activations throughout the network
Complete training pipeline - Master DataLoaders, optimization, and proper evaluation
Educational focus - 8,000-word companion article explaining real-world security implications
Visual intuition - Detailed visualizations showing how convolutional features break down under attack

While FGSM is a general attack method, this tutorial teaches you how CNNs specifically learn spatial patterns and fail under adversarial conditions. This makes it ideal for computer vision practitioners who need to understand both architecture design and security testing.

What You'll Learn

PyTorch Fundamentals

Build CNNs using torch.nn.Module
Implement proper training loops with backpropagation
Use DataLoaders for efficient batching and preprocessing
Monitor model internals with forward hooks
Save and load model checkpoints
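
As a rough illustration of the last item, the sketch below saves and restores a checkpoint with torch.save and load_state_dict. The dictionary keys ("epoch", "model_state", "optimizer_state"), the file name, and the tiny nn.Linear model are assumptions made for this example and may differ from what the notebook does.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in model and optimizer; the notebook's CNN would go here.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.pt")

    # Save everything needed to resume training, not just the weights.
    torch.save(
        {
            "epoch": 3,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        path,
    )

    # Restore into freshly constructed objects of the same shape.
    checkpoint = torch.load(path)
    restored = nn.Linear(4, 2)
    restored.load_state_dict(checkpoint["model_state"])
    restored_optimizer = torch.optim.SGD(restored.parameters(), lr=0.1)
    restored_optimizer.load_state_dict(checkpoint["optimizer_state"])

print("resumed from epoch", checkpoint["epoch"])
```

Saving the optimizer state alongside the weights matters mostly for optimizers with internal state (momentum, Adam moments); for inference only, the model state dict is enough.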

Security & Robustness

Generate adversarial examples using Fast Gradient Sign Method (FGSM)
Measure the gap between clean accuracy and adversarial robustness
Visualize imperceptible perturbations that fool neural networks
Understand why 98% test accuracy doesn't mean production-ready
Learn defense strategies and their limitations

Quick Start

Installation

# Clone the repository
git clone https://github.com/scthornton/understanding-adversarial-attacks-mnist.git
cd understanding-adversarial-attacks-mnist

# Install dependencies
pip install torch torchvision matplotlib numpy jupyter

Run the Tutorial

# Launch Jupyter notebook
jupyter notebook adversarial-mnist-tutorial.ipynb

The notebook is self-contained and walks through every step with detailed explanations.

Key Results

When we apply FGSM attacks with different epsilon values:

Epsilon	Perturbation Visibility	Accuracy	Interpretation
0.00	No attack (clean)	98.45%	Baseline performance
0.05	Barely detectable	95.32%	Small degradation
0.10	Still invisible to humans	82.14%	Significant vulnerability
0.20	Slightly noticeable	41.23%	Most predictions wrong (still above the 10% chance level)
0.30	Visible noise	18.67%	Model is broken

Critical Insight: At epsilon 0.1 (roughly 3% of the normalized pixel range), perturbations remain invisible to humans but reduce accuracy by 16 percentage points. The model that appeared production-ready based on test accuracy collapses under adversarial conditions.
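
The "roughly 3%" figure can be checked with a little arithmetic. The sketch below assumes the notebook normalizes MNIST with the commonly used mean 0.1307 and standard deviation 0.3081; this README does not state the transform, so treat those constants and the helper names as assumptions.

```python
# Assumed normalization constants (commonly used for MNIST; not confirmed by this README).
MNIST_MEAN = 0.1307
MNIST_STD = 0.3081


def normalized_range(mean, std):
    """Map the raw pixel bounds [0, 1] into normalized space."""
    return (0.0 - mean) / std, (1.0 - mean) / std


def epsilon_fraction(epsilon, mean=MNIST_MEAN, std=MNIST_STD):
    """Fraction of the normalized pixel range covered by a perturbation of size epsilon."""
    low, high = normalized_range(mean, std)
    return epsilon / (high - low)


for eps in (0.05, 0.10, 0.20, 0.30):
    print(f"epsilon={eps:.2f} -> {epsilon_fraction(eps):.1%} of the normalized range")
```

Under these assumptions the normalized range spans about 3.25 units, so epsilon 0.1 covers about 3.1% of it.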

Repository Structure

understanding-adversarial-attacks-mnist/
├── README.md                              # This file
├── adversarial-mnist-tutorial.ipynb       # Main educational notebook
├── adversarial-mnist-article-edited.md    # Deep-dive article on adversarial attacks
├── requirements.txt                       # Python dependencies
├── LICENSE                                # MIT License
└── images/                                # Visualizations and results
    ├── architecture-diagram.png
    ├── adversarial-examples.png
    └── robustness-curve.png

Notebook Contents

Part 1: Setup and Architecture

Import libraries and set up environment
Build a simple CNN from scratch
Understand the architecture flow and parameter counts
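
This README does not spell out the notebook's architecture, so the following is only a minimal sketch of what a small MNIST CNN and a parameter count might look like. The class name SimpleCNN, the layer sizes, and the count_parameters helper are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SimpleCNN(nn.Module):
    """Hypothetical two-conv-layer CNN for 28x28 grayscale digits."""

    def __init__(self, num_classes: int = 10) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 1x28x28 -> 16x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16x14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, start_dim=1))


def count_parameters(model: nn.Module) -> int:
    """Count trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


model = SimpleCNN()
logits = model(torch.randn(4, 1, 28, 28))  # batch of 4 fake images
print(tuple(logits.shape))
print(count_parameters(model))
```

For this particular layout the count works out to 160 (first conv) + 4,640 (second conv) + 15,690 (linear) = 20,490 parameters, most of them in the final linear layer.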

Part 2: Data Loading

Load and preprocess MNIST dataset
Create efficient DataLoaders with batching and shuffling
Visualize training samples
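
A minimal sketch of the batching and shuffling step follows. To stay self-contained it uses random tensors shaped like MNIST as a stand-in; with the real dataset you would pass a torchvision.datasets.MNIST instance to DataLoader instead. The batch sizes are assumptions, not the notebook's settings.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data shaped like MNIST (an assumption for illustration only).
images = torch.rand(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
dataset = TensorDataset(images, labels)

# Shuffle during training so batches differ between epochs;
# keep evaluation order fixed so results are reproducible.
train_loader = DataLoader(dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset, batch_size=128, shuffle=False)

batch_images, batch_labels = next(iter(train_loader))
print(tuple(batch_images.shape), tuple(batch_labels.shape))
print("batches per epoch:", len(train_loader))
```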

Part 3: Training Pipeline

Implement training loop with proper gradient handling
Track loss and accuracy over epochs
Evaluate on test set to measure generalization
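
The steps above can be sketched roughly as follows. The function names train_one_epoch and evaluate, the linear stand-in model, the random data, and the hyperparameters are all assumptions made to keep the example short and runnable; the notebook's own loop may be organized differently.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, optimizer, loss_fn):
    """Run one pass over the data; return mean loss and accuracy."""
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for images, labels in loader:
        optimizer.zero_grad()            # clear gradients from the previous batch
        logits = model(images)
        loss = loss_fn(logits, labels)
        loss.backward()                  # backpropagate
        optimizer.step()                 # update weights
        total_loss += loss.item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen


@torch.no_grad()
def evaluate(model, loader):
    """Accuracy with gradients disabled and the model in eval mode."""
    model.eval()
    correct, seen = 0, 0
    for images, labels in loader:
        correct += (model(images).argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return correct / seen


torch.manual_seed(0)
data = TensorDataset(torch.rand(128, 1, 28, 28), torch.randint(0, 10, (128,)))
loader = DataLoader(data, batch_size=32, shuffle=True)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # stand-in for the CNN
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    loss, acc = train_one_epoch(model, loader, optimizer, loss_fn)
    print(f"epoch {epoch}: loss={loss:.4f} acc={acc:.2%}")
```

In practice evaluate would be called on a held-out test loader; the random labels here mean the numbers themselves carry no meaning.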

Part 4: Monitoring with Hooks

Use PyTorch forward hooks to inspect activations
Monitor internal representations during inference
Debug and understand what the network learns
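
One way to capture activations with forward hooks is sketched below. register_forward_hook is the standard PyTorch call; the small Sequential model, the activations dictionary, and the make_hook helper are hypothetical names chosen for this example.

```python
import torch
import torch.nn as nn

# Small stand-in network; the notebook's CNN would be hooked the same way.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),
)

activations = {}


def make_hook(name):
    """Build a hook that stores the layer's output under the given name."""
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook


# Register one hook per layer and keep the handles so they can be removed later.
handles = [
    layer.register_forward_hook(make_hook(f"{idx}:{layer.__class__.__name__}"))
    for idx, layer in enumerate(model)
]

with torch.no_grad():
    model(torch.randn(2, 1, 28, 28))

for name, act in activations.items():
    print(name, tuple(act.shape), f"mean={act.mean().item():.3f}")

# Remove hooks once inspection is done so they do not slow down later passes.
for handle in handles:
    handle.remove()
```

Running the same hooks on a clean image and on its adversarial counterpart, then comparing the stored tensors layer by layer, is one way to see where the perturbation starts to change the features.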

Part 5: Adversarial Attacks

Implement Fast Gradient Sign Method (FGSM)
Test robustness across different epsilon values
Visualize clean images, perturbations, and adversarial examples
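
FGSM, as defined by Goodfellow et al., perturbs each input by epsilon times the sign of the loss gradient with respect to that input: x_adv = x + epsilon * sign(grad_x L(model(x), y)). The sketch below is one possible implementation, not necessarily the notebook's. The function name fgsm_attack, the optional clamp argument, and the untrained linear stand-in model are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fgsm_attack(model, images, labels, epsilon, clamp=None):
    """Single-step FGSM: move each pixel by epsilon in the direction that raises the loss."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    (grad,) = torch.autograd.grad(loss, images)   # gradient w.r.t. the input only
    adversarial = images.detach() + epsilon * grad.sign()
    if clamp is not None:
        # Optional: keep pixels inside the valid range, e.g. clamp=(0.0, 1.0) for raw pixels.
        adversarial = adversarial.clamp(*clamp)
    return adversarial


torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).eval()  # stand-in model
images = torch.rand(8, 1, 28, 28)
labels = torch.randint(0, 10, (8,))

adv = fgsm_attack(model, images, labels, epsilon=0.1)

with torch.no_grad():
    clean_loss = F.cross_entropy(model(images), labels).item()
    adv_loss = F.cross_entropy(model(adv), labels).item()

print(f"max pixel change: {(adv - images).abs().max().item():.3f}")
print(f"loss clean={clean_loss:.4f} adversarial={adv_loss:.4f}")
```

To test robustness across epsilon values, call fgsm_attack inside the evaluation loop for each epsilon and record accuracy on the perturbed batches. If the inputs are normalized, any clamp bounds need to be expressed in normalized space as well.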

Part 6: Security Analysis

Understand the gap between clean and adversarial accuracy
Learn defense strategies (adversarial training, certified defenses)
Best practices for deploying models in adversarial environments

Companion Article

The repository includes adversarial-mnist-article-edited.md, a comprehensive 8,000-word article that covers:

Why adversarial vulnerabilities matter beyond MNIST
Real-world incidents (autonomous vehicles, facial recognition, medical imaging)
Current defense strategies and their limitations
Practical guidance for ML engineers deploying models in production
The path forward for secure machine learning

Real-World Implications

Autonomous Vehicles

Adversarial patches on stop signs can make object detection models fail to recognize them. Researchers demonstrated this using physical stickers arranged in computed patterns—the vehicle's camera captures the scene, and the neural network misclassifies critical road signs.

Medical Imaging

A 2019 study showed adversarial perturbations that make malignant tumors invisible to cancer detection networks while making benign tissue appear malignant. The perturbations are subtle enough that radiologists don't notice them.

Facial Recognition

Specially designed eyeglass frames or adversarial makeup patterns can bypass security checkpoints, evade recognition, or impersonate other individuals. Access control systems that rely on facial recognition alone are exposed to this class of attack.

Financial Fraud Detection

Transaction patterns that should trigger fraud alerts can be perturbed slightly—changing amounts, timing, or account relationships—to preserve fraudulent intent while evading detection.

Defense Strategies

The notebook covers several defense approaches:

Adversarial Training: Include adversarial examples in training data (generally the most effective empirical defense, but it typically reduces clean accuracy)
Defensive Distillation: Smooth decision boundaries with softmax temperature (bypassed by strong attacks)
Input Preprocessing: Apply transformations to remove perturbations (defeated by adaptive attackers)
Certified Defenses: Provide provable robustness guarantees (severe accuracy/performance tradeoffs)
Detection Methods: Identify adversarial inputs before classification (evaded by adaptive attacks)
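
Of the approaches listed above, adversarial training is the most direct to sketch in code. The example below mixes the clean loss with the loss on FGSM examples crafted against the current weights. The function names, the adv_weight mixing parameter, and the stand-in model and data are assumptions, and this is only one of several ways to set up adversarial training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fgsm(model, images, labels, epsilon):
    """Craft single-step FGSM examples against the model's current weights."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    (grad,) = torch.autograd.grad(loss, images)
    return images.detach() + epsilon * grad.sign()


def adversarial_training_step(model, optimizer, images, labels, epsilon=0.1, adv_weight=0.5):
    """One optimizer step on a weighted mix of clean and adversarial loss."""
    model.train()
    adv_images = fgsm(model, images, labels, epsilon)
    optimizer.zero_grad()
    clean_loss = F.cross_entropy(model(images), labels)
    adv_loss = F.cross_entropy(model(adv_images), labels)
    loss = (1.0 - adv_weight) * clean_loss + adv_weight * adv_loss
    loss.backward()
    optimizer.step()
    return clean_loss.item(), adv_loss.item()


torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # stand-in for the CNN
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
images = torch.rand(32, 1, 28, 28)
labels = torch.randint(0, 10, (32,))

for step in range(3):
    clean_loss, adv_loss = adversarial_training_step(model, optimizer, images, labels)
    print(f"step {step}: clean={clean_loss:.4f} adversarial={adv_loss:.4f}")
```

Setting adv_weight to 1.0 trains on adversarial examples only, which tends to cost more clean accuracy; the 0.5 default here is an arbitrary starting point.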

Key Takeaway: No defense provides perfect protection. Production systems require defense in depth—layered security combining multiple independent mechanisms.

Prerequisites

Required Knowledge:

Basic Python programming
Fundamental understanding of neural networks
Familiarity with NumPy

Recommended (but not required):

Prior PyTorch experience
Calculus (for understanding gradients)
Linear algebra (for understanding matrix operations)

Time Commitment:

Complete tutorial: 2-3 hours
Reading companion article: 30 minutes
Exercises and experimentation: 1-2 hours

Exercises for Further Learning

Beginner

Add learning rate scheduling with torch.optim.lr_scheduler.StepLR
Plot training curves (loss and accuracy over epochs)
Experiment with different batch sizes and observe the impact
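
As a starting point for the scheduler exercise, the sketch below shows how StepLR changes the learning rate over epochs. The step_size and gamma values and the tiny model are arbitrary assumptions; the training work for each epoch is replaced by a bare optimizer.step().

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(4, 2)  # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Halve the learning rate every 2 epochs.
scheduler = StepLR(optimizer, step_size=2, gamma=0.5)

lrs = []
for epoch in range(6):
    # ... one epoch of training would run here ...
    optimizer.step()                       # optimizer steps come before the scheduler step
    lrs.append(scheduler.get_last_lr()[0])
    scheduler.step()                       # advance the schedule once per epoch

print(lrs)
```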

Intermediate

Add batch normalization after convolutional layers
Compare SGD vs Adam vs AdamW optimizers
Implement PGD attack (multi-step iterative FGSM)
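
For the PGD exercise, one possible starting point is sketched below: repeat small signed-gradient steps and project back into the epsilon ball around the original image after each one. The function name pgd_attack and its parameters are assumptions, and this version omits the random start used by Madry et al.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def pgd_attack(model, images, labels, epsilon, step_size, steps):
    """Iterative FGSM with projection onto the L-infinity ball of radius epsilon."""
    original = images.clone().detach()
    adv = original.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        (grad,) = torch.autograd.grad(loss, adv)
        # Take a small step in the direction that increases the loss.
        adv = adv.detach() + step_size * grad.sign()
        # Project: the total perturbation may never exceed epsilon per pixel.
        adv = original + torch.clamp(adv - original, -epsilon, epsilon)
    return adv.detach()


torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).eval()  # stand-in model
images = torch.rand(8, 1, 28, 28)
labels = torch.randint(0, 10, (8,))

adv = pgd_attack(model, images, labels, epsilon=0.1, step_size=0.02, steps=10)
print(f"max pixel change: {(adv - images).abs().max().item():.3f}")
```

A step size around epsilon divided by a handful of steps is a common rule of thumb, but the right values depend on the model and are worth experimenting with.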

Advanced

Implement adversarial training (train on mix of clean and adversarial examples)
Export model to ONNX format for deployment
Create a robustness curve plotting accuracy vs epsilon for multiple attack methods

Contributing

Contributions are welcome! This is an educational project focused on teaching adversarial machine learning through practical examples.

Ways to contribute:

Report bugs or unclear explanations
Suggest additional exercises or examples
Add new attack implementations (PGD, C&W, adversarial patches)
Improve visualizations
Add defense implementations

Please open an issue first to discuss proposed changes.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this tutorial in your research or teaching, please cite:

@misc{thornton2025adversarial,
  author = {Thornton, Scott},
  title = {Understanding Adversarial Attacks Through MNIST},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/scthornton/understanding-adversarial-attacks-mnist}
}

Resources

Papers

Explaining and Harnessing Adversarial Examples (FGSM) - Goodfellow et al., 2014
Towards Deep Learning Models Resistant to Adversarial Attacks (PGD) - Madry et al., 2017
Certified Adversarial Robustness via Randomized Smoothing - Cohen et al., 2019

Tools & Benchmarks

RobustBench - Adversarial robustness benchmark
CleverHans - Adversarial attack library
Foolbox - Python toolbox for adversarial attacks

Documentation

Acknowledgments

MNIST Dataset: LeCun, Cortes, and Burges
FGSM Attack: Ian Goodfellow and colleagues (2014)
PyTorch Framework: Facebook AI Research
Inspired by the need to understand adversarial vulnerabilities before deploying ML systems in production

Contact

Scott Thornton

Website: perfecxion.ai
GitHub: @scthornton
LinkedIn: Scott Thornton

⚠️ Security Disclaimer: This tutorial is for educational purposes. Understanding adversarial attacks is critical for building secure ML systems. Always test robustness before deploying models in security-critical applications.

⭐ If you found this tutorial helpful, please consider giving it a star!
