This project addresses the challenge of improving neural network robustness against adversarial attacks, which can cause models to misclassify inputs with minimal perturbations. The primary problem is the trade-off between model accuracy on clean data and robustness to adversarial examples (Tsipras et al., 2019). The proposed solution is a hybrid training approach that combines clean and adversarial (FGSM) data, aiming to maintain strong performance on unperturbed inputs while enhancing the model's resilience to progressively stronger adversarial (FGSM and PGD) attacks.
The Fast Gradient Sign Method (FGSM) attack is a type of adversarial attack on machine learning models, particularly neural networks. It works by adding small, carefully calculated perturbations to the input data in the direction of the gradient of the model's loss function with respect to the input. This causes the model to misclassify the altered input, even though the perturbations are often imperceptible to humans. FGSM is widely used to evaluate the robustness of models against adversarial examples. This attack has been used in this project for both training and testing.
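The single-step idea behind FGSM can be sketched on a toy logistic-regression model (a hypothetical stand-in for the project's ResNet-18), where the gradient of the loss with respect to the input is analytic and no autodiff framework is needed:

```python
import numpy as np

rng = np.random.default_rng(0)

w = rng.normal(size=8)          # fixed model weights (hypothetical)
x = rng.normal(size=8)          # a single input example
y = 1.0                         # true label in {0, 1}
epsilon = 0.1                   # perturbation budget

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(x_):
    # Binary cross-entropy for the positive class y = 1
    return -np.log(sigmoid(w @ x_))

# Gradient of the loss w.r.t. the input x: dL/dx = (sigmoid(w.x) - y) * w
grad_x = (sigmoid(w @ x) - y) * w

# FGSM: one step of size epsilon in the direction of sign(dL/dx)
x_adv = x + epsilon * np.sign(grad_x)

# The perturbation is bounded by epsilon in the L-infinity norm
assert np.max(np.abs(x_adv - x)) <= epsilon + 1e-12
```

The key property is that every input coordinate moves by exactly plus or minus epsilon, which maximally increases a first-order approximation of the loss within the L-infinity budget.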
The Projected Gradient Descent (PGD) attack is an iterative adversarial attack used to evaluate the robustness of machine learning models. It builds on the FGSM attack by applying multiple small perturbations to the input data over several iterations. After each step, the perturbed input is projected back into a constrained space to ensure the perturbation remains within a specified limit. This makes PGD a stronger attack than FGSM, as it refines the adversarial example over multiple steps. This attack has been used in this project only for testing.
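The iterate-and-project loop can be sketched with the same kind of toy logistic model (hypothetical, not the project's ResNet-18): each iteration takes an FGSM-style step of size alpha, then clips the accumulated perturbation back into the L-infinity ball of radius epsilon.

```python
import numpy as np

rng = np.random.default_rng(1)

w = rng.normal(size=8)          # fixed model weights (hypothetical)
x0 = rng.normal(size=8)         # original (clean) input
y = 1.0                         # true label
epsilon = 0.1                   # maximum allowed perturbation
alpha = 0.02                    # step size per iteration
num_iterations = 30

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = x0.copy()
for _ in range(num_iterations):
    grad_x = (sigmoid(w @ x) - y) * w      # analytic dL/dx
    x = x + alpha * np.sign(grad_x)        # FGSM-style step
    # Projection: clip the total perturbation to [-epsilon, epsilon]
    x = x0 + np.clip(x - x0, -epsilon, epsilon)

# Despite 30 steps, the final example stays within the epsilon ball
assert np.max(np.abs(x - x0)) <= epsilon + 1e-12
```

Because the projection is applied after every step, PGD can explore the constraint set far more thoroughly than a single FGSM step while never exceeding the perturbation budget.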
The project involves training and testing a ResNet-18 model using three different methodologies:
- Experiment 1 (v1) - Model trained solely on normal, unperturbed data.
- Experiment 2 (v2) - Model trained exclusively on adversarial (FGSM) data.
- Experiment 3 (v3) - Model trained using a hybrid approach of both normal and adversarial (FGSM) data.
The model architecture is based on ResNet-18, which uses residual blocks with shortcut connections to enable efficient learning in deeper networks. Part (a) in the figure illustrates the stacked convolutional layers, with shortcut connections (red arrows) bypassing each block to prevent vanishing gradients. Part (b) shows the residual block structure, where the input ( X ) is added back to the transformed output ( F(X) ), resulting in ( F(X) + X ). This design allows for faster convergence and improved performance.
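The residual computation F(X) + X from part (b) of the figure can be illustrated with a small NumPy sketch, using a two-layer fully connected F for brevity in place of ResNet-18's convolutions (all weights here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

d = 16
W1 = rng.normal(scale=0.1, size=(d, d))   # first layer of F (hypothetical)
W2 = rng.normal(scale=0.1, size=(d, d))   # second layer of F (hypothetical)

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x):
    fx = W2 @ relu(W1 @ x)   # the transformation F(X)
    return relu(fx + x)      # shortcut connection adds X back

x = rng.normal(size=d)
y = residual_block(x)
assert y.shape == x.shape    # the shortcut requires matching dimensions
```

Because gradients flow through the identity shortcut as well as through F, even a block whose weights contribute little still passes the signal through, which is what mitigates vanishing gradients in deeper stacks.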
The project was carried out in MATLAB R2022a and the dataset used for training and validation is CIFAR-10.
To run these scripts, you need MATLAB R2022a or later installed, along with the following MATLAB toolboxes:
- Deep Learning Toolbox
- Parallel Computing Toolbox (for utilizing GPUs)
```
├── experiment_v1
│   ├── train_resnet_18_v1.m        # Training script for Experiment 1
│   ├── test_normal_v1.m            # Testing script on normal data (v1)
│   ├── test_fgsm_v1.m              # Testing script against FGSM adversarial data (v1)
│   ├── test_pgd_v1.m               # Testing script against PGD adversarial data (v1)
├── experiment_v2
│   ├── train_resnet_18_v2.m        # Training script for Experiment 2
│   ├── test_normal_v2.m            # Testing script on normal data (v2)
│   ├── test_fgsm_v2.m              # Testing script against FGSM adversarial data (v2)
│   ├── test_pgd_v2.m               # Testing script against PGD adversarial data (v2)
├── experiment_v3
│   ├── train_resnet_18_v3.m        # Training script for Experiment 3
│   ├── test_normal_v3.m            # Testing script on normal data (v3)
│   ├── test_fgsm_v3.m              # Testing script against FGSM adversarial data (v3)
│   ├── test_pgd_v3.m               # Testing script against PGD adversarial data (v3)
├── pytorch_conversion
│   ├── export_resnet18_to_onnx.m   # Exports model weights and architecture to ONNX
│   ├── convert_onnx_to_pytorch.py  # Converts the ONNX models to PyTorch
└── README.md                       # This README file
```

Throughout all three experiments, LearnRate = 0.01, miniBatchSize = 128 and MaxEpoch = 100 have been maintained for consistency.
In all experiments, the training data is upscaled from 32x32x3 to 224x224x3, randomly shifted vertically and horizontally by up to 4 pixels, and randomly flipped using imageDataAugmenter before being passed to minibatchqueue.
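For readers who want a concrete picture, here is a rough NumPy sketch of the shift-and-flip portion of that augmentation (the upscaling to 224x224x3 is omitted; the project itself uses MATLAB's imageDataAugmenter, so this is only an illustration of the idea):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, max_shift=4):
    """img: HxWxC array. Returns a randomly shifted / flipped copy."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # Pad by the maximum shift, then crop at an offset to realize the shift
    pad = max_shift
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    out = padded[pad + dy : pad + dy + h, pad + dx : pad + dx + w, :]
    if rng.random() < 0.5:           # random horizontal flip
        out = out[:, ::-1, :]
    return out

img = rng.random((32, 32, 3))        # a stand-in for one CIFAR-10 image
aug = augment(img)
assert aug.shape == img.shape
```

Each call produces a differently shifted and possibly mirrored view of the same image, which is what gives the network some invariance to small translations and reflections.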
Objective: To establish a baseline accuracy of ResNet-18 on clean CIFAR-10 data.
- Model Name: resnet_18_v1.mat
- Results:
  - Validation Accuracy on Normal Data: 91.58%
  - Validation Accuracy under FGSM Attack: varies between 61.31% and 16.84% depending on the strength of epsilon.
  - Validation Accuracy under PGD Attack: 0.57% (epsilon = alpha = 8, iterations = 30)
This experiment serves as a control to evaluate the impact of adversarial training on model performance.
Objective: To test the robustness of ResNet-18 when trained entirely on adversarial examples.
- Model Name: resnet_18_v2.mat
- Training Data: 100% adversarial data generated using FGSM (epsilon = 2, alpha = 2, iteration steps = 1).
- Results:
  - Validation Accuracy on Normal Data: 85.81%
  - Validation Accuracy under FGSM Attack: varies between 76.00% and 46.01%.
  - Validation Accuracy under PGD Attack: 38.75% (epsilon = alpha = 8, iterations = 30)
The training was performed using exclusively adversarial data (FGSM), aiming to create a model robust to this specific attack.
Objective: To balance normal validation accuracy with adversarial robustness by training on both clean and adversarial data.
- Model Name: resnet_18_v3.mat
- Training Data:
  - First 50 epochs on clean data.
  - Next 50 epochs on adversarial data generated using FGSM (epsilon = 2, alpha = epsilon, iteration steps = 1).
- Results:
  - Validation Accuracy on Normal Data: 88.75%
  - Validation Accuracy under FGSM Attack: varies between 77.30% and 43.78%.
  - Validation Accuracy under PGD Attack: 35.66% (epsilon = alpha = 8, iterations = 30)
The objective was to train a model robust against a strong PGD attack, even at the cost of reduced accuracy on clean data.
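The two-phase schedule above (clean data first, FGSM data second) can be sketched end to end on a toy logistic classifier over synthetic data; this is a hypothetical stand-in for the project's ResNet-18 training, shown only to make the schedule concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 200, 8
X = rng.normal(size=(n, d))                  # synthetic inputs
w_true = rng.normal(size=d)                  # hypothetical ground-truth weights
y = (X @ w_true > 0).astype(float)           # synthetic binary labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)                              # model weights to be trained
lr, epsilon, epochs = 0.1, 0.1, 20
for epoch in range(epochs):
    Xb = X                                   # phase 1: clean inputs
    if epoch >= epochs // 2:                 # phase 2: FGSM-perturbed inputs
        grad_x = (sigmoid(X @ w) - y)[:, None] * w[None, :]  # dL/dx per row
        Xb = X + epsilon * np.sign(grad_x)
    p = sigmoid(Xb @ w)
    w -= lr * Xb.T @ (p - y) / n             # gradient step on the weights

# Clean-data accuracy after the hybrid schedule
acc = np.mean((sigmoid(X @ w) > 0.5) == (y > 0.5))
```

The first half of the epochs fits the clean distribution; the second half regenerates adversarial inputs from the current weights every epoch, so the model keeps adapting to its own worst-case perturbations.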
To train the models for each experiment, update datadir to the folder where the CIFAR-10 dataset will be downloaded, navigate to the corresponding folder, and run the training script. For example, to train the model for Experiment 1 (v1), use:
```matlab
cd experiment_v1
train_resnet_18_v1
```

This script will train the ResNet-18 model on the CIFAR-10 dataset. The trained model will be saved as resnet_18_v1.mat.
The ResNet-18 models in this project were originally trained using MATLAB Deep Learning Toolbox. To enable cross-platform inference and integration with modern ML workflows, the trained models are converted to PyTorch using an ONNX-based pipeline.
MATLAB (.mat) → ONNX (.onnx) → PyTorch (.pth)
Run the MATLAB script:
```matlab
export_resnet18_to_onnx
```

This generates ONNX files for all trained models.
To convert ONNX to PyTorch, first install the dependencies:

```bash
pip install torch torchvision onnx onnxruntime onnx2pytorch
```

Then run the conversion script:

```bash
python convert_onnx_to_pytorch.py
```

This produces PyTorch .pth files for inference.
To test the model on normal, unperturbed CIFAR-10 data, use the following script after training:
```matlab
test_normal_v1
```

This will output the validation accuracy on clean data for the model.
- **FGSM Testing:** To test the model against adversarial data generated using FGSM, run:

  ```matlab
  test_fgsm_v1
  ```
- **PGD Testing:** To test the model against adversarial data generated using PGD, run:

  ```matlab
  test_pgd_v1
  ```
Replace v1 with v2 or v3 for the other experiments. These scripts will calculate and display the model's accuracy under different adversarial perturbation strengths.
You can modify the strength of the adversarial perturbations epsilon in the testing scripts. For FGSM, the value of epsilon can be adjusted directly in the code:
```matlab
epsilon = 8;  % Set the perturbation strength for FGSM
```

Similarly, for PGD, the number of iterations and the step size can be adjusted:

```matlab
epsilon = 8;          % Maximum allowed perturbation
alpha = 0.01;         % Step size for each iteration
num_iterations = 40;  % Number of iterations for PGD
```
- Validation accuracy of all models on normal data.
- Validation accuracy of all models on progressively stronger adversarial (FGSM) data.
- Validation accuracy of all models on adversarial (PGD) data.
- Grad-CAM output for all models.
- **Trade-off Between Accuracy and Robustness:** Models trained exclusively on normal data (Experiment 1) achieve high accuracy on clean inputs but are highly vulnerable to adversarial attacks, while models trained on adversarial data (Experiment 2) exhibit greater robustness but lower accuracy on clean data.
- **Hybrid Training Provides a Balanced Solution:** The hybrid training approach (Experiment 3), which uses both clean and adversarial data, strikes a balance between accuracy and robustness, improving performance on adversarial examples while maintaining respectable accuracy on clean data.
- **Stronger Adversarial Training Reduces Clean Data Performance:** As adversarial perturbations become stronger (higher epsilon values), models trained purely on adversarial data show a steady decline in performance on clean data, reinforcing the need for a balanced training strategy.
- **Model Resilience Against PGD Attacks:** While models trained on normal data performed poorly under the PGD attack, the adversarially trained and hybrid-trained models exhibited significantly better resilience, highlighting the effectiveness of adversarial training against stronger attacks.
- **Visualization Insights Through Grad-CAM:** The Grad-CAM visualizations revealed that models trained with adversarial data (Experiments 2 and 3) focused on broader and more relevant regions of the input, suggesting that adversarial training helps models learn more robust and meaningful feature representations.
NOTE: The detailed analysis can be found in the Project_Report.pdf file.
- Explaining and Harnessing Adversarial Examples by Ian Goodfellow et al. (2015)
- Towards Deep Learning Models Resistant to Adversarial Attacks by Aleksander Madry et al. (2018)
- Robustness may be at odds with accuracy by Dimitris Tsipras et al. (2019)
- Compress Image Classification Network for Deployment to Resource-Constrained Embedded Devices - MATLAB & Simulink - MathWorks United Kingdom
- Train Image Classification Network Robust to Adversarial Examples - MATLAB & Simulink - MathWorks United Kingdom
- Grad-CAM Reveals the Why Behind Deep Learning Decisions - MATLAB & Simulink - MathWorks United Kingdom
Special thanks to @luisacutillo78 & @mikecroucher for their valuable feedback, guidance and support.
This work was undertaken on ARC4, part of the High Performance Computing facilities at the University of Leeds, UK.
For any questions or feedback, please contact Rajarshi Nandi at mm23rn@leeds.ac.uk.

