ResNet is an end-to-end deep learning pipeline featuring a Custom ResNet-50 architecture built entirely from scratch. Designed for ImageNet-100, this project demonstrates advanced distributed training dynamics, stochastic regularization, and model interpretability without relying on pre-trained ImageNet weights.
graph LR
%% Input and Stem
A["Input Image<br>3x224x224"] --> B("Stem<br>Conv 7x7, Stride 2")
B --> C("BatchNorm + ReLU")
C --> D("MaxPool 3x3")
%% Main Architecture Blocks (Bottlenecks)
D --> E["Layer 1<br>3x Bottleneck Blocks<br>Output Depth: 256"]
E --> F["Layer 2<br>4x Bottleneck Blocks<br>Output Depth: 512"]
F --> G["Layer 3<br>6x Bottleneck Blocks<br>Output Depth: 1024"]
G --> H["Layer 4<br>3x Bottleneck Blocks<br>Output Depth: 2048"]
%% Classification Head
H --> I("AdaptiveAvgPool2d")
I --> J("Flatten")
J --> K["Linear Projection<br>100 Classes"]
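The spatial sizes implied by the diagram can be checked with a little arithmetic. The sketch below is illustrative only: the helper names, the 64-channel stem, the padding values, and the stride of 2 at the start of Layers 2 to 4 are assumptions taken from the standard ResNet-50 layout, not details confirmed by this repository.

```python
from typing import List, Tuple


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a conv or pooling window along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_size: int = 224) -> List[Tuple[str, int, int]]:
    """Return (stage, channels, spatial size) for each stage in the diagram."""
    shapes = [("input", 3, input_size)]
    # Stem: 7x7 conv, stride 2 (padding 3 and 64 channels are assumed).
    size = conv_out(input_size, kernel=7, stride=2, padding=3)
    shapes.append(("stem", 64, size))
    # 3x3 max pool (stride 2 and padding 1 are assumed).
    size = conv_out(size, kernel=3, stride=2, padding=1)
    shapes.append(("maxpool", 64, size))
    # (name, output depth from the diagram, assumed stride of the first block)
    stages = [("layer1", 256, 1), ("layer2", 512, 2),
              ("layer3", 1024, 2), ("layer4", 2048, 2)]
    for name, depth, stride in stages:
        # Only the 3x3 conv of the first block in a stage changes the resolution.
        size = conv_out(size, kernel=3, stride=stride, padding=1)
        shapes.append((name, depth, size))
    shapes.append(("avgpool", 2048, 1))  # AdaptiveAvgPool2d to 1x1
    shapes.append(("fc", 100, 1))        # linear projection to 100 classes
    return shapes


for stage, channels, size in trace_shapes():
    print(f"{stage:8s} {channels:5d} x {size} x {size}")
```

Under these assumptions the feature map entering the classification head is 2048 x 7 x 7, which is the resolution a Grad-CAM heatmap taken from Layer 4 would have before upsampling.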
- Accuracy: Achieved 86.9% Top-1 and 96.2% Top-5 validation accuracy on the custom 100-class dataset.
- Ablation Superiority: Outperformed ImageNet-pretrained baselines (pre-trained ResNet-50, VGG-16, MobileNetV2), each given a 5-epoch classification-head warmup, in a controlled benchmarking environment.
- Performance: Engineered with nn.DataParallel for dual NVIDIA T4 GPUs and Automatic Mixed Precision (AMP), reducing VRAM consumption by ~40%.
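For reference, Top-1 and Top-5 accuracy can be computed from raw logits along these lines. This is a minimal sketch: `topk_accuracy` is an illustrative helper, not a function taken from the repository, and the toy logits are made up for the example.

```python
import torch


def topk_accuracy(logits, targets, ks=(1, 5)):
    """Fraction of samples whose true label is among the k highest logits."""
    max_k = max(ks)
    _, pred = logits.topk(max_k, dim=1)        # (N, max_k) class indices, best first
    correct = pred.eq(targets.unsqueeze(1))    # (N, max_k) True where the label matches
    return {k: correct[:, :k].any(dim=1).float().mean().item() for k in ks}


# Toy batch: 4 samples, 6 classes, no tied scores.
logits = torch.tensor([
    [5.0, 4.0, 3.0, 2.0, 1.0, 0.0],
    [0.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    [5.0, 4.0, 3.0, 2.0, 1.0, 0.0],
    [5.0, 4.0, 3.0, 2.0, 1.0, 0.0],
])
targets = torch.tensor([0, 1, 4, 5])
acc = topk_accuracy(logits, targets)
print(acc)  # two of four labels are ranked first, three of four are in the top five
```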
The 90-epoch learning curve demonstrates highly stable validation accuracy despite the aggressive noise introduced by CutMix and MixUp regularization.
Benchmarking the custom ResNet against ImageNet-pretrained industrial models (using a controlled 5-epoch classification head warmup).
Diagnostic heatmaps generated via forward/backward hooks on the final convolutional bottleneck (layer4[-1].conv3), proving the network isolates target morphology rather than background artifacts.
"Black box" models are unacceptable in production. I engineered a custom Gradient-weighted Class Activation Mapping (Grad-CAM) pipeline attached to the final convolutional bottleneck (layer4[-1].conv3) to actively debug model behavior and verify morphological feature extraction across varied target classes.
| Best of Class: Water Ouzel (Dipper) | Best of Class: Rock Crab |
| Success: Tench (100% Confidence) | Success: Tench (Alternate View) |
| Diagnostic: Texture Bias (Predicted 84) | Best of Class: Wombat |
Architectural Design
- Native Topology Implementation: Bypassed torchvision.models to manually construct the AlgoReasoner ResNet-50 pipeline, including precise bottleneck triplets (1x1, 3x3, 1x1) and dynamically calculated projection shortcuts.
- Kaiming (He) Initialization: Implemented mathematically rigorous weight initialization specifically designed for deep networks with asymmetrical non-linearities (ReLU), preventing vanishing/exploding gradients during the volatile early epochs of from-scratch training.
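The two points above can be sketched as a single bottleneck block plus an initialization function. This is a minimal illustration, not the repository's code: the class name `Bottleneck`, the `width` argument, the `init_weights` helper, and the choice of `fan_out` mode are assumptions borrowed from common ResNet implementations.

```python
import torch
import torch.nn as nn


class Bottleneck(nn.Module):
    """1x1 -> 3x3 -> 1x1 residual block with an optional projection shortcut."""

    expansion = 4  # output channels = width * expansion

    def __init__(self, in_channels, width, stride=1):
        super().__init__()
        out_channels = width * self.expansion
        self.conv1 = nn.Conv2d(in_channels, width, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(width)
        self.conv2 = nn.Conv2d(width, width, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(width)
        self.conv3 = nn.Conv2d(width, out_channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # Project the identity only when its shape would not match the output.
        self.shortcut = None
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        identity = x if self.shortcut is None else self.shortcut(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + identity)


def init_weights(module):
    """Kaiming (He) init for conv layers; unit scale and zero shift for BatchNorm."""
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
    elif isinstance(module, nn.BatchNorm2d):
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)


# First block of a stage that halves the resolution and doubles the depth.
block = Bottleneck(in_channels=256, width=128, stride=2)
block.apply(init_weights)
out = block(torch.randn(2, 256, 56, 56))
print(out.shape)
```

Here the shortcut is projected because both the stride and the channel count change; a block with matching input and output shapes would pass the identity through untouched.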
Advanced Regularization
- Probabilistic CutMix & MixUp: Engineered a custom data-collator that dynamically applies CutMix (spatial patch blending) and MixUp (feature/label interpolation) with a randomized stochastic probability per batch. This forces the network to learn global structural cues rather than memorizing local pixel noise.
- Optimization Schedule: Utilized a gradual learning rate warmup phase to stabilize initial high-entropy gradients, seamlessly transitioning into a Cosine Annealing (CosineAnnealingLR) decay schedule to gently guide the optimizer into narrow local minima.
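Both ideas above can be sketched in a few lines. The code below is an assumption-laden illustration rather than the project's collator or scheduler: `mix_batch` and `lr_at_epoch` are hypothetical names, and the mixing probability `p`, the Beta parameter `alpha`, the base learning rate, and the 5-epoch warmup length are placeholder values. The schedule is written as a plain formula instead of calling `CosineAnnealingLR`, so its shape is explicit.

```python
import math
import random

import torch
import torch.nn.functional as F


def mix_batch(images, labels, num_classes, p=0.5, alpha=1.0):
    """With probability p, apply CutMix or MixUp (picked at random) to a batch."""
    targets = F.one_hot(labels, num_classes).float()
    if random.random() >= p:
        return images, targets  # leave this batch untouched
    lam = float(torch.distributions.Beta(alpha, alpha).sample())
    perm = torch.randperm(images.size(0))
    if random.random() < 0.5:
        # MixUp: blend every pixel of each image with a shuffled partner.
        images = lam * images + (1.0 - lam) * images[perm]
    else:
        # CutMix: paste one rectangle from the shuffled partner.
        h, w = images.shape[-2:]
        cut_h = int(h * math.sqrt(1.0 - lam))
        cut_w = int(w * math.sqrt(1.0 - lam))
        cy, cx = random.randrange(h), random.randrange(w)
        y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
        x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
        images = images.clone()
        images[:, :, y1:y2, x1:x2] = images[perm][:, :, y1:y2, x1:x2]
        lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)  # area actually kept
    # Soft labels follow the same mixing ratio as the pixels.
    targets = lam * targets + (1.0 - lam) * targets[perm]
    return images, targets


def lr_at_epoch(epoch, base_lr=0.1, warmup_epochs=5, total_epochs=90, min_lr=0.0):
    """Linear warmup followed by cosine annealing (epoch is 0-based)."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


images = torch.rand(8, 3, 32, 32)
labels = torch.randint(0, 100, (8,))
mixed_images, soft_targets = mix_batch(images, labels, num_classes=100, p=1.0)
print(mixed_images.shape, soft_targets.sum(dim=1))
```

Because the targets are soft label distributions, the loss has to accept class probabilities; `nn.CrossEntropyLoss` does so in PyTorch 1.10 and later.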
MLOps, Profiling & Compute
- Inference Profiling: Instrumented custom timing decorators to actively measure and log model throughput (images/sec) and raw inference latency (ms/image), ensuring the architecture remains viable for real-time production deployment.
- Hardware Scaling & AMP: Orchestrated multi-GPU training via PyTorch nn.DataParallel (Dual NVIDIA T4s), heavily optimized with Automatic Mixed Precision (torch.amp.autocast) to reduce VRAM overhead by ~40% while accelerating tensor math.
- Cloud-Native Fault Tolerance: Automated dynamic state-dictionary checkpointing (_latest.pth and _best.pth) directly within the training loop, ensuring zero data loss during preemptive cloud server disconnects.
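A training step with mixed precision and latest/best checkpointing might look roughly like the sketch below. It is not the repository's training loop: the tiny stand-in model, the `train_step` and `save_checkpoint` helpers, the optimizer settings, and the contents of the checkpoint dictionary are all assumptions. Mixed precision is switched on only when a GPU is present, so the same code also runs on CPU.

```python
import os

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"  # this sketch enables mixed precision only on GPU

# Stand-in model so the sketch runs anywhere; the project's ResNet-50 would go here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 100)).to(device)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # replicate the model and split each batch

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()
# Long-standing spelling; recent releases also provide torch.amp.GradScaler.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(images, targets):
    images, targets = images.to(device), targets.to(device)
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device.type, enabled=use_amp):
        loss = criterion(model(images), targets)
    scaler.scale(loss).backward()  # passes the loss through unchanged when AMP is off
    scaler.step(optimizer)
    scaler.update()
    return loss.item()


def save_checkpoint(directory, epoch, val_acc, best_acc, prefix="resnet50"):
    """Write <prefix>_latest.pth every call and <prefix>_best.pth on improvement."""
    # Unwrap DataParallel so the saved keys carry no "module." prefix.
    net = model.module if isinstance(model, nn.DataParallel) else model
    state = {
        "epoch": epoch,
        "model": net.state_dict(),
        "optimizer": optimizer.state_dict(),
        "scaler": scaler.state_dict(),
        "val_acc": val_acc,
    }
    torch.save(state, os.path.join(directory, f"{prefix}_latest.pth"))
    if val_acc > best_acc:
        torch.save(state, os.path.join(directory, f"{prefix}_best.pth"))
    return max(val_acc, best_acc)


loss = train_step(torch.randn(4, 3, 32, 32), torch.randint(0, 100, (4,)))
print(f"loss after one step: {loss:.3f}")
```

Saving the optimizer and scaler state alongside the weights is what lets a run resume mid-schedule after a disconnect, rather than only reloading the model for evaluation.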
Visual Interpretability
- "White-Box" Diagnostics (Grad-CAM): Refused the "black-box" AI paradigm by engineering a custom Gradient-weighted Class Activation Mapping pipeline. By attaching forward and backward hooks to the final spatial feature map (layer4[-1].conv3), the model actively renders diagnostic heatmaps proving its predictions are grounded in actual target morphology.
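The hook-based approach described above can be sketched as follows. This is a generic Grad-CAM outline under stated assumptions, not the project's pipeline: the `GradCAM` class and the small stand-in network are hypothetical, and in the real model the target layer would be `layer4[-1].conv3` as named in the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradCAM:
    """Capture one layer's activations and gradients, then build a heatmap."""

    def __init__(self, model, target_layer):
        self.model = model.eval()
        self.activations = None
        self.gradients = None
        target_layer.register_forward_hook(self._save_activation)
        target_layer.register_full_backward_hook(self._save_gradient)

    def _save_activation(self, module, inputs, output):
        self.activations = output.detach()

    def _save_gradient(self, module, grad_input, grad_output):
        self.gradients = grad_output[0].detach()

    def __call__(self, image, class_idx=None):
        logits = self.model(image)  # image: (1, 3, H, W)
        if class_idx is None:
            class_idx = int(logits.argmax(dim=1))
        self.model.zero_grad()
        logits[0, class_idx].backward()
        # Channel weights: gradients averaged over the spatial dimensions.
        weights = self.gradients.mean(dim=(2, 3), keepdim=True)
        cam = F.relu((weights * self.activations).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                            align_corners=False)
        cam = cam - cam.min()
        cam = cam / cam.max().clamp(min=1e-8)  # scale to [0, 1]
        return cam[0, 0], class_idx


# Stand-in network; ReLU is not in-place because the hooked layer's output
# must not be modified in place when a full backward hook is attached.
net = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 100),
)
cam = GradCAM(net, target_layer=net[2])
heatmap, predicted = cam(torch.randn(1, 3, 64, 64))
print(heatmap.shape, predicted)
```

The returned heatmap has the input's height and width, so it can be overlaid directly on the image. If the model is wrapped in `nn.DataParallel`, the hooks would need to be attached to the underlying module instead.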
1. Clone the repository

    git clone https://github.com/Asmit159/ResNet-50.git
    cd ResNet-50

2. Install dependencies

    pip install -r requirements.txt

3. Run the training pipeline

    python train.py --epochs 90 --batch_size 128 --mixed_precision True

4. Generate diagnostics (Grad-CAM & Ablation)

    python evaluate.py --checkpoint weights/resnet50_best.pth

Author
Asmit Mandal








