This project presents a research paper that investigates the effectiveness of modified U-Net architectures for AI-driven image colorization and the restoration of historical black-and-white photographs. The study leverages deep learning models, specifically U-Net and Generative Adversarial Networks (GANs), to evaluate enhanced architectures in generating realistic and detailed colorized images.
This paper has been peer-reviewed and published on IEEE Xplore:
👉 https://ieeexplore.ieee.org/document/11413811
In today's digital age, AI-driven image colorization has gained significant attention, particularly for restoring historical black-and-white photographs. This study investigates the effectiveness of modified U-Net architectures, enhanced with multi-attention mechanisms and pre-trained embeddings, in improving the quality and accuracy of image colorization. We compare U-Net models with and without GAN integration, providing insights into the impact of generative techniques on colorization outcomes. The research found that the Plain U-Net model achieved the best performance with a PSNR score of 24.198 and an SSIM of 0.9153, outperforming both attention-based and GAN-integrated architectures. The inclusion of GANs, while theoretically beneficial, did not guarantee improved performance and resulted in a slightly lower quantitative outcome.
Keywords: Deep Learning, Generative Adversarial Network (GAN), Image Colorization, Image Restoration, U-Net Attention
This research uses a quantitative and qualitative experimental design to evaluate how different variations of U-Net impact the image colorization process. A total of six models were implemented, divided into two groups:
- Group A (Non-GAN-enhanced): Plain U-Net, U-Net + MobileNetV3, and U-Net + Multi-Attention.
- Group B (GAN-integrated): The same three architectures with the addition of a PatchGAN discriminator.
The project utilizes 10,000 images from the COCO Dataset (Common Objects in Context). The dataset is split with an 8:1:1 ratio for training, testing, and validation, respectively:
- 8,000 images for training
- 1,000 images for testing
- 1,000 images for validation
All images are resized to 128x128 pixels (RGB) and augmented with random flipping. They are then converted to the LAB color space, which is widely used for image colorization because of its perceptual uniformity: the L channel carries the grayscale structure, while the A and B channels carry the color information, so the model can take L as input and learn to predict the missing color channels.
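As a concrete sketch of this preprocessing step, the RGB-to-LAB conversion can be written directly in NumPy. (The project itself most likely uses a library routine such as `skimage.color.rgb2lab`; the constants below are the standard sRGB/D65 ones.)

```python
import numpy as np

def rgb_to_lab(rgb):
    """Convert an HxWx3 float RGB image in [0, 1] to CIELAB (D65 white point)."""
    # sRGB gamma expansion to linear RGB
    rgb = np.where(rgb > 0.04045, ((rgb + 0.055) / 1.055) ** 2.4, rgb / 12.92)
    # linear RGB -> XYZ (standard sRGB matrix)
    m = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = rgb @ m.T
    # normalize by the D65 reference white
    xyz = xyz / np.array([0.95047, 1.0, 1.08883])
    f = np.where(xyz > 0.008856, np.cbrt(xyz), 7.787 * xyz + 16 / 116)
    L = 116 * f[..., 1] - 16            # L in [0, 100]: grayscale structure
    a = 500 * (f[..., 0] - f[..., 1])   # a: green-red axis
    b = 200 * (f[..., 1] - f[..., 2])   # b: blue-yellow axis
    return np.stack([L, a, b], axis=-1)

# pure white should map to L ~ 100 with near-zero a and b
lab = rgb_to_lab(np.ones((2, 2, 3)))
```

During training, the L channel becomes the model input and the A/B channels become the regression target.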
The core model is the U-Net, featuring an encoder–decoder structure and skip connections to preserve spatial details.
Enhancements include:
- MobileNetV3 encoder for efficiency
- Multi-Attention Modules (channel + spatial) to focus on key features
- GAN integration with PatchGAN discriminator
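A minimal PyTorch sketch of the encoder–decoder-with-skip-connections idea: a 1-channel L input is mapped to a 2-channel AB output. The channel widths, depth, and single skip connection here are illustrative only, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy U-Net: L channel in, AB channels out, one skip connection."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.bottleneck = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        # decoder sees the upsampled features concatenated with the skip
        self.dec = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 2, 3, padding=1))

    def forward(self, x):
        s1 = self.enc1(x)      # full-resolution features, kept for the skip
        e2 = self.enc2(s1)     # downsample 128 -> 64
        b = self.bottleneck(e2)
        u = self.up(b)         # upsample 64 -> 128
        return self.dec(torch.cat([u, s1], dim=1))

model = TinyUNet()
out = model(torch.randn(1, 1, 128, 128))   # (batch, 2, 128, 128)
```

Swapping the encoder for a MobileNetV3 backbone or inserting channel/spatial attention blocks between encoder and decoder stages follows the same structural pattern.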
- Software: Python with PyTorch and Visual Studio Code (VSCode).
- Hardware: The models were trained on three separate devices with different CPUs (Intel Core i7-8750H, AMD Ryzen 5 3500U, and Apple M2 Pro chip), as GPUs were not available.
- Epochs: 25
- Batch Size: 16
- Optimizer: Adam with a learning rate of 0.0001
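These settings translate into a training loop along the following lines. The loss here is assumed to be a per-pixel L1 between predicted and ground-truth AB channels; the model and data loader are stand-ins.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 2, 3, padding=1)                  # stand-in for a U-Net
opt = torch.optim.Adam(model.parameters(), lr=1e-4)    # Adam, lr 0.0001 as reported
criterion = nn.L1Loss()                                # assumed pixel-wise loss

def train_epoch(model, loader, opt, criterion):
    """One pass over the data; returns the average loss."""
    model.train()
    total = 0.0
    for L_batch, ab_batch in loader:                   # L channel in, AB channels out
        opt.zero_grad()
        loss = criterion(model(L_batch), ab_batch)
        loss.backward()
        opt.step()
        total += loss.item()
    return total / len(loader)

# toy loader: two random batches of size 16 at 128x128, mimicking the setup above
loader = [(torch.randn(16, 1, 128, 128), torch.randn(16, 2, 128, 128))
          for _ in range(2)]
avg = train_epoch(model, loader, opt, criterion)
```

For the GAN-integrated variants, each step would additionally update a PatchGAN discriminator and add an adversarial term to the generator loss, which explains the much longer training times reported below.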
- Structural Similarity Index Measure (SSIM): Measures perceived similarity between two images based on luminance, contrast, and structure. A value of 1 indicates a perfect match.
- Peak Signal-to-Noise Ratio (PSNR): Evaluates image quality by comparing the maximum signal power to the power of corrupting noise. A higher PSNR value indicates less distortion and better colorization.
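PSNR is simple enough to compute directly from its definition; the sketch below assumes images scaled to [0, 1]. (SSIM is more involved and is typically taken from a library such as `skimage.metrics.structural_similarity`.)

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB between two images in [0, max_val]."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10 * np.log10(max_val ** 2 / mse)

# a uniform error of 0.1 gives MSE = 0.01, hence PSNR = 20 dB
score = psnr(np.ones((4, 4)), np.full((4, 4), 0.9))
```

By this scale, the ~24 dB achieved by the Plain U-Net in the table below corresponds to an average per-pixel error of roughly 0.06 on a [0, 1] scale.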
The GAN-integrated models took three to four times longer to train than the non-GAN versions due to the heavier computational load.
| Model | PSNR | SSIM |
|---|---|---|
| Plain U-Net | 24.198 | 0.9153 |
| U-Net + MobileNetV3 | 23.7627 | 0.9117 |
| U-Net + Multi-Attention | 23.411 | 0.9128 |
| U-Net + GAN | 23.215 | 0.9002 |
| U-Net + MobileNetV3 + GAN | 21.746 | 0.8811 |
| U-Net + Multi-Attention + GAN | 20.7304 | 0.9149 |
Based on these results, the Plain U-Net performed best overall, with a PSNR of 24.198 and an SSIM of 0.9153. Visually, the outputs of the GAN-integrated models were nearly identical to those of their non-GAN counterparts, suggesting that GAN integration did not provide a significant improvement in this study.
This repository is public and intended for research and reference purposes. The paper is no longer under review and has been officially published. Please refer to the IEEE version for the final validated results.
- Olivia Putri: olivia.putri001@binus.ac.id
- Emily Wilkinson: emily.wilkinson@binus.ac.id
- Liona Loren: liona.loren@binus.ac.id
Guidance: Nikita Ananda Putri Masaling & Andry Chowanda
This project is licensed under the MIT License.









