
Modified GAN-U-Net for Enhanced Image Colorization and Restoring Old Photos

This repository accompanies a peer-reviewed research paper investigating the effectiveness of modified U-Net architectures for AI-driven image colorization and the restoration of historical black-and-white photographs. The study leverages deep learning models, specifically U-Net and Generative Adversarial Networks (GANs), to evaluate enhanced architectures for generating realistic and detailed colorized images.


Publication

This paper has been officially published and peer-reviewed on IEEE Xplore:

👉 https://ieeexplore.ieee.org/document/11413811


Abstract

In today's digital age, AI-driven image colorization has gained significant attention, particularly for restoring historical black-and-white photographs. This study investigates the effectiveness of modified U-Net architectures, enhanced with multi-attention mechanisms and pre-trained embeddings, in improving the quality and accuracy of image colorization. We compare U-Net models with and without GAN integration, providing insights into the impact of generative techniques on colorization outcomes. The research found that the Plain U-Net model achieved the best performance with a PSNR score of 24.198 and an SSIM of 0.9153, outperforming both attention-based and GAN-integrated architectures. The inclusion of GANs, while theoretically beneficial, did not guarantee improved performance and resulted in a slightly lower quantitative outcome.

Keywords: Deep Learning, Generative Adversarial Network (GAN), Image Colorization, Image Restoration, U-Net Attention


Proposed Method

Research Design

This research uses a quantitative and qualitative experimental design to evaluate how different variations of U-Net impact the image colorization process. A total of six models were implemented, divided into two groups:

  • Group A (Non-GAN-enhanced): Plain U-Net, U-Net + MobileNetV3, and U-Net + Multi-Attention.
  • Group B (GAN-integrated): The same three architectures with the addition of a PatchGAN discriminator.
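
The README does not specify the discriminator's exact configuration, only that Group B adds a PatchGAN discriminator. The sketch below is a minimal PatchGAN-style discriminator in PyTorch; the channel widths and layer count are illustrative assumptions, not values from the paper:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Minimal PatchGAN-style discriminator: instead of emitting one
    real/fake score for the whole image, it emits a grid of logits,
    one per overlapping patch of the input."""
    def __init__(self, in_channels=3, base=64):
        super().__init__()
        def block(cin, cout, stride):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 4, stride, 1),
                nn.BatchNorm2d(cout),
                nn.LeakyReLU(0.2, inplace=True),
            )
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, base, 4, 2, 1),   # 128 -> 64
            nn.LeakyReLU(0.2, inplace=True),
            block(base, base * 2, 2),                 # 64 -> 32
            block(base * 2, base * 4, 2),             # 32 -> 16
            block(base * 4, base * 8, 1),             # 16 -> 15
            nn.Conv2d(base * 8, 1, 4, 1, 1),          # 15 -> 14: one logit per patch
        )

    def forward(self, x):
        return self.net(x)
```

For the 128x128 inputs used in this project, the output is a 14x14 grid of patch logits; the assumed 3-channel input would correspond to the full LAB image (L conditioned on the predicted AB channels).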

Dataset Collection

The project utilizes 10,000 images from the COCO Dataset (Common Objects in Context). The dataset is split with an 8:1:1 ratio for training, testing, and validation, respectively:

  • 8,000 images for training
  • 1,000 images for testing
  • 1,000 images for validation
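
The 8:1:1 split above can be reproduced as a simple shuffled partition; the seed and file naming below are illustrative:

```python
import random

def split_dataset(paths, seed=42):
    """Shuffle and split a list of image paths 8:1:1 into
    train/test/validation, matching the paper's 8,000/1,000/1,000
    split of 10,000 COCO images."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    paths = list(paths)
    rng.shuffle(paths)
    n = len(paths)
    n_train = int(n * 0.8)
    n_test = int(n * 0.1)
    train = paths[:n_train]
    test = paths[n_train:n_train + n_test]
    val = paths[n_train + n_test:]
    return train, test, val
```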

Pre-Processing Data

All images are resized to 128x128 pixels (RGB) and undergo image augmentation with random flipping. The images are then converted to the LAB color space, which is widely used for image colorization due to its perceptual uniformity. The L channel represents grayscale structure, while the A and B channels capture color information, allowing the model to learn colorization patterns independently.
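
A minimal sketch of this pipeline using scikit-image; the [-1, 1] scaling constants for the L and A/B channels are common conventions and an assumption here, not values stated in the paper:

```python
import numpy as np
from skimage.color import rgb2lab
from skimage.transform import resize

def preprocess(rgb_image, training=True, rng=np.random):
    """Resize to 128x128, optionally random-flip, convert to LAB,
    and split into the model input (L) and target (A/B) channels."""
    img = resize(rgb_image, (128, 128), anti_aliasing=True)  # floats in [0, 1]
    if training and rng.random() < 0.5:
        img = img[:, ::-1, :].copy()    # random horizontal flip
    lab = rgb2lab(img)                   # L in [0, 100], A/B roughly [-128, 127]
    L = lab[..., :1] / 50.0 - 1.0        # scale L to [-1, 1] (assumed convention)
    AB = lab[..., 1:] / 110.0            # scale A/B to roughly [-1, 1]
    return L.astype(np.float32), AB.astype(np.float32)
```

During training the network sees only the L channel and learns to predict the A/B channels; the three are recombined and converted back to RGB for display.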


AI Model Architecture

The core model is the U-Net, featuring an encoder–decoder structure and skip connections to preserve spatial details.
Enhancements include:

  • MobileNetV3 encoder for efficiency
  • Multi-Attention Modules (channel + spatial) to focus on key features
  • GAN integration with PatchGAN discriminator
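
A toy U-Net generator illustrating the encoder–decoder structure with skip connections; the real models are deeper, and the channel widths here are illustrative. It maps the single L channel to the two A/B channels:

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    """Tiny two-level U-Net: grayscale L channel in, A/B channels out.
    Skip connections concatenate encoder features into the decoder,
    preserving spatial detail lost in downsampling."""
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(1, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = conv_block(128, 64)   # 128 = 64 upsampled + 64 skip
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)    # 64 = 32 upsampled + 32 skip
        self.head = nn.Conv2d(32, 2, 1)   # predict A and B channels

    def forward(self, x):
        e1 = self.enc1(x)                               # 128x128
        e2 = self.enc2(self.pool(e1))                   # 64x64
        b = self.bottleneck(self.pool(e2))              # 32x32
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.tanh(self.head(d1))                # A/B in [-1, 1]
```

The MobileNetV3 variant swaps the encoder path for pre-trained MobileNetV3 features, and the multi-attention variant inserts channel and spatial attention modules along the skip connections.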
Model Architectures

Architecture diagrams for each variant are included in the repository:

  • Plain U-Net
  • U-Net + GAN
  • MAU-Net + MobileNetV3 + GAN
  • MAU-Net + GAN
  • MAU-Net


Training and Evaluation

Experimental Setup

  • Software: Python with PyTorch and Visual Studio Code (VSCode).
  • Hardware: The models were trained on three separate devices with different CPUs (Intel Core i7-8750H, AMD Ryzen 5 3500U, and Apple M2 Pro chip), as GPUs were not available.

Training Parameters

  • Epochs: 25
  • Batch Size: 16
  • Optimizer: Adam with a learning rate of 0.0001
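
These hyperparameters translate into a loop like the following. The L1 loss between predicted and ground-truth A/B channels is an assumption for the non-GAN models; the README does not name the training loss:

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=25, lr=1e-4, device="cpu"):
    """Supervised colorization loop with the stated hyperparameters:
    Adam optimizer, learning rate 1e-4, 25 epochs, batches of
    (L, AB) tensor pairs from the preprocessing stage."""
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.L1Loss()   # assumed loss; not specified in the README
    for epoch in range(epochs):
        total = 0.0
        for L, AB in loader:
            L, AB = L.to(device), AB.to(device)
            opt.zero_grad()
            loss = criterion(model(L), AB)
            loss.backward()
            opt.step()
            total += loss.item()
        print(f"epoch {epoch + 1}: mean loss {total / len(loader):.4f}")
```

The GAN-integrated models additionally alternate this generator step with a discriminator update, which accounts for their much longer training times.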

Metrics

  • Structural Similarity Index Measure (SSIM): Measures the structural information, luminance, and contrast between two images. A value closer to 1 indicates a perfect match.
  • Peak Signal-to-Noise Ratio (PSNR): Evaluates image quality by comparing the maximum signal power to the power of corrupting noise. A higher PSNR value indicates less distortion and better colorization.

Results and Discussion

The GAN-integrated models took three to four times longer to train than the non-GAN versions due to the heavier computational load.

Model                            PSNR      SSIM
Plain U-Net                      24.198    0.9153
U-Net + MobileNetV3              23.7627   0.9117
U-Net + Multi-Attention          23.411    0.9128
U-Net + GAN                      23.215    0.9002
U-Net + MobileNetV3 + GAN        21.746    0.8811
U-Net + Multi-Attention + GAN    20.7304   0.9149

Based on the results, the Plain U-Net model had the best performance with an overall PSNR of 24.198 and SSIM of 0.9153, outperforming the other models. Visually, the outputs from the GAN-integrated models were nearly identical to their non-GAN counterparts, suggesting that GAN integration did not provide a significant improvement in this study.


Example Results

Side-by-side colorization outputs for each model are included in the repository:

  • Plain U-Net
  • U-Net + MobileNetV3
  • U-Net + Multi-Attention
  • U-Net + MobileNetV3 + GAN
  • U-Net + Multi-Attention + GAN


Note

This repository is public and intended for research and reference purposes. The paper is no longer under review and has been officially published. Please refer to the IEEE version for the final validated results.


Authors

Guidance: Nikita Ananda Putri Masaling & Andry Chowanda


License

This project is licensed under the MIT License.
