This project presents a research paper that investigates the effectiveness of modified U-Net architectures for AI-driven image colorization and the restoration of historical black-and-white photographs. The study leverages deep learning models, specifically U-Net and Generative Adversarial Networks (GANs), to evaluate enhanced architectures in generating realistic and detailed colorized images.
This paper has been peer-reviewed and published on IEEE Xplore:
👉 https://ieeexplore.ieee.org/document/11413811
In today's digital age, AI-driven image colorization has gained significant attention, particularly for restoring historical black-and-white photographs. This study investigates the effectiveness of modified U-Net architectures, enhanced with multi-attention mechanisms and pre-trained embeddings, in improving the quality and accuracy of image colorization. We compare U-Net models with and without GAN integration, providing insights into the impact of generative techniques on colorization outcomes. The research found that the Plain U-Net model achieved the best performance with a PSNR score of 24.198 and an SSIM of 0.9153, outperforming both attention-based and GAN-integrated architectures. The inclusion of GANs, while theoretically beneficial, did not guarantee improved performance and resulted in a slightly lower quantitative outcome.
Keywords: Deep Learning, Generative Adversarial Network (GAN), Image Colorization, Image Restoration, U-Net Attention
This research uses a quantitative and qualitative experimental design to evaluate how different variations of U-Net impact the image colorization process. A total of six models were implemented, divided into two groups:
- Group A (Non-GAN-enhanced): Plain U-Net, U-Net + MobileNetV3, and U-Net + Multi-Attention.
- Group B (GAN-integrated): The same three architectures with the addition of a PatchGAN discriminator.
The project utilizes 10,000 images from the COCO Dataset (Common Objects in Context). The dataset is split with an 8:1:1 ratio for training, testing, and validation, respectively:
- 8,000 images for training
- 1,000 images for testing
- 1,000 images for validation
All images are resized to 128x128 pixels (RGB) and augmented with random flipping. They are then converted to the LAB color space, which is widely used for image colorization because of its perceptual uniformity: the L channel carries the grayscale structure, while the A and B channels carry the color information, so the model can take L as input and learn to predict the missing color channels.
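As a concrete sketch of this preprocessing step, the RGB-to-LAB conversion can be written directly in NumPy. (The project itself most likely uses a library routine such as `skimage.color.rgb2lab`; the constants below are the standard sRGB/D65 ones.)

```python
import numpy as np

def rgb_to_lab(rgb):
    """Convert an HxWx3 float RGB image in [0, 1] to CIELAB (D65 white point)."""
    # sRGB gamma expansion to linear RGB
    rgb = np.where(rgb > 0.04045, ((rgb + 0.055) / 1.055) ** 2.4, rgb / 12.92)
    # linear RGB -> XYZ (standard sRGB matrix)
    m = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = rgb @ m.T
    # normalize by the D65 reference white
    xyz = xyz / np.array([0.95047, 1.0, 1.08883])
    f = np.where(xyz > 0.008856, np.cbrt(xyz), 7.787 * xyz + 16 / 116)
    L = 116 * f[..., 1] - 16            # L in [0, 100]: grayscale structure
    a = 500 * (f[..., 0] - f[..., 1])   # a: green-red axis
    b = 200 * (f[..., 1] - f[..., 2])   # b: blue-yellow axis
    return np.stack([L, a, b], axis=-1)

# pure white should map to L ~ 100 with near-zero a and b
lab = rgb_to_lab(np.ones((2, 2, 3)))
```

During training, the L channel becomes the model input and the A/B channels become the regression target.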
The core model is the U-Net, featuring an encoder–decoder structure and skip connections to preserve spatial details.
Enhancements include:
- MobileNetV3 encoder for efficiency
- Multi-Attention Modules (channel + spatial) to focus on key features
- GAN integration with PatchGAN discriminator
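A minimal PyTorch sketch of the encoder–decoder-with-skip-connections idea: a 1-channel L input is mapped to a 2-channel AB output. The channel widths, depth, and single skip connection here are illustrative only, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy U-Net: L channel in, AB channels out, one skip connection."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.bottleneck = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        # decoder sees the upsampled features concatenated with the skip
        self.dec = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 2, 3, padding=1))

    def forward(self, x):
        s1 = self.enc1(x)      # full-resolution features, kept for the skip
        e2 = self.enc2(s1)     # downsample 128 -> 64
        b = self.bottleneck(e2)
        u = self.up(b)         # upsample 64 -> 128
        return self.dec(torch.cat([u, s1], dim=1))

model = TinyUNet()
out = model(torch.randn(1, 1, 128, 128))   # (batch, 2, 128, 128)
```

Swapping the encoder for a MobileNetV3 backbone or inserting channel/spatial attention blocks between encoder and decoder stages follows the same structural pattern.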
- Software: Python with PyTorch and Visual Studio Code (VSCode).
- Hardware: The models were trained on three separate devices with different CPUs (Intel Core i7-8750H, AMD Ryzen 5 3500U, and Apple M2 Pro chip), as GPUs were not available.
- Epochs: 25
- Batch Size: 16
- Optimizer: Adam with a learning rate of 0.0001
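These settings translate into a training loop along the following lines. The loss here is assumed to be a per-pixel L1 between predicted and ground-truth AB channels; the model and data loader are stand-ins.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 2, 3, padding=1)                  # stand-in for a U-Net
opt = torch.optim.Adam(model.parameters(), lr=1e-4)    # Adam, lr 0.0001 as reported
criterion = nn.L1Loss()                                # assumed pixel-wise loss

def train_epoch(model, loader, opt, criterion):
    """One pass over the data; returns the average loss."""
    model.train()
    total = 0.0
    for L_batch, ab_batch in loader:                   # L channel in, AB channels out
        opt.zero_grad()
        loss = criterion(model(L_batch), ab_batch)
        loss.backward()
        opt.step()
        total += loss.item()
    return total / len(loader)

# toy loader: two random batches of size 16 at 128x128, mimicking the setup above
loader = [(torch.randn(16, 1, 128, 128), torch.randn(16, 2, 128, 128))
          for _ in range(2)]
avg = train_epoch(model, loader, opt, criterion)
```

For the GAN-integrated variants, each step would additionally update a PatchGAN discriminator and add an adversarial term to the generator loss, which explains the much longer training times reported below.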
- Structural Similarity Index Measure (SSIM): Measures perceived similarity between two images based on luminance, contrast, and structure. A value of 1 indicates a perfect match.
- Peak Signal-to-Noise Ratio (PSNR): Evaluates image quality by comparing the maximum signal power to the power of corrupting noise. A higher PSNR value indicates less distortion and better colorization.
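PSNR is simple enough to compute directly from its definition; the sketch below assumes images scaled to [0, 1]. (SSIM is more involved and is typically taken from a library such as `skimage.metrics.structural_similarity`.)

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB between two images in [0, max_val]."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10 * np.log10(max_val ** 2 / mse)

# a uniform error of 0.1 gives MSE = 0.01, hence PSNR = 20 dB
score = psnr(np.ones((4, 4)), np.full((4, 4), 0.9))
```

By this scale, the ~24 dB achieved by the Plain U-Net in the table below corresponds to an average per-pixel error of roughly 0.06 on a [0, 1] scale.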
The GAN-integrated models took three to four times longer to train than the non-GAN versions due to the heavier computational load.
| Model | PSNR | SSIM |
|---|---|---|
| Plain U-Net | 24.198 | 0.9153 |
| U-Net + MobileNetV3 | 23.7627 | 0.9117 |
| U-Net + Multi-Attention | 23.411 | 0.9128 |
| U-Net + GAN | 23.215 | 0.9002 |
| U-Net + MobileNetV3 + GAN | 21.746 | 0.8811 |
| U-Net + Multi-Attention + GAN | 20.7304 | 0.9149 |
Based on these results, the Plain U-Net performed best overall, with a PSNR of 24.198 and an SSIM of 0.9153. Visually, the outputs of the GAN-integrated models were nearly identical to those of their non-GAN counterparts, suggesting that GAN integration did not provide a significant improvement in this study.
This repository is public and intended for research and reference purposes. The paper is no longer under review and has been officially published. Please refer to the IEEE version for the final validated results.
- Olivia Putri: olivia.putri001@binus.ac.id
- Emily Wilkinson: emily.wilkinson@binus.ac.id
- Liona Loren: liona.loren@binus.ac.id
Guidance: Nikita Ananda Putri Masaling & Andry Chowanda
This project is licensed under the MIT License.









