High-resolution images are crucial in many computer vision applications, where they enable more accurate analysis and pattern recognition. However, acquiring such images can be expensive and challenging due to equipment limitations and sensor constraints. Digital image and video super-resolution methods address this issue by enhancing low-resolution images, enabling applications in fields such as surveillance, medical image processing, forensics, and satellite imaging.
This thesis builds upon existing image super-resolution methods by utilizing feature maps from the well-known VGG networks and incorporating Transformer Networks within an SRGAN model. The Transformer Networks provide robustness to spatial transformations such as translation, scaling, rotation, cropping, and non-rigid deformations, resulting in improved performance.
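For context, the VGG feature-map loss used by the baseline SRGAN (Ledig et al.) compares feature maps $\phi_{i,j}$, extracted before the $j$-th convolution after the $i$-th max-pooling layer of a pretrained VGG network, between the reference image and the generator output (notation follows the SRGAN paper; the thesis's exact loss may differ):

```math
l^{SR}_{VGG/i,j} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}\!\left(I^{HR}\right)_{x,y} - \phi_{i,j}\!\left(G_{\theta_G}\!\left(I^{LR}\right)\right)_{x,y} \right)^{2}
```

Here $W_{i,j}$ and $H_{i,j}$ are the dimensions of the feature map, $I^{HR}$ is the high-resolution reference, and $G_{\theta_G}(I^{LR})$ is the super-resolved output.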
The proposed model achieves competitive results in terms of PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index) compared to existing baseline methods. Evaluation is conducted on Set5 and Set14, the datasets used in the baseline SRGAN paper, as well as the publicly available CelebFaces Attributes Dataset (CelebA), which contains over 200K celebrity images, each annotated with 40 attributes.
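As a reference point for the reported metric, PSNR is derived directly from the mean squared error between two images. A minimal pure-Python sketch (the `psnr` helper name and the flat-list image representation are illustrative assumptions, not code from the thesis; 8-bit intensities are assumed via `max_val=255.0`):

```python
import math

def psnr(reference, distorted, max_val=255.0):
    """Peak Signal-to-Noise Ratio between two equal-size images,
    given here as flat lists of pixel intensities (illustrative)."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Higher PSNR indicates lower pixel-wise distortion; SSIM complements it by measuring perceived structural similarity rather than raw error.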
This repository currently contains only a few pages of my thesis. The code is not publicly available while the work is under academic review.




