ranimeshehata/Augmented-Reality-and-Image-Mosaics

Augmented Reality & Image Mosaics

An implementation of augmented reality video overlay and image panorama stitching using SIFT feature detection, planar homography estimation with RANSAC, and perspective warping (cv2.warpPerspective for the AR overlay, inverse warping with bilinear interpolation for the panoramas).

📋 Overview

This project implements two major computer vision applications:

Part 1: Augmented Reality with Planar Homographies

Real-time AR video overlay that replaces a book cover in a video with custom AR content using:

  • Planar Tracking: Detect and track a book cover across video frames
  • Homography-based Warping: cv2.warpPerspective for AR content projection
  • Mask-based Compositing: Polygon masking for seamless overlay
  • Video Processing: Frame-by-frame processing with audio preservation

Part 2: Image Mosaics & Panorama Stitching

Create wide panoramic images from overlapping photographs using:

  • Feature Detection & Matching: SIFT keypoint detection with Lowe's ratio test
  • Homography Estimation: Direct Linear Transform (DLT) using SVD
  • Robust Estimation: RANSAC for outlier rejection
  • Image Warping: Inverse warping with bilinear interpolation
  • Multi-Image Stitching: Sequential stitching for 3+ images

🎯 Features

Part 1: Augmented Reality Application

  • ✅ Real-time book cover detection and tracking
  • ✅ SIFT-based feature matching between book cover and video frames
  • ✅ Frame-by-frame homography computation with RANSAC
  • ✅ Perspective warping using cv2.warpPerspective
  • ✅ Polygon-based masking for seamless AR overlay
  • ✅ Aspect ratio-aware cropping and resizing
  • ✅ Video generation with original audio preservation
  • ✅ Support for different AR source videos

Part 2: Image Mosaics & Panoramas

  • ✅ SIFT-based feature detection and matching
  • ✅ Custom DLT homography estimation (no OpenCV homography functions)
  • ✅ RANSAC implementation for robust homography computation
  • ✅ Bilinear interpolation for sub-pixel accuracy
  • ✅ Backward warping to avoid holes in output
  • ✅ Homography verification with visual point mapping
  • ✅ Support for 2-image and 3-image panoramas

Implementation Highlights

  • No built-in homography functions (Part 2): All homography computation done from scratch using SVD
  • Efficient warping: Inverse warping ensures every output pixel has a value
  • Quality interpolation: Bilinear interpolation for smooth results
  • Robust matching: Lowe's ratio test (0.75) + RANSAC (5-pixel threshold)
  • Production-ready AR: Full video processing pipeline with audio

🚀 Getting Started

Prerequisites

pip install opencv-python numpy matplotlib moviepy tqdm

Usage

Part 1: Augmented Reality

jupyter notebook augmented_reality.ipynb

Run all cells to:

  1. Load book cover image and video frames
  2. Compute homographies for each frame
  3. Overlay AR content onto the book
  4. Generate final video with audio

Part 2: Image Mosaics

jupyter notebook img_mosaics.ipynb

Alternatively, open either notebook in VS Code with the Jupyter extension and run the cells interactively.

📊 Results

Part 1: Augmented Reality

Successfully created AR video with:

  • Input: Book cover image (cv_cover.jpg) + tracking video (book.mov)
  • AR Source: Custom video content (ar_source.mov)
  • Output: Seamless AR overlay video with synchronized audio
  • Processing: ~300+ frames with real-time book tracking

Part 2: Image Mosaics

The implementation successfully stitches:

  • 2-image panoramas: pano_image1.jpg + pano_image2.jpg
  • Test datasets: Multiple test image pairs (test2, test3, test5)
  • 3-image panoramas: Shanghai skyline series, test6 series

🔧 Technical Details

Part 1: AR Pipeline Architecture

Book Cover Image + Video Frames
    ↓
SIFT Feature Detection & Matching
    ↓
RANSAC Homography Estimation (per frame)
    ↓
Book Corner Detection & Mapping
    ↓
AR Frame Cropping & Aspect Ratio Adjustment
    ↓
Perspective Warping (cv2.warpPerspective)
    ↓
Polygon Mask Creation
    ↓
AR Overlay Compositing
    ↓
Video Encoding + Audio Synchronization
    ↓
Final AR Video Output

Part 2: Panorama Pipeline Architecture

Input Images
    ↓
SIFT Feature Detection
    ↓
Feature Matching (BFMatcher + Lowe's Ratio Test)
    ↓
RANSAC Homography Estimation
    ↓
Canvas Creation & Reference Image Placement
    ↓
Inverse Warping with Bilinear Interpolation
    ↓
Final Stitched Panorama
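
The "Canvas Creation & Reference Image Placement" step can be sketched as follows. This is a minimal illustration, not the notebook's code: the function name `canvas_bounds` and the convention that H maps source-image pixels into the reference image's coordinate frame are assumptions made for the example.

```python
import numpy as np

def canvas_bounds(H, src_shape, ref_shape):
    """Return (width, height, T) for a canvas holding both images.

    Assumption: H maps source pixels (x, y) into reference coordinates.
    T is a 3x3 translation that shifts all content to non-negative pixels.
    """
    h_s, w_s = src_shape[:2]
    h_r, w_r = ref_shape[:2]
    # Four corners of the source image, one homogeneous point per column.
    corners = np.array([[0, w_s - 1, w_s - 1, 0],
                        [0, 0, h_s - 1, h_s - 1],
                        [1, 1, 1, 1]], dtype=float)
    warped = H @ corners
    warped = warped[:2] / warped[2]          # back to Cartesian coordinates
    # The canvas must also contain the reference image itself.
    xs = np.concatenate([warped[0], [0, w_r - 1]])
    ys = np.concatenate([warped[1], [0, h_r - 1]])
    x_min, y_min = np.floor([xs.min(), ys.min()]).astype(int)
    x_max, y_max = np.ceil([xs.max(), ys.max()]).astype(int)
    T = np.array([[1, 0, -x_min],
                  [0, 1, -y_min],
                  [0, 0, 1]], dtype=float)
    return x_max - x_min + 1, y_max - y_min + 1, T
```

Under these assumptions the reference image would be pasted at the offset stored in T, and the source image would be warped with T @ H so both land on the same canvas.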

Key Algorithms

AR-Specific: Perspective Warping & Masking

1. Compute homography H mapping book to video frame
2. Warp AR content using cv2.warpPerspective(ar_frame, H, frame_size)
3. Create polygon mask at mapped book corner positions
4. Composite: result = frame * (1-mask) + warped_ar * mask
5. Only pixels inside polygon show AR content
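
Steps 3 to 5 above can be sketched in NumPy alone. The helper names below are hypothetical, and the half-plane test is a stand-in for an OpenCV polygon fill such as cv2.fillConvexPoly; it assumes the mapped book corners form a convex quadrilateral listed in a consistent order.

```python
import numpy as np

def convex_polygon_mask(corners, height, width):
    """Boolean mask of the pixels inside a convex polygon.

    corners: (N, 2) sequence of (x, y) vertices in a consistent
    clockwise or counter-clockwise order (assumption).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    corners = np.asarray(corners, dtype=float)
    sides = []
    for (x0, y0), (x1, y1) in zip(corners, np.roll(corners, -1, axis=0)):
        # Sign of the cross product says on which side of the edge a pixel lies.
        sides.append((x1 - x0) * (ys - y0) - (y1 - y0) * (xs - x0))
    sides = np.stack(sides)
    # Inside means "same side of every edge", whichever winding was used.
    return np.all(sides >= 0, axis=0) | np.all(sides <= 0, axis=0)

def composite(frame, warped_ar, mask):
    """result = frame * (1 - mask) + warped_ar * mask, applied per pixel."""
    m = mask.astype(np.float32)[..., None]    # broadcast over colour channels
    out = frame.astype(np.float32) * (1.0 - m) + warped_ar.astype(np.float32) * m
    return out.astype(frame.dtype)
```

Here `warped_ar` is expected to be the output of step 2, already the same size as the video frame, so the blend only has to select between the two images.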

AR-Specific: Aspect Ratio Preservation

- Calculate aspect ratios of book and AR video
- Crop AR frames to match book aspect ratio
- Center-crop to avoid distortion
- Resize to exact book dimensions for warping
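
A minimal sketch of the center-crop step, assuming frames are NumPy arrays of shape (height, width, channels). The function name `center_crop_to_aspect` is hypothetical and is not the repository's crop_and_resize_frame().

```python
import numpy as np

def center_crop_to_aspect(frame, target_w, target_h):
    """Center-crop frame so that width / height matches target_w / target_h."""
    h, w = frame.shape[:2]
    target_ratio = target_w / target_h
    if w / h > target_ratio:
        # Frame is too wide: keep full height, trim columns equally on both sides.
        new_w = int(round(h * target_ratio))
        x0 = (w - new_w) // 2
        return frame[:, x0:x0 + new_w]
    # Frame is too tall (or already matches): keep full width, trim rows.
    new_h = int(round(w / target_ratio))
    y0 = (h - new_h) // 2
    return frame[y0:y0 + new_h, :]
```

The cropped frame could then be resized to the book cover's exact size, for example with cv2.resize(cropped, (book_w, book_h)), where book_w and book_h are placeholder names for the cover's width and height.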

1. SIFT Feature Matching

- Detects keypoints in grayscale images
- Computes 128-dimensional descriptors
- BFMatcher with L2 norm
- Lowe's ratio test: distance(m1) < 0.75 * distance(m2)
- Keeps top 50 matches
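
The ratio test itself is independent of OpenCV, so it can be illustrated on plain descriptor arrays. The sketch below is a brute-force NumPy stand-in for the BFMatcher step, suitable only for small descriptor sets because it materialises every pairwise difference. Reading "top 50" as the 50 smallest distances is an assumption, and desc2 must contain at least two descriptors.

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.75, max_matches=50):
    """Return up to max_matches tuples (i, j, distance), best first.

    desc1: (N1, D) array, desc2: (N2, D) array with N2 >= 2.
    """
    # Pairwise L2 distances between every descriptor in desc1 and desc2.
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(desc1))
    d1, d2 = d[rows, best], d[rows, second]
    # Lowe's ratio test: keep a match only if it clearly beats the runner-up.
    keep = d1 < ratio * d2
    matches = [(int(i), int(best[i]), float(d1[i])) for i in rows[keep]]
    matches.sort(key=lambda m: m[2])          # assumption: "top" = smallest distance
    return matches[:max_matches]
```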

2. DLT Homography Estimation

- Constructs 2N×9 matrix A from N correspondences
- Each correspondence contributes 2 equations
- Solves Ah = 0 using SVD
- Solution: right singular vector with smallest singular value
- Normalizes: H[2,2] = 1
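
The steps listed above translate almost line for line into NumPy. This is a sketch under the row layout given in the DLT Formulation section; the name `dlt_homography` is hypothetical, no coordinate normalisation is applied, and the final division assumes H[2,2] is not close to zero.

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate H (3x3) with dst ~ H @ src from N >= 4 point pairs."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        # Each correspondence contributes two rows of the 2N x 9 matrix A.
        rows.append([-x, -y, -1, 0, 0, 0, x * xp, y * xp, xp])
        rows.append([0, 0, 0, -x, -y, -1, x * yp, y * yp, yp])
    A = np.array(rows)
    # The solution of A h = 0 is the right singular vector belonging to the
    # smallest singular value, i.e. the last row of Vt.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]                        # fix the scale so that H[2,2] = 1
```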

3. RANSAC

- Iterations: 500
- Sample size: 4 points (minimum for homography)
- Inlier threshold: 5 pixels
- Refinement: Recompute H using all inliers
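
A compact sketch of the loop with the parameters listed above (500 iterations, 4-point samples, 5-pixel threshold, refit on all inliers). The helper names `fit_h`, `project` and `ransac_homography` and the fixed random seed are assumptions for the example, and scoring by forward reprojection error is one common choice rather than something the README specifies.

```python
import numpy as np

def fit_h(src, dst):
    """DLT fit: stack two rows per correspondence and take the SVD null vector."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, x * xp, y * xp, xp])
        rows.append([0, 0, 0, -x, -y, -1, x * yp, y * yp, yp])
    _, _, Vt = np.linalg.svd(np.array(rows))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def project(H, pts):
    """Map (N, 2) points through H and divide by the homogeneous coordinate."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, iters=500, thresh=5.0, seed=0):
    """Return (H, inlier_mask) for correspondences src -> dst."""
    rng = np.random.default_rng(seed)
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=4, replace=False)   # minimal sample
        # Degenerate samples can divide by zero; their NaN errors never pass.
        with np.errstate(all="ignore"):
            H = fit_h(src[idx], dst[idx])
            err = np.linalg.norm(project(H, src) - dst, axis=1)
            inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        raise ValueError("RANSAC found fewer than 4 inliers")
    # Refinement: recompute H from every inlier of the best model.
    return fit_h(src[best], dst[best]), best
```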

4. Inverse Warping (Panorama only)

- For each output pixel (x,y):
  1. Apply H_inverse to get source coordinates (x',y')
  2. Check if (x',y') is within source image bounds
  3. Use bilinear interpolation to get pixel value
  4. Assign to output canvas
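
The four steps can be vectorised over all output pixels at once. The sketch below assumes H maps source coordinates to output coordinates, that the source image has shape (height, width, channels), and that out-of-bounds pixels stay zero; the function names are hypothetical and only loosely mirror the notebook's bilinear_interpolation() and inverse_warp().

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample a float image (H, W, C) at in-bounds real coordinates x, y."""
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, img.shape[1] - 1)   # clamp at the right/bottom edge
    y1 = np.minimum(y0 + 1, img.shape[0] - 1)
    ax = (x - x0)[:, None]                      # horizontal blend weight
    ay = (y - y0)[:, None]                      # vertical blend weight
    top = img[y0, x0] * (1 - ax) + img[y0, x1] * ax
    bot = img[y1, x0] * (1 - ax) + img[y1, x1] * ax
    return top * (1 - ay) + bot * ay

def inverse_warp(src, H, out_h, out_w):
    """Fill an (out_h, out_w) canvas by pulling pixels from src through H^-1."""
    H_inv = np.linalg.inv(H)
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    with np.errstate(divide="ignore", invalid="ignore"):
        sx, sy, sw = H_inv @ pts                # step 1: output -> source
        sx, sy = sx / sw, sy / sw
        # Step 2: keep only coordinates that fall inside the source image.
        valid = ((sx >= 0) & (sx <= src.shape[1] - 1) &
                 (sy >= 0) & (sy <= src.shape[0] - 1))
    out = np.zeros((out_h * out_w, src.shape[2]), dtype=np.float32)
    # Steps 3 and 4: interpolate and write into the canvas.
    out[valid] = bilinear_sample(src.astype(np.float32), sx[valid], sy[valid])
    return out.reshape(out_h, out_w, -1)
```

Because every output pixel is visited exactly once, no holes appear in the warped region, which is the reason the README gives for preferring backward over forward warping.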

🧮 Mathematical Background

Homography Matrix

A 3×3 matrix representing a projective transformation:

H = [h11  h12  h13]
    [h21  h22  h23]
    [h31  h32  h33]

Maps point (x,y) to (x',y'):

[x']   [h11  h12  h13] [x]
[y'] = [h21  h22  h23] [y]
[w']   [h31  h32  h33] [1]

x' = (h11*x + h12*y + h13) / (h31*x + h32*y + h33)
y' = (h21*x + h22*y + h23) / (h31*x + h32*y + h33)
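
The two formulas above amount to a matrix product followed by a division by w'. A small sketch, using the hypothetical name `map_points` to keep it distinct from the notebook's own apply_homography():

```python
import numpy as np

def map_points(H, points):
    """Apply a 3x3 homography to an (N, 2) sequence of (x, y) points."""
    pts = np.asarray(points, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])   # rows are [x, y, 1]
    mapped = homog @ H.T                               # rows are [x'w', y'w', w']
    return mapped[:, :2] / mapped[:, 2:3]              # divide by w'
```

For example, with H = [[2, 0, 1], [0, 2, 3], [0, 0, 2]] the point (1, 1) maps to homogeneous (3, 5, 2), which is (1.5, 2.5) after the division.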

DLT Formulation

For correspondence (x,y) → (x',y'):

[-x  -y  -1   0   0   0  x*x'  y*x'  x'] [h1]
[ 0   0   0  -x  -y  -1  x*y'  y*y'  y'] [h2] = 0
                                          [h3]
                                          [h4]
                                          [h5]
                                          [h6]
                                          [h7]
                                          [h8]
                                          [h9]
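
The two rows follow directly from the projection formulas in the Homography Matrix section. Writing the entries of H row by row as h1 ... h9 (so h1 = h11, h2 = h12, ..., h9 = h33) and multiplying each formula through by its denominator gives:

```latex
\begin{aligned}
x' &= \frac{h_1 x + h_2 y + h_3}{h_7 x + h_8 y + h_9}
  &&\Longrightarrow&
  -h_1 x - h_2 y - h_3 + x'\,(h_7 x + h_8 y + h_9) &= 0 \\[4pt]
y' &= \frac{h_4 x + h_5 y + h_6}{h_7 x + h_8 y + h_9}
  &&\Longrightarrow&
  -h_4 x - h_5 y - h_6 + y'\,(h_7 x + h_8 y + h_9) &= 0
\end{aligned}
```

Collecting the coefficients of h1 ... h9 in each equation reproduces the two rows shown above. Both equations are linear in h, so N correspondences stack into the 2N×9 system Ah = 0, and because H is only defined up to scale, four correspondences (eight equations) are enough to determine it.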

🎓 Key Functions

Part 1: Augmented Reality

| Function | Description |
| --- | --- |
| sift_match_images() | SIFT feature detection and matching between images |
| compute_homography() | Compute 3×3 homography from correspondences using DLT |
| RANSAC() | Robust homography estimation with outlier rejection |
| apply_homography() | Transform points using homography matrix |
| map_book_corners_to_frame() | Detect book position in video frame |
| crop_and_resize_frame() | Adjust AR content to book aspect ratio |
| overlay_ar_frame_on_book_masked() | Composite AR content with polygon masking |
| load_video_frames() | Load all frames from video file |

Part 2: Image Mosaics

| Function | Description |
| --- | --- |
| findMatchesSift() | SIFT detection, matching with Lowe's ratio test |
| DLT_HomographyEstimation() | Compute homography using SVD (custom implementation) |
| RANSAC() | Robust homography estimation with outlier rejection |
| bilinear_interpolation() | Sub-pixel sampling for smooth warping |
| warp_image() | Create output canvas and place reference image |
| inverse_warp() | Backward warping with bilinear interpolation |
| stitch_images() | Complete stitching pipeline |
| verify_homography() | Visual verification of homography accuracy |
