A comprehensive implementation of augmented reality video overlay and image panorama stitching using SIFT feature detection, planar homography estimation with RANSAC, and advanced warping techniques.
This project implements two major computer vision applications:
Real-time AR video overlay that replaces a book cover in a video with custom AR content using:
- Planar Tracking: Detect and track a book cover across video frames
- Homography-based Warping: `cv2.warpPerspective` for AR content projection
- Mask-based Compositing: Polygon masking for seamless overlay
- Video Processing: Frame-by-frame processing with audio preservation
Create wide panoramic images from overlapping photographs using:
- Feature Detection & Matching: SIFT keypoint detection with Lowe's ratio test
- Homography Estimation: Direct Linear Transform (DLT) using SVD
- Robust Estimation: RANSAC for outlier rejection
- Image Warping: Inverse warping with bilinear interpolation
- Multi-Image Stitching: Sequential stitching for 3+ images
- ✅ Real-time book cover detection and tracking
- ✅ SIFT-based feature matching between book cover and video frames
- ✅ Frame-by-frame homography computation with RANSAC
- ✅ Perspective warping using `cv2.warpPerspective`
- ✅ Polygon-based masking for seamless AR overlay
- ✅ Aspect ratio-aware cropping and resizing
- ✅ Video generation with original audio preservation
- ✅ Support for different AR source videos
- ✅ SIFT-based feature detection and matching
- ✅ Custom DLT homography estimation (no OpenCV homography functions)
- ✅ RANSAC implementation for robust homography computation
- ✅ Bilinear interpolation for sub-pixel accuracy
- ✅ Backward warping to avoid holes in output
- ✅ Homography verification with visual point mapping
- ✅ Support for 2-image and 3-image panoramas
- No built-in homography functions (Part 2): All homography computation done from scratch using SVD
- Efficient warping: Inverse warping ensures every output pixel has a value
- Quality interpolation: Bilinear interpolation for smooth results
- Robust matching: Lowe's ratio test (0.75) + RANSAC (5-pixel threshold)
- Production-ready AR: Full video processing pipeline with audio
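The robust-matching highlight above (Lowe's ratio test at 0.75) can be illustrated with a small NumPy-only sketch. It assumes descriptors are already available as arrays, for example the 128-dimensional SIFT descriptors; the function name `ratio_test_matches` and the `max_matches` parameter are hypothetical and not taken from the notebooks, which use OpenCV's BFMatcher.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75, max_matches=50):
    """Brute-force L2 matching followed by Lowe's ratio test.

    desc_a: (Na, D) array, desc_b: (Nb, D) array with Nb >= 2.
    Returns a list of (index_a, index_b, distance), best matches first.
    """
    desc_a = np.asarray(desc_a, dtype=np.float64)
    desc_b = np.asarray(desc_b, dtype=np.float64)

    # Pairwise squared distances via |a|^2 + |b|^2 - 2 a.b, shape (Na, Nb).
    sq = (
        np.sum(desc_a ** 2, axis=1)[:, None]
        + np.sum(desc_b ** 2, axis=1)[None, :]
        - 2.0 * desc_a @ desc_b.T
    )
    dists = np.sqrt(np.clip(sq, 0.0, None))

    # Nearest and second-nearest neighbour in desc_b for every row of desc_a.
    order = np.argsort(dists, axis=1)
    rows = np.arange(len(desc_a))
    best, second = order[:, 0], order[:, 1]
    d1, d2 = dists[rows, best], dists[rows, second]

    # Keep a match only if it is clearly better than the runner-up.
    keep = d1 < ratio * d2
    matches = [(int(i), int(best[i]), float(d1[i])) for i in np.flatnonzero(keep)]
    matches.sort(key=lambda m: m[2])
    return matches[:max_matches]
```

A descriptor whose two nearest neighbours are equally far away is rejected as ambiguous, which is the point of the ratio test.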
`pip install opencv-python numpy matplotlib moviepy tqdm`

`jupyter notebook augmented_reality.ipynb`

Run all cells to:
- Load book cover image and video frames
- Compute homographies for each frame
- Overlay AR content onto the book
- Generate final video with audio
`jupyter notebook img_mosaics.ipynb`

Or use VS Code with the Jupyter extension to run the notebooks interactively.
Successfully created AR video with:
- Input: Book cover image (`cv_cover.jpg`) + tracking video (`book.mov`)
- AR Source: Custom video content (`ar_source.mov`)
- Output: Seamless AR overlay video with synchronized audio
- Processing: ~300+ frames with real-time book tracking
The implementation successfully stitches:
- 2-image panoramas: `pano_image1.jpg` + `pano_image2.jpg`
- Test datasets: Multiple test image pairs (test2, test3, test5)
- 3-image panoramas: Shanghai skyline series, test6 series
Book Cover Image + Video Frames
↓
SIFT Feature Detection & Matching
↓
RANSAC Homography Estimation (per frame)
↓
Book Corner Detection & Mapping
↓
AR Frame Cropping & Aspect Ratio Adjustment
↓
Perspective Warping (cv2.warpPerspective)
↓
Polygon Mask Creation
↓
AR Overlay Compositing
↓
Video Encoding + Audio Synchronization
↓
Final AR Video Output
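The "Polygon Mask Creation" and "AR Overlay Compositing" stages of this pipeline can be sketched in plain NumPy. This is only an illustration under stated assumptions: the book outline is treated as a convex quadrilateral, the mask is built with a half-plane test instead of an OpenCV call such as `cv2.fillConvexPoly`, and the names `convex_polygon_mask` and `composite` are hypothetical.

```python
import numpy as np

def convex_polygon_mask(corners, height, width):
    """Boolean mask of the pixels inside a convex polygon.

    corners: (K, 2) sequence of (x, y) vertices listed in a consistent
    order (clockwise or counter-clockwise).
    """
    corners = np.asarray(corners, dtype=np.float64)
    ys, xs = np.mgrid[0:height, 0:width]
    crosses = []
    for i in range(len(corners)):
        x0, y0 = corners[i]
        x1, y1 = corners[(i + 1) % len(corners)]
        # Sign of the cross product tells which side of edge i a pixel is on.
        crosses.append((x1 - x0) * (ys - y0) - (y1 - y0) * (xs - x0))
    crosses = np.stack(crosses)
    # Inside means "on the same side of every edge" for either orientation.
    return np.all(crosses >= 0, axis=0) | np.all(crosses <= 0, axis=0)

def composite(frame, warped_ar, mask):
    """result = frame * (1 - mask) + warped_ar * mask, per pixel."""
    m = mask.astype(np.float64)[..., None]  # broadcast over colour channels
    out = frame * (1.0 - m) + warped_ar * m
    return out.astype(frame.dtype)
```

Here `warped_ar` stands for the AR frame after perspective warping to the video frame size, and `corners` for the four book corners mapped into the frame by the homography.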
Input Images
↓
SIFT Feature Detection
↓
Feature Matching (BFMatcher + Lowe's Ratio Test)
↓
RANSAC Homography Estimation
↓
Canvas Creation & Reference Image Placement
↓
Inverse Warping with Bilinear Interpolation
↓
Final Stitched Panorama
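The "Inverse Warping with Bilinear Interpolation" stage can be sketched as below. This is a deliberately simple, loop-based illustration rather than the notebook's code: the names `bilinear_sample` and `inverse_warp_sketch` are hypothetical, `H` is assumed to map source coordinates to output coordinates, and pixels that fall outside the source image are left at zero.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample img of shape (H, W) or (H, W, C) at float coordinates (x, y)."""
    h, w = img.shape[:2]
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    # Blend horizontally on the two rows, then vertically between them.
    top = (1 - ax) * img[y0, x0] + ax * img[y0, x1]
    bottom = (1 - ax) * img[y1, x0] + ax * img[y1, x1]
    return (1 - ay) * top + ay * bottom

def inverse_warp_sketch(src, H, out_h, out_w, eps=1e-9):
    """Fill every output pixel by looking up its pre-image in src."""
    H_inv = np.linalg.inv(np.asarray(H, dtype=np.float64))
    h, w = src.shape[:2]
    out = np.zeros((out_h, out_w) + src.shape[2:], dtype=np.float64)
    for y in range(out_h):
        for x in range(out_w):
            sx, sy, sw = H_inv @ np.array([x, y, 1.0])
            if abs(sw) < 1e-12:
                continue  # point maps to infinity
            sx, sy = sx / sw, sy / sw
            # Only sample when the source coordinate lies inside the image.
            if -eps <= sx <= w - 1 + eps and -eps <= sy <= h - 1 + eps:
                sx = float(np.clip(sx, 0, w - 1))
                sy = float(np.clip(sy, 0, h - 1))
                out[y, x] = bilinear_sample(src, sx, sy)
    return out
```

Because the loop runs over output pixels, every pixel whose pre-image lies inside the source receives a value, so no holes appear inside the warped region. A vectorized version would be preferable for full-size images.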
AR overlay compositing:

1. Compute homography H mapping book to video frame
2. Warp AR content using cv2.warpPerspective(ar_frame, H, frame_size)
3. Create polygon mask at mapped book corner positions
4. Composite: result = frame * (1-mask) + warped_ar * mask
5. Only pixels inside polygon show AR content

Aspect ratio handling:

- Calculate aspect ratios of book and AR video
- Crop AR frames to match book aspect ratio
- Center-crop to avoid distortion
- Resize to exact book dimensions for warping

SIFT feature detection and matching:

- Detects keypoints in grayscale images
- Computes 128-dimensional descriptors
- BFMatcher with L2 norm
- Lowe's ratio test: distance(m1) < 0.75 * distance(m2)
- Keeps top 50 matches

DLT homography estimation:

- Constructs 2N×9 matrix A from N correspondences
- Each correspondence contributes 2 equations
- Solves Ah = 0 using SVD
- Solution: right singular vector with smallest singular value
- Normalizes: H[2,2] = 1

RANSAC:

- Iterations: 500
- Sample size: 4 points (minimum for homography)
- Inlier threshold: 5 pixels
- Refinement: Recompute H using all inliers

Inverse warping:

- For each output pixel (x,y):
  1. Apply H_inverse to get source coordinates (x',y')
  2. Check if (x',y') is within source image bounds
  3. Use bilinear interpolation to get pixel value
  4. Assign to output canvas

A 3×3 matrix representing a projective transformation:
H = [h11 h12 h13]
    [h21 h22 h23]
    [h31 h32 h33]
Maps point (x,y) to (x',y') in homogeneous coordinates:
[w'x']   [h11 h12 h13] [x]
[w'y'] = [h21 h22 h23] [y]
[w'  ]   [h31 h32 h33] [1]
Dividing by w' = h31*x + h32*y + h33 gives:
x' = (h11*x + h12*y + h13) / (h31*x + h32*y + h33)
y' = (h21*x + h22*y + h23) / (h31*x + h32*y + h33)
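As a small worked example of the mapping above, the following sketch applies a homography to a batch of points by lifting them to homogeneous coordinates and dividing by the third component. The name `project_points` is hypothetical; the notebooks list a similar helper as `apply_homography()`, whose exact signature is not shown here.

```python
import numpy as np

def project_points(H, pts):
    """Map (N, 2) points through a 3x3 homography H."""
    H = np.asarray(H, dtype=np.float64)
    pts = np.asarray(pts, dtype=np.float64)
    # Append a column of ones: (x, y) -> (x, y, 1).
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    # Divide by the third coordinate to return to pixel coordinates.
    return homog[:, :2] / homog[:, 2:3]

# With h31 = 0.5 the denominator for (2, 4) is 0.5 * 2 + 1 = 2,
# so the point maps to (2 / 2, 4 / 2) = (1, 2).
H_example = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.5, 0.0, 1.0]])
mapped = project_points(H_example, [(2.0, 4.0)])
```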
For correspondence (x,y) → (x',y'), with h = [h1 h2 h3 h4 h5 h6 h7 h8 h9]^T holding the entries of H in row-major order (h1 = h11, ..., h9 = h33):
[-x -y -1  0  0  0  x*x'  y*x'  x'] h = 0
[ 0  0  0 -x -y -1  x*y'  y*y'  y'] h = 0
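The two rows per correspondence translate directly into code. The sketch below stacks them into the matrix A, takes the right singular vector with the smallest singular value, and normalizes so that H[2,2] = 1. The name `dlt_homography` is hypothetical, and the sketch omits coordinate normalization, which is often added for numerical stability; whether the notebooks use it is not stated.

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate H such that dst ~ H @ src from N >= 4 point pairs."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        # Two equations per correspondence, matching the rows shown above.
        rows.append([-x, -y, -1, 0, 0, 0, x * xp, y * xp, xp])
        rows.append([0, 0, 0, -x, -y, -1, x * yp, y * yp, yp])
    A = np.asarray(rows, dtype=np.float64)

    # The solution of Ah = 0 is the last row of Vt (smallest singular value).
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)

    # Fix the arbitrary scale; assumes H[2, 2] is not close to zero.
    return H / H[2, 2]
```

With exactly four correspondences in general position the system has a one-dimensional null space and the recovered H is exact up to floating-point error; with more points the SVD gives a least-squares solution.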
| Function | Description |
|---|---|
| `sift_match_images()` | SIFT feature detection and matching between images |
| `compute_homography()` | Compute 3×3 homography from correspondences using DLT |
| `RANSAC()` | Robust homography estimation with outlier rejection |
| `apply_homography()` | Transform points using homography matrix |
| `map_book_corners_to_frame()` | Detect book position in video frame |
| `crop_and_resize_frame()` | Adjust AR content to book aspect ratio |
| `overlay_ar_frame_on_book_masked()` | Composite AR content with polygon masking |
| `load_video_frames()` | Load all frames from video file |
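To illustrate the kind of work `crop_and_resize_frame()` does, here is a minimal sketch of the center-crop step only. It is an assumption about the approach, not the notebook's implementation, and the name `center_crop_to_aspect` is hypothetical. The final resize to the exact book dimensions would typically follow with `cv2.resize(cropped, (target_w, target_h))`.

```python
import numpy as np

def center_crop_to_aspect(frame, target_w, target_h):
    """Center-crop frame so its width/height ratio matches target_w/target_h."""
    h, w = frame.shape[:2]
    target_ratio = target_w / target_h

    if w / h > target_ratio:
        # Frame is too wide: keep full height, trim equal margins left and right.
        new_w = max(1, min(w, int(round(h * target_ratio))))
        x0 = (w - new_w) // 2
        return frame[:, x0:x0 + new_w]

    # Frame is too tall (or already matches): keep full width, trim top and bottom.
    new_h = max(1, min(h, int(round(w / target_ratio))))
    y0 = (h - new_h) // 2
    return frame[y0:y0 + new_h, :]

# A 640x360 landscape frame cropped for a 3:4 portrait cover keeps
# the full height and the central 270 columns.
cropped = center_crop_to_aspect(np.zeros((360, 640, 3), dtype=np.uint8), 3, 4)
```

Cropping before resizing is what avoids distortion: the resize then only changes scale, not proportions.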
| Function | Description |
|---|---|
| `findMatchesSift()` | SIFT detection, matching with Lowe's ratio test |
| `DLT_HomographyEstimation()` | Compute homography using SVD (custom implementation) |
| `RANSAC()` | Robust homography estimation with outlier rejection |
| `bilinear_interpolation()` | Sub-pixel sampling for smooth warping |
| `warp_image()` | Create output canvas and place reference image |
| `inverse_warp()` | Backward warping with bilinear interpolation |
| `stitch_images()` | Complete stitching pipeline |
| `verify_homography()` | Visual verification of homography accuracy |
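For readers who want to see how the pieces behind `RANSAC()` fit together, the sketch below combines a DLT fit with the stated settings: 500 iterations, 4-point samples, a 5-pixel inlier threshold, and a final refit on all inliers. The function names, the `seed` parameter, and the use of `np.random.default_rng` are assumptions made for this illustration; the notebook's own implementation may differ.

```python
import numpy as np

def fit_homography_dlt(src, dst):
    """DLT: two equations per correspondence, solved with SVD."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, x * xp, y * xp, xp])
        rows.append([0, 0, 0, -x, -y, -1, x * yp, y * yp, yp])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def reprojection_errors(H, src, dst):
    """Euclidean distance between H-mapped src points and dst points."""
    homog = np.hstack([src, np.ones((len(src), 1))]) @ H.T
    proj = homog[:, :2] / homog[:, 2:3]
    return np.linalg.norm(proj - dst, axis=1)

def ransac_homography(src, dst, iterations=500, threshold=5.0, seed=0):
    """Return (H, inlier_mask) for (N, 2) arrays of matched points."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)

    # Degenerate samples can yield inf/nan; those never count as inliers.
    with np.errstate(divide="ignore", invalid="ignore"):
        for _ in range(iterations):
            idx = rng.choice(len(src), size=4, replace=False)
            H = fit_homography_dlt(src[idx], dst[idx])
            inliers = reprojection_errors(H, src, dst) < threshold
            if inliers.sum() > best.sum():
                best = inliers

    if best.sum() < 4:
        raise ValueError("RANSAC did not find a consensus set of 4+ points")
    # Refinement: recompute H from every inlier of the best model.
    return fit_homography_dlt(src[best], dst[best]), best
```

The returned mask can be used to discard outlier matches before visualizing or verifying the homography.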