This Technical Architecture Report is subject to change with version updates; it may not reflect the most current version of the project.
Air Canvas - Complete Technical Architecture
Core Libraries
- MediaPipe Hands (@mediapipe/hands v0.4.x)
Google's machine learning solution for real-time hand tracking, running its models directly in the browser.
- What it does: Detects 21 3D hand landmarks per hand in real-time
- Model complexity: Set to level 1 (balance between accuracy/speed)
- Confidence thresholds: 0.7 for both detection and tracking
- Performance: Processes video frames, skipping some to maintain ~30 FPS
- CDN delivery: Loads ML models from jsDelivr CDN
- MediaPipe Camera Utils (@mediapipe/camera_utils)
Wrapper for webcam access that integrates with MediaPipe's processing pipeline
- Frame management: Captures video frames and sends to MediaPipe
- Automatic synchronization: Handles frame timing and model inference coordination
- HTML5 Canvas API (Native)
Raster graphics rendering system
- 2D context: Used for all drawing operations
- Immediate mode: Each stroke is drawn directly to canvas
- Line rendering: Uses round caps/joins for smooth strokes
Project Architecture
Component Hierarchy
```
App (Orchestrator)
├── VideoFeed (Displays webcam)
├── DrawingCanvas (Renders strokes)
├── HandTracker (Visual pinch indicator)
└── ControlPanel (UI controls + dock tab)
```
Data Flow Pipeline
The pipeline has four stages: 1. Video Capture → 2. Hand Detection → 3. Gesture Recognition → 4. Interaction/Drawing

```
getUserMedia → HTMLVideoElement → MediaPipe Camera → MediaPipe Hands Model
  → 21 landmarks → Gesture Processing → Pinch Detection + Smoothing
  → App.tsx coordination → Canvas Drawing OR UI Interaction
```
Detailed Component Breakdown
VideoFeed.tsx - Video Layer
Purpose: Display the mirrored webcam feed.
```ts
transform: 'scaleX(-1)'  // Mirror for natural hand-eye coordination
objectFit: 'cover'       // Fill entire viewport
pointerEvents: 'none'    // Don't block interactions
```
Callback pattern: Exposes a ref via onVideoReady(HTMLVideoElement) to the parent.
useWebcam Hook - Camera Access
Location: src/hooks/useWebcam.ts
Responsibilities:
- Request camera permissions via navigator.mediaDevices.getUserMedia()
- Configure ideal resolution (1280x720) for performance/quality balance
- Auto-retry mechanism with trigger state
- Cleanup: Stop media stream tracks on unmount
Configuration:
```ts
video: {
  width: { ideal: 1280 }, // Balances detail with processing speed
  height: { ideal: 720 },
  facingMode: 'user'      // Front-facing camera
}
```
Error handling: Catches permission denials, missing hardware, etc.
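A condensed sketch of a hook with these responsibilities—resolution hints, error capture, and track cleanup. The hook shape and state names are illustrative; the actual src/hooks/useWebcam.ts may structure its retry logic differently:
```ts
import { useEffect, useRef, useState } from 'react'

function useWebcam(videoElement: HTMLVideoElement | null) {
  const [error, setError] = useState<string | null>(null)
  const streamRef = useRef<MediaStream | null>(null)

  useEffect(() => {
    if (!videoElement) return
    let cancelled = false

    navigator.mediaDevices
      .getUserMedia({
        video: {
          width: { ideal: 1280 }, // balances detail with processing speed
          height: { ideal: 720 },
          facingMode: 'user',     // front-facing camera
        },
      })
      .then((stream) => {
        if (cancelled) {
          // Effect already cleaned up; release the camera immediately
          stream.getTracks().forEach((track) => track.stop())
          return
        }
        streamRef.current = stream
        videoElement.srcObject = stream // element should have autoplay/playsInline
      })
      .catch((e) => setError(e instanceof Error ? e.message : String(e)))

    return () => {
      // Stop all tracks on unmount so the camera indicator turns off
      cancelled = true
      streamRef.current?.getTracks().forEach((track) => track.stop())
    }
  }, [videoElement])

  return { error }
}
```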
useMediaPipe Hook - ML Pipeline
Location: src/hooks/useMediaPipe.ts
Initialization sequence:
- Create Hands instance with CDN-based model loading
- Configure detection parameters:
```ts
maxNumHands: 1              // Only track dominant hand (performance)
modelComplexity: 1          // Medium accuracy (0=lite, 1=full, 2=heavy)
minDetectionConfidence: 0.7 // Initial detection threshold
minTrackingConfidence: 0.7  // Frame-to-frame tracking threshold
```
- Register onResults callback (fires per processed frame)
- Create Camera wrapper linking video element to MediaPipe
- Start processing loop
Frame processing:
```ts
onFrame: async () => {
  await hands.send({ image: videoElement })
}
```
MediaPipe processes frames asynchronously; not every frame gets processed—frames are dropped automatically to maintain real-time performance.
Cleanup: Stops camera loop and closes MediaPipe instance to free GPU/CPU resources.
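Putting the sequence together, a minimal sketch using the @mediapipe/hands and @mediapipe/camera_utils APIs with the options listed above (error handling and React wiring omitted):
```ts
import { Hands, Results } from '@mediapipe/hands'
import { Camera } from '@mediapipe/camera_utils'

function createHandTracking(
  videoElement: HTMLVideoElement,
  onResults: (results: Results) => void
) {
  // 1. Load the model files from the jsDelivr CDN
  const hands = new Hands({
    locateFile: (file) =>
      `https://cdn.jsdelivr.net/npm/@mediapipe/hands/${file}`,
  })

  // 2. Detection parameters described above
  hands.setOptions({
    maxNumHands: 1,
    modelComplexity: 1,
    minDetectionConfidence: 0.7,
    minTrackingConfidence: 0.7,
  })

  // 3. Fires once per processed frame
  hands.onResults(onResults)

  // 4-5. Link the video element to MediaPipe and start the processing loop
  const camera = new Camera(videoElement, {
    onFrame: async () => {
      await hands.send({ image: videoElement })
    },
  })
  camera.start()

  // Cleanup: stop the loop and free GPU/CPU resources
  return () => {
    camera.stop()
    hands.close()
  }
}
```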
Hand Landmark System
MediaPipe returns 21 landmarks per hand in normalized coordinates [0, 1]. Landmarks used in this project:
- [4]: Thumb tip
- [8]: Index finger tip
Each landmark contains:
```ts
{
  x: number, // Normalized [0-1] horizontal position
  y: number, // Normalized [0-1] vertical position
  z: number  // Depth (not used in this project)
}
```
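In TypeScript terms—the tip indices follow MediaPipe's standard hand-landmark numbering:
```ts
interface Landmark {
  x: number // normalized [0-1] horizontal position
  y: number // normalized [0-1] vertical position
  z: number // depth (unused here)
}

// MediaPipe hand landmark indices used by this project
const THUMB_TIP = 4
const INDEX_FINGER_TIP = 8

function getPinchLandmarks(landmarks: Landmark[]) {
  return { thumb: landmarks[THUMB_TIP], index: landmarks[INDEX_FINGER_TIP] }
}
```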
useGestureDetection Hook - Pinch Recognition
Location: src/hooks/useGestureDetection.ts
Algorithm Components:
- Coordinate Transform (Lines 28-39)
Converts MediaPipe's normalized coordinates to screen pixels:
```ts
// Horizontal flip because the video is mirrored
x: canvasWidth - landmark.x * canvasWidth
// Direct vertical mapping
y: landmark.y * canvasHeight
```
- Pinch Detection (Lines 41-42)
```
distance = √[(thumbX - indexX)² + (thumbY - indexY)²]
isPinching = distance < PINCH_THRESHOLD (75 pixels)
```
Why 75px? It balances accidental triggers against intentional pinches at arm's length from the camera. (A combined TypeScript sketch of this pipeline follows the list.)
- Pinch Point Calculation (Lines 42-47)
```
midpoint = ((thumbX + indexX) / 2, (thumbY + indexY) / 2)
adjustedPoint.y = midpoint.y - 15 // Offset upward for better visual alignment
```
The -15px offset accounts for finger thickness—it places the cursor at the fingertip contact point rather than the geometric center.
- Adaptive Exponential Smoothing (Lines 49-63)
Problem: Raw landmark positions jitter due to ML uncertainty and actual hand tremor.
Solution: Exponential Moving Average (EMA) with velocity-dependent smoothing:
```
velocity = √[(currentX - lastX)² + (currentY - lastY)²]

if (velocity > 15 pixels/frame) {
  smoothingFactor = 0.2 // Fast movement: prioritize responsiveness
} else {
  smoothingFactor = 0.6 // Slow movement: prioritize stability
}

smoothedX = lastX × smoothingFactor + currentX × (1 - smoothingFactor)
smoothedY = lastY × smoothingFactor + currentY × (1 - smoothingFactor)
```
Effect:
- Fast movements (gestures, UI interactions): Low lag, some jitter acceptable
- Slow movements (precise drawing): Maximum stability, slight lag acceptable
This is a first-order IIR filter that maintains temporal continuity while adapting to context.
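A combined sketch of the four components above—coordinate flip, pinch test, offset midpoint, and velocity-adaptive EMA. The standalone function shape and constant names are illustrative; the project implements this inside the useGestureDetection hook:
```ts
interface Point { x: number; y: number }

const PINCH_THRESHOLD = 75    // px; pinch when thumb and index are closer than this
const VELOCITY_THRESHOLD = 15 // px/frame; switch between smoothing modes
const TIP_OFFSET_Y = 15       // px; shift cursor toward fingertip contact point

let last: Point | null = null // previous smoothed point (per-hook state)

function detectPinch(
  thumb: Point, index: Point,           // normalized [0,1] landmarks 4 and 8
  canvasWidth: number, canvasHeight: number
): { point: Point; isPinching: boolean } {
  // Normalized → screen pixels, flipping X because the video is mirrored
  const toScreen = (lm: Point): Point => ({
    x: canvasWidth - lm.x * canvasWidth,
    y: lm.y * canvasHeight,
  })
  const t = toScreen(thumb)
  const i = toScreen(index)

  const isPinching = Math.hypot(t.x - i.x, t.y - i.y) < PINCH_THRESHOLD

  // Midpoint between fingertips, nudged upward toward the contact point
  const raw: Point = { x: (t.x + i.x) / 2, y: (t.y + i.y) / 2 - TIP_OFFSET_Y }

  // Velocity-adaptive EMA: heavier smoothing when the hand moves slowly
  if (last) {
    const velocity = Math.hypot(raw.x - last.x, raw.y - last.y)
    const alpha = velocity > VELOCITY_THRESHOLD ? 0.2 : 0.6
    raw.x = last.x * alpha + raw.x * (1 - alpha)
    raw.y = last.y * alpha + raw.y * (1 - alpha)
  }
  last = raw
  return { point: raw, isPinching }
}
```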
useCanvas Hook - Drawing System
Location: src/hooks/useCanvas.ts
Data Structures:
```ts
interface Stroke {
  points: Point[] // Ordered sequence of coordinates
  color: string   // Hex color code
  size: number    // Brush radius in pixels
}
```
State Management:
```ts
strokes: Stroke[]                // React state: completed strokes
currentStrokeRef: Stroke         // Ref: in-progress stroke (not rendered until complete)
ctxRef: CanvasRenderingContext2D // Ref: canvas rendering context
```
Drawing Lifecycle:
- startStroke(point, color, size) - Line 49
```ts
currentStrokeRef.current = { points: [point], color, size }
```
Initializes a new stroke. Doesn't draw yet—waits for movement.
- addPointToStroke(point) - Lines 57-80
This is the performance-critical hot path (a runnable sketch follows this lifecycle list).
Algorithm:
```
lastPoint = stroke.points[lastIndex]
distance = √[(newX - lastX)² + (newY - lastY)²]

if (distance > 5 pixels) { // Fast movement detected
  interpolatedPoints = bresenhamLine(lastPoint, point)
  stroke.points.push(...interpolatedPoints)
  // Draw each interpolated segment
  for (each new segment) {
    drawStrokePart(ctx, stroke, segmentIndex)
  }
} else { // Normal movement
  stroke.points.push(point)
  drawStrokePart(ctx, stroke, lastSegmentIndex)
}
```
Why 5 pixel threshold?
- MediaPipe frame skipping can create gaps in hand tracking
- Fast hand movements create large position jumps between frames
- Without interpolation, strokes become dotted lines
- Bresenham fills gaps with mathematically perfect line segments
- endStroke() - Line 82
```ts
setStrokes(prev => [...prev, currentStrokeRef.current])
currentStrokeRef.current = null
```
Commits the stroke to permanent history. Enables undo functionality (not implemented but easy to add).
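A runnable sketch of the lifecycle's hot path, assuming the Point/Stroke shapes above and the bresenhamLine/drawStrokePart helpers described in the next sections (declared here rather than implemented):
```ts
interface Point { x: number; y: number }
interface Stroke { points: Point[]; color: string; size: number }

// Helpers covered in the following sections
declare function bresenhamLine(start: Point, end: Point): Point[]
declare function drawStrokePart(
  ctx: CanvasRenderingContext2D, stroke: Stroke, startIndex: number
): void

const GAP_THRESHOLD = 5 // px; beyond this, interpolate to avoid dotted strokes

function addPointToStroke(
  ctx: CanvasRenderingContext2D, stroke: Stroke, point: Point
): void {
  const lastPoint = stroke.points[stroke.points.length - 1]
  const distance = Math.hypot(point.x - lastPoint.x, point.y - lastPoint.y)

  if (distance > GAP_THRESHOLD) {
    // Fast movement: fill the gap with Bresenham-interpolated points
    const interpolated = bresenhamLine(lastPoint, point).slice(1) // skip duplicate start
    const firstNewIndex = stroke.points.length - 1
    stroke.points.push(...interpolated)
    for (let i = firstNewIndex; i < stroke.points.length - 1; i++) {
      drawStrokePart(ctx, stroke, i) // draw each new segment incrementally
    }
  } else {
    // Normal movement: append and draw the single new segment
    stroke.points.push(point)
    drawStrokePart(ctx, stroke, stroke.points.length - 2)
  }
}
```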
Bresenham's Line Algorithm - geometry.ts:32-63
Purpose: Generate all integer pixel coordinates along a line from start to end.
Why use it?
- HTML Canvas lineTo() is anti-aliased and variable-width
- For fast movements, we need explicit point sequences
- Classic computer graphics algorithm (1962) - extremely efficient
Algorithm explanation:
```
dx = |x1 - x0|        // Horizontal distance
dy = |y1 - y0|        // Vertical distance
sx = x0 < x1 ? 1 : -1 // X direction
sy = y0 < y1 ? 1 : -1 // Y direction
err = dx - dy         // Decision variable

while (not at end) {
  plot(x0, y0)
  // Decide whether to step horizontally, vertically, or both
  e2 = 2 × err
  if (e2 > -dy) { // Step in X
    err -= dy
    x0 += sx
  }
  if (e2 < dx) {  // Step in Y
    err += dx
    y0 += sy
  }
}
```
Key insight: Uses only integer arithmetic (no floating-point), making it extremely fast. The error accumulation determines when to step in each direction.
Example: a line from (0,0) to (5,2) generates the points (0,0), (1,0), (2,1), (3,1), (4,2), (5,2)—a visually smooth diagonal.
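A self-contained TypeScript rendering of the algorithm (the project keeps its equivalent in geometry.ts; this version is a sketch, not the verbatim source):
```ts
interface Point { x: number; y: number }

// Generate every integer pixel coordinate from start to end (inclusive),
// using only integer arithmetic.
function bresenhamLine(start: Point, end: Point): Point[] {
  let x0 = Math.round(start.x)
  let y0 = Math.round(start.y)
  const x1 = Math.round(end.x)
  const y1 = Math.round(end.y)

  const dx = Math.abs(x1 - x0)
  const dy = Math.abs(y1 - y0)
  const sx = x0 < x1 ? 1 : -1
  const sy = y0 < y1 ? 1 : -1
  let err = dx - dy

  const points: Point[] = []
  while (true) {
    points.push({ x: x0, y: y0 })
    if (x0 === x1 && y0 === y1) break
    const e2 = 2 * err
    if (e2 > -dy) { err -= dy; x0 += sx } // step in X
    if (e2 < dx)  { err += dx; y0 += sy } // step in Y
  }
  return points
}

// bresenhamLine({ x: 0, y: 0 }, { x: 5, y: 2 })
// → (0,0), (1,0), (2,1), (3,1), (4,2), (5,2)
```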
Rendering Functions:
drawStrokePart(ctx, stroke, startIndex) - Line 105
Draws a single line segment between two adjacent points:
```ts
ctx.strokeStyle = stroke.color
ctx.lineWidth = stroke.size
ctx.lineCap = 'round'  // Smooth endpoints
ctx.lineJoin = 'round' // Smooth corners
ctx.beginPath()
ctx.moveTo(points[i].x, points[i].y)
ctx.lineTo(points[i + 1].x, points[i + 1].y)
ctx.stroke()
```
redrawCanvas(ctx, strokes) - Line 119
Full canvas repaint (used on resize and clear):
```
ctx.clearRect(0, 0, width, height)
for (each stroke) {
  ctx.beginPath()
  ctx.moveTo(firstPoint)
  for (each remaining point) {
    ctx.lineTo(point)
  }
  ctx.stroke()
}
```
Performance optimization: Uses single beginPath()/stroke() per stroke instead of per-segment. Much faster for redraws.
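A sketch of such a repaint routine with the one-path-per-stroke optimization (assumes the Stroke shape defined earlier):
```ts
interface Point { x: number; y: number }
interface Stroke { points: Point[]; color: string; size: number }

// Repaint every completed stroke with one beginPath()/stroke() pair each,
// rather than one per segment—much cheaper for full redraws.
function redrawCanvas(ctx: CanvasRenderingContext2D, strokes: Stroke[]): void {
  ctx.clearRect(0, 0, ctx.canvas.width, ctx.canvas.height)
  for (const stroke of strokes) {
    if (stroke.points.length < 2) continue
    ctx.strokeStyle = stroke.color
    ctx.lineWidth = stroke.size
    ctx.lineCap = 'round'
    ctx.lineJoin = 'round'
    ctx.beginPath()
    ctx.moveTo(stroke.points[0].x, stroke.points[0].y)
    for (let i = 1; i < stroke.points.length; i++) {
      ctx.lineTo(stroke.points[i].x, stroke.points[i].y)
    }
    ctx.stroke()
  }
}
```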
App.tsx - Central Orchestrator
State Management (Lines 12-18):
```ts
videoElement: HTMLVideoElement   // Reference to video DOM element
canvasElement: HTMLCanvasElement // Reference to canvas DOM element
currentColor: string             // Active brush color
brushSize: number                // Active brush size (2-50px)
hoveredControl: string | null    // Currently hovered UI element ID
showPanel: boolean               // Control panel visibility
panelScale: number               // Panel zoom level (0.5-2x)
```
Refs for Interaction State (Lines 31-37):
```ts
wasPinchingRef: boolean         // Previous frame pinch state
lastInteractionTimeRef: number  // Debounce timestamp
lastHoverCheckRef: number       // Hover check throttle
isWidgetInteractionRef: boolean // Blocks drawing during UI interaction
resizeStartPosRef: Point        // Resize gesture start position
resizeStartScaleRef: number     // Scale at resize start
isResizingRef: boolean          // Active resize operation
```
Why refs instead of state?
- Avoid re-renders on every frame (60 Hz gesture updates)
- State changes trigger re-renders—refs don't
- Only UI-visible changes use setState
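A minimal illustration of the pattern—hook and variable names are invented for the example, but the split mirrors the one described above:
```ts
import { useRef, useState } from 'react'

function useInteractionState() {
  // Per-frame flags: mutate freely at gesture rate without re-rendering
  const wasPinchingRef = useRef(false)

  // UI-visible values: changing these should repaint the component tree
  const [showPanel, setShowPanel] = useState(true)

  function onFrame(isPinching: boolean) {
    if (isPinching && !wasPinchingRef.current) {
      setShowPanel(false)               // visible change → state
    }
    wasPinchingRef.current = isPinching // invisible bookkeeping → ref
  }

  return { showPanel, onFrame }
}
```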
Interaction Effect (Lines 39-124)
Throttling (Lines 66-70):
```
if (Date.now() - lastHoverCheckRef < 50ms) return
```
Limits hover detection to 20 Hz instead of 60 Hz—CPU savings on DOM queries.
Hover Detection Algorithm (Lines 72-117):
```
elements = querySelectorAll('[data-control-id]')

for (each element) {
  rect = element.getBoundingClientRect()
  isHovering = (
    pinchX >= rect.left && pinchX <= rect.right &&
    pinchY >= rect.top && pinchY <= rect.bottom
  )
  if (isHovering) {
    switch (controlId) {
      case 'dock-tab':
        if (!showPanel && !isPinching) {
          setShowPanel(true) // Open panel on hover (no pinch required)
        }
        break
      case 'resize-handle':
        if (isPinching) {
          startResize() // Begin resize gesture
        }
        break
      case 'brush-slider':
        if (isPinching) {
          // Map X position to brush size
          relativeX = pinchX - rect.left
          percentage = relativeX / rect.width
          newSize = 2 + percentage × 48
        }
        break
      case 'color-*':
        if (isPinching && debounced) {
          setCurrentColor(color)
        }
        break
      case 'clear-button':
        if (isPinching && debounced) {
          clearCanvas()
        }
        break
    }
  }
}
```
Debouncing (Line 105):
```
if (now - lastInteractionTime > 300ms) {
  // Allow interaction
}
```
Prevents rapid repeated clicks; 300 ms serves as a human-reaction-time buffer.
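The same guard expressed as a small helper (illustrative; the project inlines this check in App.tsx):
```ts
import { useRef } from 'react'

const DEBOUNCE_MS = 300 // roughly a human reaction time

// Returns a function that fires at most once per DEBOUNCE_MS—
// used to gate pinch "clicks" on colors and the clear button
function useDebouncedTrigger() {
  const lastFireRef = useRef(0)
  return function tryFire(): boolean {
    const now = Date.now()
    if (now - lastFireRef.current < DEBOUNCE_MS) return false
    lastFireRef.current = now
    return true
  }
}

// Usage: if (isPinching && tryFire()) setCurrentColor(color)
```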
Resize Gesture Handler (Lines 51-58):
```
if (isResizing && isPinching) {
  deltaY = currentPinchY - startPinchY
  scaleDelta = deltaY / 200 // 200px vertical movement = 1x scale change
  newScale = clamp(0.5, startScale + scaleDelta, 2.0)
  setPanelScale(newScale)
}
```
A vertical drag gesture resizes the panel via a linear mapping.
Drawing Effect (Lines 126-159)
Guard Conditions (Line 127):
```
if (
  !pinchPoint ||                                       // No hand detected
  (hoveredControl && hoveredControl !== 'dock-tab') || // Over UI (block drawing)
  isResizingRef.current                                // During resize gesture
) {
  endCurrentStroke()
  return
}
```
State Machine:
```
NOT_PINCHING → PINCHING:     startStroke()
PINCHING → PINCHING:         addPointToStroke()
PINCHING → NOT_PINCHING:     endStroke()
```
Implementation:
```
if (isPinching) {
  if (!wasPinching && !isWidgetInteraction) {
    startStroke(pinchPoint, color, size)
    wasPinching = true
    setShowPanel(false) // Hide panel when drawing starts
  } else if (wasPinching) {
    addPointToStroke(pinchPoint) // Continue stroke
  }
} else {
  if (wasPinching) {
    endStroke()
    wasPinching = false
  }
  isWidgetInteraction = false // Reset UI interaction flag
}
```
Automatic panel hiding (Line 139): When drawing starts, the panel slides out, preventing accidental UI hits mid-stroke.
ControlPanel.tsx - UI Layer
Layout Structure:
```
Control Panel (fixed top-right)
├── Color Grid (5×2)
├── Brush Size Slider (2-50px)
├── Clear Button
└── Resize Handle (bottom-left corner)

Dock Tab (fixed right-center, outside panel)
```
Visibility Animation (Lines 55-59):
```ts
opacity: showPanel ? 1 : 0,
transform: showPanel
  ? `translateX(0) scale(${scale})`
  : `translateX(100%) scale(${scale})`,
transformOrigin: 'top right',
transition: 'opacity 0.3s ease, transform 0.3s ease'
```
Effect:
- Hidden: Offscreen right, transparent
- Visible: Onscreen, solid
- Scale applies from top-right corner (natural resize feel)
Resize Handle Hover Effect (Lines 128-140):
```ts
bottom: isHovered ? '-30px' : '0',
left: isHovered ? '-30px' : '0',
width: isHovered ? '60px' : '30px',
height: isHovered ? '60px' : '30px',
transition: 'all 0.2s ease'
```
Expansion direction: Bottom-left diagonal
- Negative offsets extend handle outside parent bounds
- Parent doesn't clip (overflow: visible by default)
- Makes small target easier to hit
Visual indicator (Lines 143-175): Three horizontal bars form a resize grip:
```
───── (longest)
────  (medium)
───   (shortest)
```
The grip scales proportionally with the hover state.
HandTracker.tsx - Visual Feedback
Simple positioning:
```ts
position: 'absolute',
left: `${pinchPoint.x}px`,
top: `${pinchPoint.y}px`,
transform: 'translate(-50%, -50%)' // Center circle on point
```
Color coding:
```ts
backgroundColor: isPinching ? '#00ff00' : '#ffff00' // Green when pinching, yellow otherwise
```
Purpose: Immediate visual feedback for gesture recognition state.
z-index: 2000: Always on top of everything, including the panel (which sits at z-index 1000).
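A sketch of such an indicator component, with props and styling inferred from the description above (the circle diameter is an assumption):
```tsx
import React from 'react'

interface Point { x: number; y: number }

interface HandTrackerProps {
  pinchPoint: Point | null
  isPinching: boolean
}

// Small circle that tracks the pinch point and reports gesture state by color
export function HandTracker({ pinchPoint, isPinching }: HandTrackerProps) {
  if (!pinchPoint) return null
  return (
    <div
      style={{
        position: 'absolute',
        left: `${pinchPoint.x}px`,
        top: `${pinchPoint.y}px`,
        transform: 'translate(-50%, -50%)', // center the circle on the point
        width: 20,
        height: 20,
        borderRadius: '50%',
        backgroundColor: isPinching ? '#00ff00' : '#ffff00',
        pointerEvents: 'none', // never intercept interactions
        zIndex: 2000,          // above the panel (z-index 1000)
      }}
    />
  )
}
```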
Performance Optimizations
- Throttled Hover Checks (50ms interval)
Reduces DOM queries from 60 Hz to 20 Hz, cutting hover-loop work by roughly two-thirds.
- Ref-based State
Interaction flags use refs to avoid re-renders. Only user-visible changes trigger renders.
- Incremental Canvas Drawing
New stroke segments drawn immediately. Avoids full canvas redraw on every point.
- Bresenham Interpolation
Only applied when distance > 5px. Avoids overhead for normal movement.
- Single Hand Tracking
maxNumHands: 1 cuts MediaPipe processing time nearly in half vs. 2 hands.
- Model Complexity: 1
Balanced model. Level 0 is faster but less accurate; Level 2 is overkill for this use case.
- Adaptive Smoothing
Fast movements get less smoothing (lower latency). Slow movements get more (better precision).
- Minimal Re-renders
Panel scale/visibility changes don't trigger hand tracking re-initialization.
Notable Design Decisions
Why mirror the video?
transform: 'scaleX(-1)' creates "mirror mode," which feels natural: when you move right, your reflection moves right. Without it, movements feel inverted—like using a mouse with a reversed X-axis.
Why not flip coordinates instead?
Coordinates ARE flipped in useGestureDetection:
```ts
x: canvasWidth - thumbTip.x * canvasWidth
```
Video mirroring is purely visual; the coordinate flip handles the actual positioning logic.
Why offset pinch point upward 15px?
```ts
y: midpoint.y - 15
```
When pinching, the fingertips are vertically stacked (index usually above thumb), so the geometric midpoint sits behind the contact point; the -15px offset brings the cursor to the visual contact location.
Why 75px pinch threshold?
Calibrated for typical webcam viewing distances (18-24 inches). Too small = hard to trigger. Too large = accidental triggers.
Why separate stroke storage?
```ts
currentStrokeRef: Stroke // Active, being drawn
strokes: Stroke[]        // History, immutable
```
This separation enables:
- Undo/redo (not implemented but trivial to add)
- Stroke export/serialization
- Collaborative features (each stroke is discrete unit)
- Efficient re-rendering (only redraw completed strokes on resize)
Data Flow Summary
Complete cycle from hand to pixel:
- Camera → 1280×720 video frame
- MediaPipe Camera → Send frame to ML model
- MediaPipe Hands → Detect hand, return 21 landmarks (normalized)
- useGestureDetection → Extract thumb[4] & index[8], compute distance & midpoint
- Adaptive smoothing → Apply velocity-aware EMA filter
- App.tsx effect #1 → Check hover state, handle UI interactions OR set isWidgetInteraction flag
- App.tsx effect #2 → If not over UI and pinching: call useCanvas methods
- useCanvas → Add point to stroke, apply Bresenham if needed, draw to canvas
- HandTracker → Update visual indicator position & color
- Next frame → Repeat
Frequency:
- Webcam capture: 30 FPS
- MediaPipe processing: ~15-25 FPS (adaptive frame skipping)
- Canvas drawing: Event-driven (only when new points arrive)
- Hover checks: 20 Hz (throttled)
- React renders: As needed (state-change triggered)
Technology Stack Summary
| Layer | Technology | Purpose |
|---|---|---|
| ML/CV | MediaPipe Hands | Hand landmark detection |
| Camera | getUserMedia API | Webcam access |
| Rendering | HTML5 Canvas 2D | Raster drawing |
| UI Framework | React 18 | Component structure |
| Language | TypeScript | Type safety |
| State | React Hooks | State management |
| Build | Vite | Fast HMR dev server |
| Algorithms | Bresenham, EMA | Interpolation, smoothing |
Key Algorithms Summary
- Euclidean Distance (Pinch detection)
d = √[(x₂-x₁)² + (y₂-y₁)²]
- Exponential Moving Average (Smoothing)
smoothed = α × previous + (1 − α) × current, where α is adaptive (0.2 or 0.6 based on velocity)
- Bresenham's Line (Interpolation)
Integer-only rasterization algorithm for perfect straight lines.
- Bounding Box Collision (Hover detection)
collision = ( px ≥ box.left && px ≤ box.right && py ≥ box.top && py ≤ box.bottom )
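As a standalone TypeScript helper (names are illustrative; the project performs this check inline during the hover loop):
```ts
// Axis-aligned bounding-box hit test against a DOM element's screen rect
function isPointOverElement(px: number, py: number, el: Element): boolean {
  const rect = el.getBoundingClientRect()
  return px >= rect.left && px <= rect.right &&
         py >= rect.top && py <= rect.bottom
}

// Usage: find which control (if any) the pinch point is over
function hitTestControls(px: number, py: number): string | null {
  for (const el of document.querySelectorAll('[data-control-id]')) {
    if (isPointOverElement(px, py, el)) {
      return el.getAttribute('data-control-id')
    }
  }
  return null
}
```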
This architecture demonstrates production-ready real-time computer vision integration with smooth, responsive UX despite ML latency and inherent hand tracking jitter.