
warfarm/Air-Canvas


This Technical Architecture Report is subject to change between version updates and may not describe the most current version of the project.

Air Canvas - Complete Technical Architecture

Core Libraries

  1. MediaPipe Hands (@mediapipe/hands v0.4.x)

Google's machine learning solution for real-time hand tracking. Uses TensorFlow.js under the hood.

  • What it does: Detects 21 3D hand landmarks per hand in real-time
  • Model complexity: Set to level 1 (balance between accuracy/speed)
  • Confidence thresholds: 0.7 for both detection and tracking
  • Performance: Processes video frames asynchronously, skipping some to keep up with the ~30 FPS capture rate
  • CDN delivery: Loads ML models from jsDelivr CDN
  2. MediaPipe Camera Utils (@mediapipe/camera_utils)

Wrapper for webcam access that integrates with MediaPipe's processing pipeline

  • Frame management: Captures video frames and sends to MediaPipe
  • Automatic synchronization: Handles frame timing and model inference coordination
  3. HTML5 Canvas API (Native)

Raster graphics rendering system

  • 2D context: Used for all drawing operations
  • Immediate mode: Each stroke is drawn directly to canvas
  • Line rendering: Uses round caps/joins for smooth strokes

Project Architecture

Component Hierarchy

    App (Orchestrator)
    ├── VideoFeed (Displays webcam)
    ├── DrawingCanvas (Renders strokes)
    ├── HandTracker (Visual pinch indicator)
    └── ControlPanel (UI controls + dock tab)

Data Flow Pipeline

  1. Video Capture → 2. Hand Detection → 3. Gesture Recognition → 4. Interaction/Drawing

getUserMedia → HTMLVideoElement → MediaPipe Camera → MediaPipe Hands Model → 21 landmarks → Gesture Processing → Pinch Detection + Smoothing → App.tsx coordination → Canvas Drawing OR UI Interaction

Detailed Component Breakdown

VideoFeed.tsx - Video Layer

Purpose: Display the mirrored webcam feed.

    transform: 'scaleX(-1)'   // Mirror for natural hand-eye coordination
    objectFit: 'cover'        // Fill entire viewport
    pointerEvents: 'none'     // Don't block interactions

Callback pattern: Exposes the element via onVideoReady(HTMLVideoElement) to the parent.


useWebcam Hook - Camera Access

Location: src/hooks/useWebcam.ts

Responsibilities:

  1. Request camera permissions via navigator.mediaDevices.getUserMedia()
  2. Configure ideal resolution (1280x720) for performance/quality balance
  3. Auto-retry mechanism with trigger state
  4. Cleanup: Stop media stream tracks on unmount

Configuration:

    video: {
      width: { ideal: 1280 },   // Balances detail with processing speed
      height: { ideal: 720 },
      facingMode: 'user'        // Front-facing camera
    }

Error handling: Catches permission denials, missing hardware, etc.
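As a minimal sketch of the flow described above (the helper names `buildVideoConstraints` and `requestWebcam` are illustrative, not the hook's actual exports):

```typescript
// Hypothetical helpers mirroring the useWebcam configuration above.
// The 'ideal' hints let the browser fall back gracefully on other cameras.
function buildVideoConstraints() {
  return {
    video: {
      width: { ideal: 1280 },  // balances detail with processing speed
      height: { ideal: 720 },
      facingMode: 'user',      // front-facing camera
    },
  };
}

// The real hook would additionally retry on failure and stop all
// stream tracks on unmount.
async function requestWebcam() {
  return navigator.mediaDevices.getUserMedia(buildVideoConstraints());
}
```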


useMediaPipe Hook - ML Pipeline

Location: src/hooks/useMediaPipe.ts

Initialization sequence:

  1. Create Hands instance with CDN-based model loading
  2. Configure detection parameters:

         maxNumHands: 1              // Only track dominant hand (performance)
         modelComplexity: 1          // Medium accuracy (0=lite, 1=full, 2=heavy)
         minDetectionConfidence: 0.7 // Initial detection threshold
         minTrackingConfidence: 0.7  // Frame-to-frame tracking threshold
  3. Register onResults callback (fires per processed frame)
  4. Create Camera wrapper linking video element to MediaPipe
  5. Start processing loop

Frame processing:

    onFrame: async () => {
      await hands.send({ image: videoElement })
    }

MediaPipe processes frames asynchronously. Not every frame gets processed; it automatically drops frames to maintain real-time performance.

Cleanup: Stops camera loop and closes MediaPipe instance to free GPU/CPU resources.
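The configuration step above can be sketched as a pure options builder (`buildHandsOptions` is an illustrative name; the values come from the report):

```typescript
// Sketch of the detection parameters passed to MediaPipe Hands.
function buildHandsOptions() {
  return {
    maxNumHands: 1,              // only track the dominant hand (performance)
    modelComplexity: 1,          // 0 = lite, 1 = full, 2 = heavy
    minDetectionConfidence: 0.7, // initial detection threshold
    minTrackingConfidence: 0.7,  // frame-to-frame tracking threshold
  };
}

// In the real hook this would feed a Hands instance loaded from the CDN,
// roughly (not runnable here without the @mediapipe/hands package):
//   const hands = new Hands({ locateFile: (f) =>
//     `https://cdn.jsdelivr.net/npm/@mediapipe/hands/${f}` });
//   hands.setOptions(buildHandsOptions());
//   hands.onResults(handleResults);
```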


Hand Landmark System

MediaPipe returns 21 landmarks per hand in normalized coordinates [0, 1].

Landmarks used in this project:

  • [4]: Thumb tip
  • [8]: Index finger tip

Each landmark contains:

    {
      x: number, // Normalized [0-1] horizontal position
      y: number, // Normalized [0-1] vertical position
      z: number  // Depth (not used in this project)
    }


useGestureDetection Hook - Pinch Recognition

Location: src/hooks/useGestureDetection.ts

Algorithm Components:

  1. Coordinate Transform (Lines 28-39)

Converts MediaPipe's normalized coordinates to screen pixels:

    // Horizontal flip because video is mirrored
    x: canvasWidth - (landmark.x * canvasWidth)
    // Direct vertical mapping
    y: landmark.y * canvasHeight

  2. Pinch Detection (Lines 41-42)

    distance = √[(thumbX - indexX)² + (thumbY - indexY)²]
    isPinching = distance < PINCH_THRESHOLD (75 pixels)

Why 75px? It balances accidental triggers against intentional pinches at arm's length from the camera.

  3. Pinch Point Calculation (Lines 42-47)

    midpoint = ((thumbX + indexX) / 2, (thumbY + indexY) / 2)
    adjustedPoint.y = midpoint.y - 15 // Offset upward for better visual alignment

The -15px offset accounts for finger thickness: it puts the cursor at the fingertip contact point rather than the geometric center.

  4. Adaptive Exponential Smoothing (Lines 49-63)

Problem: Raw landmark positions jitter due to ML uncertainty and actual hand tremor.

Solution: Exponential Moving Average (EMA) with velocity-dependent smoothing:

velocity = √[(currentX - lastX)² + (currentY - lastY)²]

    if (velocity > 15 pixels/frame) {
      smoothingFactor = 0.2 // Fast movement: prioritize responsiveness
    } else {
      smoothingFactor = 0.6 // Slow movement: prioritize stability
    }

    smoothedX = lastX × smoothingFactor + currentX × (1 - smoothingFactor)
    smoothedY = lastY × smoothingFactor + currentY × (1 - smoothingFactor)

Effect:

  • Fast movements (gestures, UI interactions): Low lag, some jitter acceptable
  • Slow movements (precise drawing): Maximum stability, slight lag acceptable

This is a first-order IIR filter that maintains temporal continuity while adapting to context.
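The whole gesture path above, from normalized landmarks to a smoothed pinch point, reduces to a few pure functions. A sketch (names and structure are illustrative, not the hook's actual code):

```typescript
interface Point { x: number; y: number; }

// Normalized landmark -> screen pixels, with horizontal flip for the mirror.
function toScreen(lm: Point, w: number, h: number): Point {
  return { x: w - lm.x * w, y: lm.y * h };
}

function dist(a: Point, b: Point): number {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

const PINCH_THRESHOLD = 75; // pixels, per the report

function detectPinch(thumb: Point, index: Point) {
  const isPinching = dist(thumb, index) < PINCH_THRESHOLD;
  const midpoint = {
    x: (thumb.x + index.x) / 2,
    y: (thumb.y + index.y) / 2 - 15, // upward offset toward fingertip contact
  };
  return { isPinching, midpoint };
}

// Velocity-adaptive EMA: heavier smoothing when the hand moves slowly.
function smooth(last: Point, current: Point): Point {
  const a = dist(last, current) > 15 ? 0.2 : 0.6;
  return {
    x: last.x * a + current.x * (1 - a),
    y: last.y * a + current.y * (1 - a),
  };
}
```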


useCanvas Hook - Drawing System

Location: src/hooks/useCanvas.ts

Data Structures:

    interface Stroke {
      points: Point[] // Ordered sequence of coordinates
      color: string   // Hex color code
      size: number    // Brush radius in pixels
    }

State Management:

    strokes: Stroke[]                // React state: completed strokes
    currentStrokeRef: Stroke         // Ref: in-progress stroke (not rendered until complete)
    ctxRef: CanvasRenderingContext2D // Ref: canvas rendering context

Drawing Lifecycle:

  1. startStroke(point, color, size) - Line 49

         currentStrokeRef.current = { points: [point], color, size }

     Initializes a new stroke. Doesn't draw yet; waits for movement.

  2. addPointToStroke(point) - Lines 57-80

     This is the performance-critical hot path.

Algorithm:

    lastPoint = stroke.points[lastIndex]
    distance = √[(newX - lastX)² + (newY - lastY)²]

    if (distance > 5 pixels) { // Fast movement detected
      interpolatedPoints = bresenhamLine(lastPoint, point)
      stroke.points.push(...interpolatedPoints)

      // Draw each interpolated segment
      for (each new segment) {
        drawStrokePart(ctx, stroke, segmentIndex)
      }
    } else { // Normal movement
      stroke.points.push(point)
      drawStrokePart(ctx, stroke, lastSegmentIndex)
    }

Why 5 pixel threshold?

  • MediaPipe frame skipping can create gaps in hand tracking
  • Fast hand movements create large position jumps between frames
  • Without interpolation, strokes become dotted lines
  • Bresenham fills gaps with mathematically perfect line segments
  3. endStroke() - Line 82

         setStrokes(prev => [...prev, currentStrokeRef.current])
         currentStrokeRef.current = null

     Commits the stroke to permanent history. Enables undo functionality (not implemented but easy to add).
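The interpolation decision in the hot path can be sketched as a testable pure function (point bookkeeping only, drawing omitted; `interpolate` is a simple linear stand-in for the project's Bresenham-based helper, and the names are illustrative):

```typescript
interface Point { x: number; y: number; }

const GAP_THRESHOLD = 5; // pixels, per the report

// Linear stand-in for bresenhamLine: intermediate integer points a -> b.
function interpolate(a: Point, b: Point): Point[] {
  const steps = Math.max(Math.abs(b.x - a.x), Math.abs(b.y - a.y));
  const pts: Point[] = [];
  for (let i = 1; i <= steps; i++) {
    pts.push({
      x: Math.round(a.x + ((b.x - a.x) * i) / steps),
      y: Math.round(a.y + ((b.y - a.y) * i) / steps),
    });
  }
  return pts;
}

function addPoint(points: Point[], next: Point): Point[] {
  const last = points[points.length - 1];
  const gap = Math.hypot(next.x - last.x, next.y - last.y);
  if (gap > GAP_THRESHOLD) {
    return [...points, ...interpolate(last, next)]; // fill the jump
  }
  return [...points, next]; // normal movement: append directly
}
```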

Bresenham's Line Algorithm - geometry.ts:32-63

Purpose: Generate all integer pixel coordinates along a line from start to end.

Why use it?

  • HTML Canvas lineTo() is anti-aliased and variable-width
  • For fast movements, we need explicit point sequences
  • Classic computer graphics algorithm (1962) - extremely efficient

Algorithm explanation:

    dx = |x1 - x0|         // Horizontal distance
    dy = |y1 - y0|         // Vertical distance
    sx = x0 < x1 ? 1 : -1  // X direction
    sy = y0 < y1 ? 1 : -1  // Y direction
    err = dx - dy          // Decision variable

    while (not at end) {
      plot(x0, y0)

      // Decide whether to step horizontally, vertically, or both
      e2 = 2 × err
      if (e2 > -dy) {  // Step in X
        err -= dy
        x0 += sx
      }
      if (e2 < dx) {   // Step in Y
        err += dx
        y0 += sy
      }
    }

Key insight: Uses only integer arithmetic (no floating-point), making it extremely fast. The error accumulation determines when to step in each direction.

Example: Line from (0,0) to (5,2)
Points generated: (0,0), (1,0), (2,1), (3,1), (4,2), (5,2)
Creates a visually smooth diagonal line.
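A runnable TypeScript version of the algorithm above (a sketch; the project's geometry.ts may differ in details):

```typescript
interface Point { x: number; y: number; }

// Bresenham's line algorithm: every integer pixel from start to end,
// using only integer arithmetic (no floating point).
function bresenhamLine(x0: number, y0: number, x1: number, y1: number): Point[] {
  const points: Point[] = [];
  const dx = Math.abs(x1 - x0);
  const dy = Math.abs(y1 - y0);
  const sx = x0 < x1 ? 1 : -1;
  const sy = y0 < y1 ? 1 : -1;
  let err = dx - dy;

  while (true) {
    points.push({ x: x0, y: y0 });
    if (x0 === x1 && y0 === y1) break;
    const e2 = 2 * err;
    if (e2 > -dy) { err -= dy; x0 += sx; } // step in X
    if (e2 < dx)  { err += dx; y0 += sy; } // step in Y
  }
  return points;
}

// bresenhamLine(0, 0, 5, 2)
// → (0,0), (1,0), (2,1), (3,1), (4,2), (5,2)
```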

Rendering Functions:

drawStrokePart(ctx, stroke, startIndex) - Line 105

Draws a single line segment between two adjacent points:

    ctx.strokeStyle = stroke.color
    ctx.lineWidth = stroke.size
    ctx.lineCap = 'round'   // Smooth endpoints
    ctx.lineJoin = 'round'  // Smooth corners
    ctx.beginPath()
    ctx.moveTo(points[i].x, points[i].y)
    ctx.lineTo(points[i+1].x, points[i+1].y)
    ctx.stroke()

redrawCanvas(ctx, strokes) - Line 119

Full canvas repaint (used on resize, clear):

    ctx.clearRect(0, 0, width, height)
    for (each stroke) {
      ctx.beginPath()
      ctx.moveTo(firstPoint)
      for (each remaining point) {
        ctx.lineTo(point)
      }
      ctx.stroke()
    }

Performance optimization: Uses single beginPath()/stroke() per stroke instead of per-segment. Much faster for redraws.
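The batching described above (one beginPath()/stroke() pair per stroke, not per segment) can be sketched with a minimal recording stand-in for CanvasRenderingContext2D, so the call pattern is testable without a browser. Names here are illustrative:

```typescript
interface Point { x: number; y: number; }
interface Stroke { points: Point[]; color: string; size: number; }

// Just the subset of the 2D context this sketch needs.
interface Ctx2D {
  strokeStyle: string;
  lineWidth: number;
  beginPath(): void;
  moveTo(x: number, y: number): void;
  lineTo(x: number, y: number): void;
  stroke(): void;
}

function redrawCanvas(ctx: Ctx2D, strokes: Stroke[]): void {
  for (const s of strokes) {
    ctx.strokeStyle = s.color;
    ctx.lineWidth = s.size;
    ctx.beginPath();                          // one path per stroke
    ctx.moveTo(s.points[0].x, s.points[0].y);
    for (let i = 1; i < s.points.length; i++) {
      ctx.lineTo(s.points[i].x, s.points[i].y);
    }
    ctx.stroke();                             // single rasterization call
  }
}
```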


App.tsx - Central Orchestrator

State Management (Lines 12-18):

    videoElement: HTMLVideoElement   // Reference to video DOM element
    canvasElement: HTMLCanvasElement // Reference to canvas DOM element
    currentColor: string             // Active brush color
    brushSize: number                // Active brush size (2-50px)
    hoveredControl: string | null    // Currently hovered UI element ID
    showPanel: boolean               // Control panel visibility
    panelScale: number               // Panel zoom level (0.5-2x)

Refs for Interaction State (Lines 31-37):

    wasPinchingRef: boolean          // Previous frame pinch state
    lastInteractionTimeRef: number   // Debounce timestamp
    lastHoverCheckRef: number        // Hover check throttle
    isWidgetInteractionRef: boolean  // Blocks drawing during UI interaction
    resizeStartPosRef: Point         // Resize gesture start position
    resizeStartScaleRef: number      // Scale at resize start
    isResizingRef: boolean           // Active resize operation

Why refs instead of state?

  • Avoid re-renders on every frame (60 Hz gesture updates)
  • State changes trigger re-renders—refs don't
  • Only UI-visible changes use setState

Interaction Effect (Lines 39-124)

Throttling (Lines 66-70):

    if (Date.now() - lastHoverCheckRef < 50ms) return

Limits hover detection to 20 Hz instead of 60 Hz, saving CPU on DOM queries.

Hover Detection Algorithm (Lines 72-117):

    elements = querySelectorAll('[data-control-id]')

    for (each element) {
      rect = element.getBoundingClientRect()

isHovering = (
  pinchX >= rect.left &&
  pinchX <= rect.right &&
  pinchY >= rect.top &&
  pinchY <= rect.bottom
)

if (isHovering) {
  switch (controlId) {
    case 'dock-tab':
      if (!showPanel && !isPinching) {
        setShowPanel(true)  // Open panel on hover (no pinch required)
      }
      break

    case 'resize-handle':
      if (isPinching) {
        startResize()  // Begin resize gesture
      }
      break

    case 'brush-slider':
      if (isPinching) {
        // Map X position to brush size
        relativeX = pinchX - rect.left
        percentage = relativeX / rect.width
        newSize = 2 + percentage × 48
      }
      break

    case 'color-*':
      if (isPinching && debounced) {
        setCurrentColor(color)
      }
      break

    case 'clear-button':
      if (isPinching && debounced) {
        clearCanvas()
      }
      break
  }
}

}

Debouncing (Line 105):

    if (now - lastInteractionTime > 300ms) {
      // Allow interaction
    }

Prevents rapid repeated clicks. 300ms is a human reaction time buffer.

Resize Gesture Handler (Lines 51-58):

    if (isResizing && isPinching) {
      deltaY = currentPinchY - startPinchY
      scaleDelta = deltaY / 200  // 200px vertical movement = 1x scale change
      newScale = clamp(0.5, startScale + scaleDelta, 2.0)
      setPanelScale(newScale)
    }

A vertical drag gesture resizes the panel via a linear mapping.
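The three small pieces of interaction math above (debounce window, brush-size mapping, resize scaling) reduce to pure helpers. A sketch with illustrative names:

```typescript
// Debounce: allow an interaction only if enough time has passed.
function canInteract(now: number, last: number, windowMs = 300): boolean {
  return now - last > windowMs;
}

// Brush slider: map horizontal position within the slider rect to 2-50 px.
function brushSizeFromX(pinchX: number, rectLeft: number, rectWidth: number): number {
  const pct = Math.min(1, Math.max(0, (pinchX - rectLeft) / rectWidth));
  return 2 + pct * 48;
}

// Resize: 200 px of vertical drag = 1x scale change, clamped to [0.5, 2].
function scaleFromDrag(startScale: number, deltaY: number): number {
  return Math.min(2.0, Math.max(0.5, startScale + deltaY / 200));
}
```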

Drawing Effect (Lines 126-159)

Guard Conditions (Line 127):

    if (
      !pinchPoint ||                                       // No hand detected
      (hoveredControl && hoveredControl !== 'dock-tab') || // Over UI (block drawing)
      isResizingRef.current                                // During resize gesture
    ) {
      endCurrentStroke()
      return
    }

State Machine:

    NOT_PINCHING → PINCHING: startStroke()
    PINCHING → PINCHING: addPointToStroke()
    PINCHING → NOT_PINCHING: endStroke()

Implementation:

    if (isPinching) {
      if (!wasPinching && !isWidgetInteraction) {
        startStroke(pinchPoint, color, size)
        wasPinching = true
        setShowPanel(false) // Hide panel when drawing starts
      } else if (wasPinching) {
        addPointToStroke(pinchPoint) // Continue stroke
      }
    } else {
      if (wasPinching) {
        endStroke()
        wasPinching = false
      }
      isWidgetInteraction = false // Reset UI interaction flag
    }

Automatic panel hiding (Line 139): When drawing starts, panel slides out. Prevents accidental UI hits during drawing.
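The pinch state machine above can be captured as a pure transition function for testing (an illustrative sketch, not the component's actual code):

```typescript
type DrawEvent = 'start' | 'continue' | 'end' | 'none';

// Decide what the drawing effect should do this frame, given the previous
// frame's pinch state and whether a UI widget is capturing the gesture.
function transition(
  wasPinching: boolean,
  isPinching: boolean,
  isWidgetInteraction: boolean
): DrawEvent {
  if (isPinching && !wasPinching) {
    return isWidgetInteraction ? 'none' : 'start'; // a new pinch begins a stroke
  }
  if (isPinching && wasPinching) return 'continue'; // keep extending the stroke
  if (!isPinching && wasPinching) return 'end';     // pinch released
  return 'none';                                    // idle
}
```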


ControlPanel.tsx - UI Layer

Layout Structure:

    Control Panel (fixed top-right)
    ├── Color Grid (5×2)
    ├── Brush Size Slider (2-50px)
    ├── Clear Button
    └── Resize Handle (bottom-left corner)

Dock Tab (fixed right-center, outside panel)

Visibility Animation (Lines 55-59):

    opacity: showPanel ? 1 : 0
    transform: showPanel
      ? `translateX(0) scale(${scale})`
      : `translateX(100%) scale(${scale})`
    transformOrigin: 'top right'
    transition: 'opacity 0.3s ease, transform 0.3s ease'

Effect:

  • Hidden: Offscreen right, transparent
  • Visible: Onscreen, solid
  • Scale applies from top-right corner (natural resize feel)
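That style computation is easy to make testable as a pure function (`panelStyle` is an illustrative name):

```typescript
// Compute the panel's animated style from visibility and scale,
// matching the transform described above.
function panelStyle(showPanel: boolean, scale: number) {
  return {
    opacity: showPanel ? 1 : 0,
    transform: showPanel
      ? `translateX(0) scale(${scale})`      // onscreen, solid
      : `translateX(100%) scale(${scale})`,  // offscreen right, transparent
    transformOrigin: 'top right',
    transition: 'opacity 0.3s ease, transform 0.3s ease',
  };
}
```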

Resize Handle Hover Effect (Lines 128-140):

    bottom: isHovered ? '-30px' : '0'
    left: isHovered ? '-30px' : '0'
    width: isHovered ? '60px' : '30px'
    height: isHovered ? '60px' : '30px'
    transition: 'all 0.2s ease'

Expansion direction: Bottom-left diagonal

  • Negative offsets extend handle outside parent bounds
  • Parent doesn't clip (overflow: visible by default)
  • Makes small target easier to hit

Visual indicator (Lines 143-175): Three horizontal bars (resize grip):

    ───── (longest)
    ────  (medium)
    ───   (shortest)

Scales proportionally with hover state.


HandTracker.tsx - Visual Feedback

Simple positioning:

    position: 'absolute'
    left: `${pinchPoint.x}px`
    top: `${pinchPoint.y}px`
    transform: 'translate(-50%, -50%)' // Center circle on point

Color coding:

    backgroundColor: isPinching ? '#00ff00' : '#ffff00' // Green when pinching, yellow otherwise

Purpose: Immediate visual feedback for gesture recognition state.

z-index: 2000: Always on top of everything (including panel, which is 1000).


Performance Optimizations

  1. Throttled Hover Checks (50ms interval)

Reduces DOM queries from 60 Hz → 20 Hz, cutting hover-loop work by about two-thirds.

  2. Ref-based State

Interaction flags use refs to avoid re-renders. Only user-visible changes trigger renders.

  3. Incremental Canvas Drawing

New stroke segments drawn immediately. Avoids full canvas redraw on every point.

  4. Bresenham Interpolation

Only applied when distance > 5px. Avoids overhead for normal movement.

  5. Single Hand Tracking

maxNumHands: 1 cuts MediaPipe processing time nearly in half vs. 2 hands.

  6. Model Complexity: 1

Balanced model. Level 0 is faster but less accurate; Level 2 is overkill for this use case.

  7. Adaptive Smoothing

Fast movements get less smoothing (lower latency). Slow movements get more (better precision).

  8. Minimal Re-renders

Panel scale/visibility changes don't trigger hand tracking re-initialization.


Notable Design Decisions

Why mirror the video?

    transform: 'scaleX(-1)'

Creates "mirror mode", which feels natural for users: when you move right, your reflection moves right. Without this, movements feel inverted (like using a mouse with a reversed X-axis).

Why not flip coordinates instead?

Coordinates ARE flipped in useGestureDetection:

    x: canvasWidth - thumbTip.x * canvasWidth

Video mirroring is purely visual; coordinate flipping handles the actual positioning logic.

Why offset pinch point upward 15px?

    y: midpoint.y - 15

When pinching, the fingertips are vertically stacked (index usually above thumb), so the geometric midpoint sits behind the contact point. The -15px offset brings the cursor to the visual contact location.

Why 75px pinch threshold?

Calibrated for typical webcam viewing distances (18-24 inches). Too small = hard to trigger. Too large = accidental triggers.

Why separate stroke storage?

    currentStrokeRef: Stroke // Active, being drawn
    strokes: Stroke[]        // History, immutable

Enables:

  • Undo/redo (not implemented but trivial to add)
  • Stroke export/serialization
  • Collaborative features (each stroke is discrete unit)
  • Efficient re-rendering (only redraw completed strokes on resize)

Data Flow Summary

Complete cycle from hand to pixel:

  1. Camera → 1280×720 video frame
  2. MediaPipe Camera → Send frame to ML model
  3. MediaPipe Hands → Detect hand, return 21 landmarks (normalized)
  4. useGestureDetection → Extract thumb[4] & index[8], compute distance & midpoint
  5. Adaptive smoothing → Apply velocity-aware EMA filter
  6. App.tsx effect #1 → Check hover state, handle UI interactions OR set isWidgetInteraction flag
  7. App.tsx effect #2 → If not over UI and pinching: call useCanvas methods
  8. useCanvas → Add point to stroke, apply Bresenham if needed, draw to canvas
  9. HandTracker → Update visual indicator position & color
  10. Next frame → Repeat

Frequency:

  • Webcam capture: 30 FPS
  • MediaPipe processing: ~15-25 FPS (adaptive frame skipping)
  • Canvas drawing: Event-driven (only when new points arrive)
  • Hover checks: 20 Hz (throttled)
  • React renders: As needed (state-change triggered)

Technology Stack Summary

| Layer | Technology | Purpose |
|---|---|---|
| ML/CV | MediaPipe Hands | Hand landmark detection |
| Camera | getUserMedia API | Webcam access |
| Rendering | HTML5 Canvas 2D | Raster drawing |
| UI Framework | React 18 | Component structure |
| Language | TypeScript | Type safety |
| State | React Hooks | State management |
| Build | Vite | Fast HMR dev server |
| Algorithms | Bresenham, EMA | Interpolation, smoothing |

Key Algorithms Summary

  1. Euclidean Distance (Pinch detection)

d = √[(x₂-x₁)² + (y₂-y₁)²]

  2. Exponential Moving Average (Smoothing)

    smoothed = α × previous + (1 - α) × current

where α is adaptive (0.2 or 0.6 based on velocity).

  3. Bresenham's Line (Interpolation)

Integer-only rasterization algorithm for perfect straight lines.

  4. Bounding Box Collision (Hover detection)

    collision = (
      px ≥ box.left && px ≤ box.right &&
      py ≥ box.top && py ≤ box.bottom
    )
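As a runnable sketch of that hover test (illustrative names; edges are treated as inclusive, matching the comparisons above):

```typescript
interface Box { left: number; right: number; top: number; bottom: number; }

// Axis-aligned bounding-box test used for hover detection:
// true when the pinch point lies inside the element's rect.
function hits(px: number, py: number, box: Box): boolean {
  return px >= box.left && px <= box.right && py >= box.top && py <= box.bottom;
}
```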


This architecture demonstrates production-ready real-time computer vision integration with smooth, responsive UX despite ML latency and inherent hand tracking jitter.
