This Technical Architecture Report is subject to change with version updates; it may not reflect the most current version of the project.
Air Canvas - Complete Technical Architecture
Core Libraries
- MediaPipe Hands (@mediapipe/hands v0.4.x)
Google's machine learning solution for real-time hand tracking, running its models directly in the browser.
- What it does: Detects 21 3D hand landmarks per hand in real-time
- Model complexity: Set to level 1 (balance between accuracy/speed)
- Confidence thresholds: 0.7 for both detection and tracking
- Performance: Processes video frames, skipping some to maintain ~30 FPS
- CDN delivery: Loads ML models from jsDelivr CDN
- MediaPipe Camera Utils (@mediapipe/camera_utils)
Wrapper for webcam access that integrates with MediaPipe's processing pipeline
- Frame management: Captures video frames and sends to MediaPipe
- Automatic synchronization: Handles frame timing and model inference coordination
- HTML5 Canvas API (Native)
Raster graphics rendering system
- 2D context: Used for all drawing operations
- Immediate mode: Each stroke is drawn directly to canvas
- Line rendering: Uses round caps/joins for smooth strokes
Project Architecture
Component Hierarchy
```
App (Orchestrator)
├── VideoFeed (Displays webcam)
├── DrawingCanvas (Renders strokes)
├── HandTracker (Visual pinch indicator)
└── ControlPanel (UI controls + dock tab)
```
Data Flow Pipeline
The pipeline has four stages: 1. Video Capture → 2. Hand Detection → 3. Gesture Recognition → 4. Interaction/Drawing

```
getUserMedia → HTMLVideoElement → MediaPipe Camera → MediaPipe Hands Model
  → 21 landmarks → Gesture Processing → Pinch Detection + Smoothing
  → App.tsx coordination → Canvas Drawing OR UI Interaction
```
Detailed Component Breakdown
VideoFeed.tsx - Video Layer
Purpose: Display the mirrored webcam feed.
```ts
transform: 'scaleX(-1)'  // Mirror for natural hand-eye coordination
objectFit: 'cover'       // Fill entire viewport
pointerEvents: 'none'    // Don't block interactions
```
Callback pattern: Exposes a ref via onVideoReady(HTMLVideoElement) to the parent.
useWebcam Hook - Camera Access
Location: src/hooks/useWebcam.ts
Responsibilities:
- Request camera permissions via navigator.mediaDevices.getUserMedia()
- Configure ideal resolution (1280x720) for performance/quality balance
- Auto-retry mechanism with trigger state
- Cleanup: Stop media stream tracks on unmount
Configuration:
```ts
video: {
  width: { ideal: 1280 }, // Balances detail with processing speed
  height: { ideal: 720 },
  facingMode: 'user'      // Front-facing camera
}
```
Error handling: Catches permission denials, missing hardware, etc.
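A condensed sketch of a hook with these responsibilities—resolution hints, error capture, and track cleanup. The hook shape and state names are illustrative; the actual src/hooks/useWebcam.ts may structure its retry logic differently:
```ts
import { useEffect, useRef, useState } from 'react'

function useWebcam(videoElement: HTMLVideoElement | null) {
  const [error, setError] = useState<string | null>(null)
  const streamRef = useRef<MediaStream | null>(null)

  useEffect(() => {
    if (!videoElement) return
    let cancelled = false

    navigator.mediaDevices
      .getUserMedia({
        video: {
          width: { ideal: 1280 }, // balances detail with processing speed
          height: { ideal: 720 },
          facingMode: 'user',     // front-facing camera
        },
      })
      .then((stream) => {
        if (cancelled) {
          // Effect already cleaned up; release the camera immediately
          stream.getTracks().forEach((track) => track.stop())
          return
        }
        streamRef.current = stream
        videoElement.srcObject = stream // element should have autoplay/playsInline
      })
      .catch((e) => setError(e instanceof Error ? e.message : String(e)))

    return () => {
      // Stop all tracks on unmount so the camera indicator turns off
      cancelled = true
      streamRef.current?.getTracks().forEach((track) => track.stop())
    }
  }, [videoElement])

  return { error }
}
```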
useMediaPipe Hook - ML Pipeline
Location: src/hooks/useMediaPipe.ts
Initialization sequence:
- Create Hands instance with CDN-based model loading
- Configure detection parameters:
```ts
maxNumHands: 1              // Only track dominant hand (performance)
modelComplexity: 1          // Medium accuracy (0=lite, 1=full, 2=heavy)
minDetectionConfidence: 0.7 // Initial detection threshold
minTrackingConfidence: 0.7  // Frame-to-frame tracking threshold
```
- Register onResults callback (fires per processed frame)
- Create Camera wrapper linking video element to MediaPipe
- Start processing loop
Frame processing:
```ts
onFrame: async () => {
  await hands.send({ image: videoElement })
}
```
MediaPipe processes frames asynchronously; not every frame gets processed—frames are dropped automatically to maintain real-time performance.
Cleanup: Stops camera loop and closes MediaPipe instance to free GPU/CPU resources.
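Putting the sequence together, a minimal sketch using the @mediapipe/hands and @mediapipe/camera_utils APIs with the options listed above (error handling and React wiring omitted):
```ts
import { Hands, Results } from '@mediapipe/hands'
import { Camera } from '@mediapipe/camera_utils'

function createHandTracking(
  videoElement: HTMLVideoElement,
  onResults: (results: Results) => void
) {
  // 1. Load the model files from the jsDelivr CDN
  const hands = new Hands({
    locateFile: (file) =>
      `https://cdn.jsdelivr.net/npm/@mediapipe/hands/${file}`,
  })

  // 2. Detection parameters described above
  hands.setOptions({
    maxNumHands: 1,
    modelComplexity: 1,
    minDetectionConfidence: 0.7,
    minTrackingConfidence: 0.7,
  })

  // 3. Fires once per processed frame
  hands.onResults(onResults)

  // 4-5. Link the video element to MediaPipe and start the processing loop
  const camera = new Camera(videoElement, {
    onFrame: async () => {
      await hands.send({ image: videoElement })
    },
  })
  camera.start()

  // Cleanup: stop the loop and free GPU/CPU resources
  return () => {
    camera.stop()
    hands.close()
  }
}
```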
Hand Landmark System
MediaPipe returns 21 landmarks per hand in normalized coordinates [0, 1]. Landmarks used in this project:
- [4]: Thumb tip
- [8]: Index finger tip
Each landmark contains:
```ts
{
  x: number, // Normalized [0-1] horizontal position
  y: number, // Normalized [0-1] vertical position
  z: number  // Depth (not used in this project)
}
```
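In TypeScript terms—the tip indices follow MediaPipe's standard hand-landmark numbering:
```ts
interface Landmark {
  x: number // normalized [0-1] horizontal position
  y: number // normalized [0-1] vertical position
  z: number // depth (unused here)
}

// MediaPipe hand landmark indices used by this project
const THUMB_TIP = 4
const INDEX_FINGER_TIP = 8

function getPinchLandmarks(landmarks: Landmark[]) {
  return { thumb: landmarks[THUMB_TIP], index: landmarks[INDEX_FINGER_TIP] }
}
```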
useGestureDetection Hook - Pinch Recognition
Location: src/hooks/useGestureDetection.ts
Algorithm Components:
- Coordinate Transform (Lines 28-39)
Converts MediaPipe's normalized coordinates to screen pixels:
```ts
// Horizontal flip because the video is mirrored
x: canvasWidth - landmark.x * canvasWidth
// Direct vertical mapping
y: landmark.y * canvasHeight
```
- Pinch Detection (Lines 41-42)
```
distance = √[(thumbX - indexX)² + (thumbY - indexY)²]
isPinching = distance < PINCH_THRESHOLD (75 pixels)
```
Why 75px? It balances accidental triggers against intentional pinches at arm's length from the camera. (A combined TypeScript sketch of this pipeline follows the list.)
- Pinch Point Calculation (Lines 42-47)
```
midpoint = ((thumbX + indexX) / 2, (thumbY + indexY) / 2)
adjustedPoint.y = midpoint.y - 15 // Offset upward for better visual alignment
```
The -15px offset accounts for finger thickness—it places the cursor at the fingertip contact point rather than the geometric center.
- Adaptive Exponential Smoothing (Lines 49-63)
Problem: Raw landmark positions jitter due to ML uncertainty and actual hand tremor.
Solution: Exponential Moving Average (EMA) with velocity-dependent smoothing:
```
velocity = √[(currentX - lastX)² + (currentY - lastY)²]

if (velocity > 15 pixels/frame) {
  smoothingFactor = 0.2 // Fast movement: prioritize responsiveness
} else {
  smoothingFactor = 0.6 // Slow movement: prioritize stability
}

smoothedX = lastX × smoothingFactor + currentX × (1 - smoothingFactor)
smoothedY = lastY × smoothingFactor + currentY × (1 - smoothingFactor)
```
Effect:
- Fast movements (gestures, UI interactions): Low lag, some jitter acceptable
- Slow movements (precise drawing): Maximum stability, slight lag acceptable
This is a first-order IIR filter that maintains temporal continuity while adapting to context.
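A combined sketch of the four components above—coordinate flip, pinch test, offset midpoint, and velocity-adaptive EMA. The standalone function shape and constant names are illustrative; the project implements this inside the useGestureDetection hook:
```ts
interface Point { x: number; y: number }

const PINCH_THRESHOLD = 75    // px; pinch when thumb and index are closer than this
const VELOCITY_THRESHOLD = 15 // px/frame; switch between smoothing modes
const TIP_OFFSET_Y = 15       // px; shift cursor toward fingertip contact point

let last: Point | null = null // previous smoothed point (per-hook state)

function detectPinch(
  thumb: Point, index: Point,           // normalized [0,1] landmarks 4 and 8
  canvasWidth: number, canvasHeight: number
): { point: Point; isPinching: boolean } {
  // Normalized → screen pixels, flipping X because the video is mirrored
  const toScreen = (lm: Point): Point => ({
    x: canvasWidth - lm.x * canvasWidth,
    y: lm.y * canvasHeight,
  })
  const t = toScreen(thumb)
  const i = toScreen(index)

  const isPinching = Math.hypot(t.x - i.x, t.y - i.y) < PINCH_THRESHOLD

  // Midpoint between fingertips, nudged upward toward the contact point
  const raw: Point = { x: (t.x + i.x) / 2, y: (t.y + i.y) / 2 - TIP_OFFSET_Y }

  // Velocity-adaptive EMA: heavier smoothing when the hand moves slowly
  if (last) {
    const velocity = Math.hypot(raw.x - last.x, raw.y - last.y)
    const alpha = velocity > VELOCITY_THRESHOLD ? 0.2 : 0.6
    raw.x = last.x * alpha + raw.x * (1 - alpha)
    raw.y = last.y * alpha + raw.y * (1 - alpha)
  }
  last = raw
  return { point: raw, isPinching }
}
```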
useCanvas Hook - Drawing System
Location: src/hooks/useCanvas.ts
Data Structures:
```ts
interface Stroke {
  points: Point[] // Ordered sequence of coordinates
  color: string   // Hex color code
  size: number    // Brush radius in pixels
}
```
State Management:
```ts
strokes: Stroke[]                // React state: completed strokes
currentStrokeRef: Stroke         // Ref: in-progress stroke (not rendered until complete)
ctxRef: CanvasRenderingContext2D // Ref: canvas rendering context
```
Drawing Lifecycle:
- startStroke(point, color, size) - Line 49
```ts
currentStrokeRef.current = { points: [point], color, size }
```
Initializes a new stroke. Doesn't draw yet—waits for movement.
- addPointToStroke(point) - Lines 57-80
This is the performance-critical hot path (a runnable sketch follows this lifecycle list).
Algorithm:
```
lastPoint = stroke.points[lastIndex]
distance = √[(newX - lastX)² + (newY - lastY)²]

if (distance > 5 pixels) { // Fast movement detected
  interpolatedPoints = bresenhamLine(lastPoint, point)
  stroke.points.push(...interpolatedPoints)
  // Draw each interpolated segment
  for (each new segment) {
    drawStrokePart(ctx, stroke, segmentIndex)
  }
} else { // Normal movement
  stroke.points.push(point)
  drawStrokePart(ctx, stroke, lastSegmentIndex)
}
```
Why 5 pixel threshold?
- MediaPipe frame skipping can create gaps in hand tracking
- Fast hand movements create large position jumps between frames
- Without interpolation, strokes become dotted lines
- Bresenham fills gaps with mathematically perfect line segments
- endStroke() - Line 82
```ts
setStrokes(prev => [...prev, currentStrokeRef.current])
currentStrokeRef.current = null
```
Commits the stroke to permanent history. Enables undo functionality (not implemented but easy to add).
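A runnable sketch of the lifecycle's hot path, assuming the Point/Stroke shapes above and the bresenhamLine/drawStrokePart helpers described in the next sections (declared here rather than implemented):
```ts
interface Point { x: number; y: number }
interface Stroke { points: Point[]; color: string; size: number }

// Helpers covered in the following sections
declare function bresenhamLine(start: Point, end: Point): Point[]
declare function drawStrokePart(
  ctx: CanvasRenderingContext2D, stroke: Stroke, startIndex: number
): void

const GAP_THRESHOLD = 5 // px; beyond this, interpolate to avoid dotted strokes

function addPointToStroke(
  ctx: CanvasRenderingContext2D, stroke: Stroke, point: Point
): void {
  const lastPoint = stroke.points[stroke.points.length - 1]
  const distance = Math.hypot(point.x - lastPoint.x, point.y - lastPoint.y)

  if (distance > GAP_THRESHOLD) {
    // Fast movement: fill the gap with Bresenham-interpolated points
    const interpolated = bresenhamLine(lastPoint, point).slice(1) // skip duplicate start
    const firstNewIndex = stroke.points.length - 1
    stroke.points.push(...interpolated)
    for (let i = firstNewIndex; i < stroke.points.length - 1; i++) {
      drawStrokePart(ctx, stroke, i) // draw each new segment incrementally
    }
  } else {
    // Normal movement: append and draw the single new segment
    stroke.points.push(point)
    drawStrokePart(ctx, stroke, stroke.points.length - 2)
  }
}
```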
Bresenham's Line Algorithm - geometry.ts:32-63
Purpose: Generate all integer pixel coordinates along a line from start to end.
Why use it?
- HTML Canvas lineTo() is anti-aliased and variable-width
- For fast movements, we need explicit point sequences
- Classic computer graphics algorithm (1962) - extremely efficient
Algorithm explanation:
```
dx = |x1 - x0|        // Horizontal distance
dy = |y1 - y0|        // Vertical distance
sx = x0 < x1 ? 1 : -1 // X direction
sy = y0 < y1 ? 1 : -1 // Y direction
err = dx - dy         // Decision variable

while (not at end) {
  plot(x0, y0)
  // Decide whether to step horizontally, vertically, or both
  e2 = 2 × err
  if (e2 > -dy) { // Step in X
    err -= dy
    x0 += sx
  }
  if (e2 < dx) {  // Step in Y
    err += dx
    y0 += sy
  }
}
```
Key insight: Uses only integer arithmetic (no floating-point), making it extremely fast. The error accumulation determines when to step in each direction.
Example: a line from (0,0) to (5,2) generates the points (0,0), (1,0), (2,1), (3,1), (4,2), (5,2)—a visually smooth diagonal.
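A self-contained TypeScript rendering of the algorithm (the project keeps its equivalent in geometry.ts; this version is a sketch, not the verbatim source):
```ts
interface Point { x: number; y: number }

// Generate every integer pixel coordinate from start to end (inclusive),
// using only integer arithmetic.
function bresenhamLine(start: Point, end: Point): Point[] {
  let x0 = Math.round(start.x)
  let y0 = Math.round(start.y)
  const x1 = Math.round(end.x)
  const y1 = Math.round(end.y)

  const dx = Math.abs(x1 - x0)
  const dy = Math.abs(y1 - y0)
  const sx = x0 < x1 ? 1 : -1
  const sy = y0 < y1 ? 1 : -1
  let err = dx - dy

  const points: Point[] = []
  while (true) {
    points.push({ x: x0, y: y0 })
    if (x0 === x1 && y0 === y1) break
    const e2 = 2 * err
    if (e2 > -dy) { err -= dy; x0 += sx } // step in X
    if (e2 < dx)  { err += dx; y0 += sy } // step in Y
  }
  return points
}

// bresenhamLine({ x: 0, y: 0 }, { x: 5, y: 2 })
// → (0,0), (1,0), (2,1), (3,1), (4,2), (5,2)
```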
Rendering Functions:
drawStrokePart(ctx, stroke, startIndex) - Line 105
Draws a single line segment between two adjacent points:
```ts
ctx.strokeStyle = stroke.color
ctx.lineWidth = stroke.size
ctx.lineCap = 'round'  // Smooth endpoints
ctx.lineJoin = 'round' // Smooth corners
ctx.beginPath()
ctx.moveTo(points[i].x, points[i].y)
ctx.lineTo(points[i + 1].x, points[i + 1].y)
ctx.stroke()
```
redrawCanvas(ctx, strokes) - Line 119
Full canvas repaint (used on resize and clear):
```
ctx.clearRect(0, 0, width, height)
for (each stroke) {
  ctx.beginPath()
  ctx.moveTo(firstPoint)
  for (each remaining point) {
    ctx.lineTo(point)
  }
  ctx.stroke()
}
```
Performance optimization: Uses single beginPath()/stroke() per stroke instead of per-segment. Much faster for redraws.
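A sketch of such a repaint routine with the one-path-per-stroke optimization (assumes the Stroke shape defined earlier):
```ts
interface Point { x: number; y: number }
interface Stroke { points: Point[]; color: string; size: number }

// Repaint every completed stroke with one beginPath()/stroke() pair each,
// rather than one per segment—much cheaper for full redraws.
function redrawCanvas(ctx: CanvasRenderingContext2D, strokes: Stroke[]): void {
  ctx.clearRect(0, 0, ctx.canvas.width, ctx.canvas.height)
  for (const stroke of strokes) {
    if (stroke.points.length < 2) continue
    ctx.strokeStyle = stroke.color
    ctx.lineWidth = stroke.size
    ctx.lineCap = 'round'
    ctx.lineJoin = 'round'
    ctx.beginPath()
    ctx.moveTo(stroke.points[0].x, stroke.points[0].y)
    for (let i = 1; i < stroke.points.length; i++) {
      ctx.lineTo(stroke.points[i].x, stroke.points[i].y)
    }
    ctx.stroke()
  }
}
```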
App.tsx - Central Orchestrator
State Management (Lines 12-18):
```ts
videoElement: HTMLVideoElement   // Reference to video DOM element
canvasElement: HTMLCanvasElement // Reference to canvas DOM element
currentColor: string             // Active brush color
brushSize: number                // Active brush size (2-50px)
hoveredControl: string | null    // Currently hovered UI element ID
showPanel: boolean               // Control panel visibility
panelScale: number               // Panel zoom level (0.5-2x)
```
Refs for Interaction State (Lines 31-37):
```ts
wasPinchingRef: boolean         // Previous frame pinch state
lastInteractionTimeRef: number  // Debounce timestamp
lastHoverCheckRef: number       // Hover check throttle
isWidgetInteractionRef: boolean // Blocks drawing during UI interaction
resizeStartPosRef: Point        // Resize gesture start position
resizeStartScaleRef: number     // Scale at resize start
isResizingRef: boolean          // Active resize operation
```
Why refs instead of state?
- Avoid re-renders on every frame (60 Hz gesture updates)
- State changes trigger re-renders—refs don't
- Only UI-visible changes use setState
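A minimal illustration of the pattern—hook and variable names are invented for the example, but the split mirrors the one described above:
```ts
import { useRef, useState } from 'react'

function useInteractionState() {
  // Per-frame flags: mutate freely at gesture rate without re-rendering
  const wasPinchingRef = useRef(false)

  // UI-visible values: changing these should repaint the component tree
  const [showPanel, setShowPanel] = useState(true)

  function onFrame(isPinching: boolean) {
    if (isPinching && !wasPinchingRef.current) {
      setShowPanel(false)               // visible change → state
    }
    wasPinchingRef.current = isPinching // invisible bookkeeping → ref
  }

  return { showPanel, onFrame }
}
```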
Interaction Effect (Lines 39-124)
Throttling (Lines 66-70):
```
if (Date.now() - lastHoverCheckRef < 50ms) return
```
Limits hover detection to 20 Hz instead of 60 Hz—CPU savings on DOM queries.
Hover Detection Algorithm (Lines 72-117):
```
elements = querySelectorAll('[data-control-id]')

for (each element) {
  rect = element.getBoundingClientRect()
  isHovering = (
    pinchX >= rect.left && pinchX <= rect.right &&
    pinchY >= rect.top && pinchY <= rect.bottom
  )
  if (isHovering) {
    switch (controlId) {
      case 'dock-tab':
        if (!showPanel && !isPinching) {
          setShowPanel(true) // Open panel on hover (no pinch required)
        }
        break
      case 'resize-handle':
        if (isPinching) {
          startResize() // Begin resize gesture
        }
        break
      case 'brush-slider':
        if (isPinching) {
          // Map X position to brush size
          relativeX = pinchX - rect.left
          percentage = relativeX / rect.width
          newSize = 2 + percentage × 48
        }
        break
      case 'color-*':
        if (isPinching && debounced) {
          setCurrentColor(color)
        }
        break
      case 'clear-button':
        if (isPinching && debounced) {
          clearCanvas()
        }
        break
    }
  }
}
```
Debouncing (Line 105):
```
if (now - lastInteractionTime > 300ms) {
  // Allow interaction
}
```
Prevents rapid repeated clicks; 300 ms serves as a human-reaction-time buffer.
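The same guard expressed as a small helper (illustrative; the project inlines this check in App.tsx):
```ts
import { useRef } from 'react'

const DEBOUNCE_MS = 300 // roughly a human reaction time

// Returns a function that fires at most once per DEBOUNCE_MS—
// used to gate pinch "clicks" on colors and the clear button
function useDebouncedTrigger() {
  const lastFireRef = useRef(0)
  return function tryFire(): boolean {
    const now = Date.now()
    if (now - lastFireRef.current < DEBOUNCE_MS) return false
    lastFireRef.current = now
    return true
  }
}

// Usage: if (isPinching && tryFire()) setCurrentColor(color)
```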
Resize Gesture Handler (Lines 51-58):
```
if (isResizing && isPinching) {
  deltaY = currentPinchY - startPinchY
  scaleDelta = deltaY / 200 // 200px vertical movement = 1x scale change
  newScale = clamp(0.5, startScale + scaleDelta, 2.0)
  setPanelScale(newScale)
}
```
A vertical drag gesture resizes the panel via a linear mapping.
Drawing Effect (Lines 126-159)
Guard Conditions (Line 127):
```
if (
  !pinchPoint ||                                       // No hand detected
  (hoveredControl && hoveredControl !== 'dock-tab') || // Over UI (block drawing)
  isResizingRef.current                                // During resize gesture
) {
  endCurrentStroke()
  return
}
```
State Machine:
```
NOT_PINCHING → PINCHING:     startStroke()
PINCHING → PINCHING:         addPointToStroke()
PINCHING → NOT_PINCHING:     endStroke()
```
Implementation:
```
if (isPinching) {
  if (!wasPinching && !isWidgetInteraction) {
    startStroke(pinchPoint, color, size)
    wasPinching = true
    setShowPanel(false) // Hide panel when drawing starts
  } else if (wasPinching) {
    addPointToStroke(pinchPoint) // Continue stroke
  }
} else {
  if (wasPinching) {
    endStroke()
    wasPinching = false
  }
  isWidgetInteraction = false // Reset UI interaction flag
}
```
Automatic panel hiding (Line 139): When drawing starts, the panel slides out, preventing accidental UI hits mid-stroke.
ControlPanel.tsx - UI Layer
Layout Structure:
```
Control Panel (fixed top-right)
├── Color Grid (5×2)
├── Brush Size Slider (2-50px)
├── Clear Button
└── Resize Handle (bottom-left corner)

Dock Tab (fixed right-center, outside panel)
```
Visibility Animation (Lines 55-59):
```ts
opacity: showPanel ? 1 : 0,
transform: showPanel
  ? `translateX(0) scale(${scale})`
  : `translateX(100%) scale(${scale})`,
transformOrigin: 'top right',
transition: 'opacity 0.3s ease, transform 0.3s ease'
```
Effect:
- Hidden: Offscreen right, transparent
- Visible: Onscreen, solid
- Scale applies from top-right corner (natural resize feel)
Resize Handle Hover Effect (Lines 128-140):
```ts
bottom: isHovered ? '-30px' : '0',
left: isHovered ? '-30px' : '0',
width: isHovered ? '60px' : '30px',
height: isHovered ? '60px' : '30px',
transition: 'all 0.2s ease'
```
Expansion direction: Bottom-left diagonal
- Negative offsets extend handle outside parent bounds
- Parent doesn't clip (overflow: visible by default)
- Makes small target easier to hit
Visual indicator (Lines 143-175): Three horizontal bars form a resize grip:
```
───── (longest)
────  (medium)
───   (shortest)
```
The grip scales proportionally with the hover state.
HandTracker.tsx - Visual Feedback
Simple positioning:
```ts
position: 'absolute',
left: `${pinchPoint.x}px`,
top: `${pinchPoint.y}px`,
transform: 'translate(-50%, -50%)' // Center circle on point
```
Color coding:
```ts
backgroundColor: isPinching ? '#00ff00' : '#ffff00' // Green when pinching, yellow otherwise
```
Purpose: Immediate visual feedback for gesture recognition state.
z-index: 2000: Always on top of everything, including the panel (which sits at z-index 1000).
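A sketch of such an indicator component, with props and styling inferred from the description above (the circle diameter is an assumption):
```tsx
import React from 'react'

interface Point { x: number; y: number }

interface HandTrackerProps {
  pinchPoint: Point | null
  isPinching: boolean
}

// Small circle that tracks the pinch point and reports gesture state by color
export function HandTracker({ pinchPoint, isPinching }: HandTrackerProps) {
  if (!pinchPoint) return null
  return (
    <div
      style={{
        position: 'absolute',
        left: `${pinchPoint.x}px`,
        top: `${pinchPoint.y}px`,
        transform: 'translate(-50%, -50%)', // center the circle on the point
        width: 20,
        height: 20,
        borderRadius: '50%',
        backgroundColor: isPinching ? '#00ff00' : '#ffff00',
        pointerEvents: 'none', // never intercept interactions
        zIndex: 2000,          // above the panel (z-index 1000)
      }}
    />
  )
}
```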
Performance Optimizations
- Throttled Hover Checks (50ms interval)
Reduces DOM queries from 60 Hz to 20 Hz, cutting hover-loop work by roughly two-thirds.
- Ref-based State
Interaction flags use refs to avoid re-renders. Only user-visible changes trigger renders.
- Incremental Canvas Drawing
New stroke segments drawn immediately. Avoids full canvas redraw on every point.
- Bresenham Interpolation
Only applied when distance > 5px. Avoids overhead for normal movement.
- Single Hand Tracking
maxNumHands: 1 cuts MediaPipe processing time nearly in half vs. 2 hands.
- Model Complexity: 1
Balanced model. Level 0 is faster but less accurate; Level 2 is overkill for this use case.
- Adaptive Smoothing
Fast movements get less smoothing (lower latency). Slow movements get more (better precision).
- Minimal Re-renders
Panel scale/visibility changes don't trigger hand tracking re-initialization.
Notable Design Decisions
Why mirror the video?
transform: 'scaleX(-1)' creates "mirror mode," which feels natural: when you move right, your reflection moves right. Without it, movements feel inverted—like using a mouse with a reversed X-axis.
Why not flip coordinates instead?
Coordinates ARE flipped in useGestureDetection:
```ts
x: canvasWidth - thumbTip.x * canvasWidth
```
Video mirroring is purely visual; the coordinate flip handles the actual positioning logic.
Why offset pinch point upward 15px?
```ts
y: midpoint.y - 15
```
When pinching, the fingertips are vertically stacked (index usually above thumb), so the geometric midpoint sits behind the contact point; the -15px offset brings the cursor to the visual contact location.
Why 75px pinch threshold?
Calibrated for typical webcam viewing distances (18-24 inches). Too small = hard to trigger. Too large = accidental triggers.
Why separate stroke storage?
```ts
currentStrokeRef: Stroke // Active, being drawn
strokes: Stroke[]        // History, immutable
```
This separation enables:
- Undo/redo (not implemented but trivial to add)
- Stroke export/serialization
- Collaborative features (each stroke is discrete unit)
- Efficient re-rendering (only redraw completed strokes on resize)
Data Flow Summary
Complete cycle from hand to pixel:
- Camera → 1280×720 video frame
- MediaPipe Camera → Send frame to ML model
- MediaPipe Hands → Detect hand, return 21 landmarks (normalized)
- useGestureDetection → Extract thumb[4] & index[8], compute distance & midpoint
- Adaptive smoothing → Apply velocity-aware EMA filter
- App.tsx effect #1 → Check hover state, handle UI interactions OR set isWidgetInteraction flag
- App.tsx effect #2 → If not over UI and pinching: call useCanvas methods
- useCanvas → Add point to stroke, apply Bresenham if needed, draw to canvas
- HandTracker → Update visual indicator position & color
- Next frame → Repeat
Frequency:
- Webcam capture: 30 FPS
- MediaPipe processing: ~15-25 FPS (adaptive frame skipping)
- Canvas drawing: Event-driven (only when new points arrive)
- Hover checks: 20 Hz (throttled)
- React renders: As needed (state-change triggered)
Technology Stack Summary
| Layer | Technology | Purpose |
|---|---|---|
| ML/CV | MediaPipe Hands | Hand landmark detection |
| Camera | getUserMedia API | Webcam access |
| Rendering | HTML5 Canvas 2D | Raster drawing |
| UI Framework | React 18 | Component structure |
| Language | TypeScript | Type safety |
| State | React Hooks | State management |
| Build | Vite | Fast HMR dev server |
| Algorithms | Bresenham, EMA | Interpolation, smoothing |
Key Algorithms Summary
- Euclidean Distance (Pinch detection)
d = √[(x₂-x₁)² + (y₂-y₁)²]
- Exponential Moving Average (Smoothing)
smoothed = α × previous + (1 − α) × current, where α is adaptive (0.2 or 0.6 based on velocity)
- Bresenham's Line (Interpolation)
Integer-only rasterization algorithm for perfect straight lines.
- Bounding Box Collision (Hover detection)
collision = ( px ≥ box.left && px ≤ box.right && py ≥ box.top && py ≤ box.bottom )
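As a standalone TypeScript helper (names are illustrative; the project performs this check inline during the hover loop):
```ts
// Axis-aligned bounding-box hit test against a DOM element's screen rect
function isPointOverElement(px: number, py: number, el: Element): boolean {
  const rect = el.getBoundingClientRect()
  return px >= rect.left && px <= rect.right &&
         py >= rect.top && py <= rect.bottom
}

// Usage: find which control (if any) the pinch point is over
function hitTestControls(px: number, py: number): string | null {
  for (const el of document.querySelectorAll('[data-control-id]')) {
    if (isPointOverElement(px, py, el)) {
      return el.getAttribute('data-control-id')
    }
  }
  return null
}
```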
This architecture demonstrates production-ready real-time computer vision integration with smooth, responsive UX despite ML latency and inherent hand tracking jitter.