diff --git a/docs.json b/docs.json
index 03d7f83..76d9ade 100644
--- a/docs.json
+++ b/docs.json
@@ -253,7 +253,8 @@
"icon": "globe",
"pages": [
"examples/web/vl-webgpu-demo",
- "examples/web/audio-webgpu-demo"
+ "examples/web/audio-webgpu-demo",
+ "examples/web/hand-voice-racer"
]
},
{
diff --git a/examples/index.mdx b/examples/index.mdx
index dae2e4c..00f07cd 100644
--- a/examples/index.mdx
+++ b/examples/index.mdx
@@ -2,58 +2,92 @@
title: "Examples Library"
---
+## Laptop
+
-
- Turn invoices into structured JSON using a lightweight Vision Language Model. 100% local, no API costs.
-
-
- Build a real-time audio transcription CLI using LFM2-Audio-1.5B with llama.cpp. 100% local processing without internet connection.
-
+
+ Turn invoices into structured JSON using a lightweight Vision Language Model. 100% local, no API costs.
+
-
- Fine-tune LFM2-VL to identify car makers from images. Learn structured generation with Outlines and parameter-efficient fine-tuning with LoRA.
-
+
+ Build a real-time audio transcription CLI using LFM2-Audio-1.5B with llama.cpp. 100% local processing without internet connection.
+
-
- Efficient bidirectional translation system powered by LFM2 1.2B fine-tuned for Korean-English translation with automatic language detection.
-
+
+ Efficient bidirectional translation system powered by LFM2 1.2B fine-tuned for Korean-English translation with automatic language detection.
+
-
- Python CLI leveraging LFM2.5-1.2B-Thinking for multi-step reasoning and tool calling to find and book flights.
-
+
+ Python CLI leveraging LFM2.5-1.2B-Thinking for multi-step reasoning and tool calling to find and book flights.
+
-
- Voice-controlled car cockpit interface combining LFM2.5-Audio-1.5B in TTS/STT modes with LFM2-1.2B-Tool. Real-time local processing.
-
+
+ Voice-controlled car cockpit interface combining LFM2.5-Audio-1.5B in TTS/STT modes with LFM2-1.2B-Tool. Real-time local processing.
+
-
- 100% local meeting summarization tool using LFM2-2.6B-Transcript and llama.cpp. No cloud services or API keys required.
-
+
+ 100% local meeting summarization tool using LFM2-2.6B-Transcript and llama.cpp. No cloud services or API keys required.
+
-
- Train language models for web automation using reinforcement learning. Demonstrates GRPO fine-tuning with BrowserGym environments.
-
+
+ Train language models for web automation using reinforcement learning. Demonstrates GRPO fine-tuning with BrowserGym environments.
+
-
- Android app for single-turn generation of creative product slogans using local AI models. Built with traditional Android Views.
-
+
-
- Share web pages from any browser to this Android app for instant AI-powered summarization. Complete privacy with local processing.
-
+## Android
-
- Generate recipes with guaranteed JSON structure using constrained generation. Demonstrates automatic model downloading with LeapSDK.
-
+
-
- Analyze images and answer visual questions on Android using Vision Language Models. Built with Jetpack Compose and Coil.
-
+
+ Android app for single-turn generation of creative product slogans using local AI models. Built with traditional Android Views.
+
+
+
+ Share web pages from any browser to this Android app for instant AI-powered summarization. Complete privacy with local processing.
+
+
+
+ Generate recipes with guaranteed JSON structure using constrained generation. Demonstrates automatic model downloading with LeapSDK.
+
+
+
+ Analyze images and answer visual questions on Android using Vision Language Models. Built with Jetpack Compose and Coil.
+
+
+
+ Build intelligent AI agents on Android with the Koog framework. Demonstrates tool invocation, context management, and MCP integration.
+
+
+
+
+## Web
+
+
+
+
+ A browser driving game controlled with your hands and voice. MediaPipe tracks hand gestures for steering while LFM2.5-Audio-1.5B transcribes voice commands. Fully local, no server round-trips.
+
+
+
+ Run LFM2.5-Audio-1.5B entirely in the browser with WebGPU. Supports ASR, TTS, and interleaved audio-text conversations. No data sent to external servers.
+
+
+
+ Real-time video captioning with LFM2.5-VL-1.6B running fully client-side via WebGPU and ONNX Runtime Web. No cloud inference required.
+
+
+
+
+## Model Customization
+
+
+
+
+ Fine-tune LFM2-VL to identify car makers from images. Learn structured generation with Outlines and parameter-efficient fine-tuning with LoRA.
+
-
- Build intelligent AI agents on Android with the Koog framework. Demonstrates tool invocation, context management, and MCP integration.
-
## Cannot find the example you need?
diff --git a/examples/web/hand-voice-racer.mdx b/examples/web/hand-voice-racer.mdx
new file mode 100644
index 0000000..0f770f9
--- /dev/null
+++ b/examples/web/hand-voice-racer.mdx
@@ -0,0 +1,85 @@
+---
+title: "Hand & Voice Racer"
+---
+
+
+ Browse the complete example on GitHub
+
+
+
+
+**A browser driving game you control with your hands and voice, powered by models running fully locally.**
+
+Steer by holding both hands up like a steering wheel. Speak commands to accelerate, brake, toggle headlights, and play music. No cloud calls, no server round-trips. Everything runs in your browser tab.
+
+## How it works
+
+Two models run in parallel, entirely client-side:
+
+- **[MediaPipe Hand Landmarker](https://ai.google.dev/edge/mediapipe/solutions/vision/hand_landmarker)** tracks your hand positions via webcam at ~30 fps. The angle between your two wrists drives the steering.
+- **[LFM2.5-Audio-1.5B](https://docs.liquid.ai/lfm/models/lfm25-audio-1.5b)** runs in a Web Worker with ONNX Runtime Web. It listens for speech via the [Silero VAD](https://github.com/snakers4/silero-vad) and transcribes each utterance on-device. Matched keywords control game state.
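+
+The wrist-to-steering mapping above can be sketched as a small pure function (a minimal illustration, not code from the example; `steeringFromWrists` and `steeringInput` are hypothetical helpers; MediaPipe reports landmarks in normalized image coordinates, with the wrist at landmark index 0):
+
+```javascript
+// Map two wrist positions to a steering angle in radians.
+// left/right: {x, y} in normalized image coordinates (0..1), as
+// MediaPipe's HandLandmarker reports them (y grows downward).
+// Returns 0 when the hands are level.
+function steeringFromWrists(left, right) {
+  const dx = right.x - left.x;
+  const dy = right.y - left.y;
+  return Math.atan2(dy, dx);
+}
+
+// Normalize to [-1, 1] for the game's steering input, saturating
+// once the "wheel" is tilted past maxTilt.
+function steeringInput(left, right, maxTilt = Math.PI / 4) {
+  const angle = steeringFromWrists(left, right);
+  return Math.max(-1, Math.min(1, angle / maxTilt));
+}
+```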
+
+The audio model loads from Hugging Face and is cached in IndexedDB after the first run, so subsequent starts are instant.
+
+## Voice commands
+
+| Say | Effect |
+|-----|--------|
+| `speed` / `fast` / `go` | Accelerate to 120 km/h |
+| `slow` / `stop` / `brake` | Decelerate to 0 km/h |
+| `lights on` | Enable headlights |
+| `lights off` | Disable headlights |
+| `music` / `play` | Start the techno beat |
+| `stop music` / `silence` | Stop the beat |
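+
+Matching a transcript against the table above can be as simple as a substring scan, checking multi-word phrases before their single-word prefixes so `stop music` is not swallowed by `stop` (a minimal sketch; the command names and `matchCommand` helper are illustrative, not code from the example):
+
+```javascript
+// Ordered keyword table: longer phrases first so "stop music"
+// wins over "stop" and "lights off" is not shadowed by "lights".
+const COMMANDS = [
+  { keywords: ["stop music", "silence"], command: "music_off" },
+  { keywords: ["lights on"], command: "lights_on" },
+  { keywords: ["lights off"], command: "lights_off" },
+  { keywords: ["music", "play"], command: "music_on" },
+  { keywords: ["speed", "fast", "go"], command: "accelerate" },
+  { keywords: ["slow", "stop", "brake"], command: "decelerate" },
+];
+
+// Return the first matching command, or null to ignore the utterance.
+function matchCommand(transcript) {
+  const text = transcript.toLowerCase();
+  for (const { keywords, command } of COMMANDS) {
+    if (keywords.some((k) => text.includes(k))) return command;
+  }
+  return null;
+}
+```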
+
+## Prerequisites
+
+
+ **Browser Requirements**
+
+ - Chrome 113+ or Edge 113+ (WebGPU strongly recommended for fast audio inference; without it, inference falls back to WASM)
+ - Webcam and microphone access
+ - Node.js 18+
+
+
+## Run locally
+
+```bash
+npm install
+npm run dev
+```
+
+Then open [http://localhost:3001](http://localhost:3001).
+
+On first load the audio model (~900 MB at Q4 quantization) downloads from Hugging Face and is cached in your browser. Hand detection assets load from a CDN and MediaPipe's model storage.
+
+## Architecture
+
+```
+Browser tab
+├── main thread
+│ ├── MediaPipe HandLandmarker (webcam → hand angles → steering)
+│ ├── Canvas 2D renderer (road, scenery, dashboard, HUD)
+│ └── Web Audio API (procedural techno synthesizer)
+└── audio-worker.js (Web Worker)
+ ├── Silero VAD (mic → speech segments)
+ └── LFM2.5-Audio-1.5B ONNX (speech segment → transcript → keyword)
+```
+
+The game loop runs on `requestAnimationFrame`. Hand detection is throttled to ~30 fps so it does not block rendering. Voice processing happens off the main thread and delivers results via `postMessage`.
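+
+The throttle described above boils down to a timestamp check inside the `requestAnimationFrame` callback (a minimal sketch under assumed names; `detectHands` and `render` are stand-ins for the example's real functions):
+
+```javascript
+// Run hand detection at most once per DETECT_INTERVAL_MS while the
+// render loop itself runs at the full requestAnimationFrame rate.
+const DETECT_INTERVAL_MS = 1000 / 30; // ~33 ms => ~30 fps
+
+function shouldRunDetection(lastRunMs, nowMs, intervalMs = DETECT_INTERVAL_MS) {
+  return nowMs - lastRunMs >= intervalMs;
+}
+
+// Browser-only loop sketch using the helper above:
+// let lastDetect = 0;
+// function frame(now) {
+//   if (shouldRunDetection(lastDetect, now)) {
+//     lastDetect = now;
+//     detectHands(video);   // MediaPipe HandLandmarker call
+//   }
+//   render(now);            // Canvas 2D draw runs every frame
+//   requestAnimationFrame(frame);
+// }
+// requestAnimationFrame(frame);
+```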
+
+## Need help?
+
+
+
+
+ Connect with the community and ask questions about this example.
+
+
+