From bd2d1f0401e58a618d2687ae37219c37b9c4cc0b Mon Sep 17 00:00:00 2001 From: Paulescu Date: Tue, 3 Mar 2026 23:52:55 +0100 Subject: [PATCH 1/2] Add Hand & Voice Racer web example Adds a new page under Web Examples for the hand-voice-racer cookbook example, including a YouTube demo embed and the full content from the README, and registers it in the docs.json navigation. Co-Authored-By: Claude Sonnet 4.6 --- docs.json | 3 +- examples/web/hand-voice-racer.mdx | 85 +++++++++++++++++++++++++++++++ 2 files changed, 87 insertions(+), 1 deletion(-) create mode 100644 examples/web/hand-voice-racer.mdx diff --git a/docs.json b/docs.json index 03d7f83..76d9ade 100644 --- a/docs.json +++ b/docs.json @@ -253,7 +253,8 @@ "icon": "globe", "pages": [ "examples/web/vl-webgpu-demo", - "examples/web/audio-webgpu-demo" + "examples/web/audio-webgpu-demo", + "examples/web/hand-voice-racer" ] }, { diff --git a/examples/web/hand-voice-racer.mdx b/examples/web/hand-voice-racer.mdx new file mode 100644 index 0000000..0f770f9 --- /dev/null +++ b/examples/web/hand-voice-racer.mdx @@ -0,0 +1,85 @@ +--- +title: "Hand & Voice Racer" +--- + + + Browse the complete example on GitHub + + + + +**A browser driving game you control with your hands and voice, powered by models running fully locally.** + +Steer by holding both hands up like a steering wheel. Speak commands to accelerate, brake, toggle headlights, and play music. No cloud calls, no server round-trips. Everything runs in your browser tab. + +## How it works + +Two models run in parallel, entirely client-side: + +- **[MediaPipe Hand Landmarker](https://ai.google.dev/edge/mediapipe/solutions/vision/hand_landmarker)** tracks your hand positions via webcam at ~30 fps. The angle between your two wrists drives the steering. +- **[LFM2.5-Audio-1.5B](https://docs.liquid.ai/lfm/models/lfm25-audio-1.5b)** runs in a Web Worker with ONNX Runtime Web.
It listens for speech via the [Silero VAD](https://github.com/snakers4/silero-vad) and transcribes each utterance on-device. Matched keywords control game state. + +The audio model loads from Hugging Face and is cached in IndexedDB after the first run, so subsequent starts are instant. + +## Voice commands + +| Say | Effect | +|-----|--------| +| `speed` / `fast` / `go` | Accelerate to 120 km/h | +| `slow` / `stop` / `brake` | Decelerate to 0 km/h | +| `lights on` | Enable headlights | +| `lights off` | Disable headlights | +| `music` / `play` | Start the techno beat | +| `stop music` / `silence` | Stop the beat | + +## Prerequisites + + + **Browser Requirements** + + - Chrome 113+ or Edge 113+ (WebGPU required for fast audio inference; falls back to WASM) + - Webcam and microphone access + - Node.js 18+ + + +## Run locally + +```bash +npm install +npm run dev +``` + +Then open [http://localhost:3001](http://localhost:3001). + +On first load the audio model (~900 MB at Q4 quantization) downloads from Hugging Face and is cached in your browser. Hand detection assets load from CDN and MediaPipe's model storage. + +## Architecture + +``` +Browser tab +├── main thread +│ ├── MediaPipe HandLandmarker (webcam → hand angles → steering) +│ ├── Canvas 2D renderer (road, scenery, dashboard, HUD) +│ └── Web Audio API (procedural techno synthesizer) +└── audio-worker.js (Web Worker) + ├── Silero VAD (mic → speech segments) + └── LFM2.5-Audio-1.5B ONNX (speech segment → transcript → keyword) +``` + +The game loop runs on `requestAnimationFrame`. Hand detection is throttled to ~30 fps so it does not block rendering. Voice processing happens off the main thread and delivers results via `postMessage`. + +## Need help? + + + + + Connect with the community and ask questions about this example. 
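
For reference, the wrist-angle steering described earlier on this page can be sketched as a small pure function. This is a minimal sketch, not the example's actual code: the function name, the 45° full-lock threshold, and the clamping behavior are all illustrative assumptions. MediaPipe hand landmark index 0 is the wrist, with coordinates normalized to [0, 1] and y increasing downward.

```javascript
// Sketch: map two wrist landmarks to a steering value in [-1, 1].
// Names and the 45-degree full-lock threshold are assumptions,
// not taken from the example's source.
function steeringFromWrists(leftWrist, rightWrist, maxAngle = Math.PI / 4) {
  // Angle of the line through both wrists, as if gripping a wheel.
  // In normalized image coordinates y grows downward, so a lower
  // right hand yields a positive angle (steer right).
  const angle = Math.atan2(
    rightWrist.y - leftWrist.y,
    rightWrist.x - leftWrist.x
  );
  // Clamp to the full-lock range, then normalize to [-1, 1].
  const clamped = Math.min(maxAngle, Math.max(-maxAngle, angle));
  return clamped / maxAngle;
}
```

Level wrists give 0; tilting past the full-lock angle saturates at ±1, so the output can feed a steering update each frame without further clamping.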
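
The transcript-to-action keyword matching described earlier on this page could look like the following minimal sketch. The phrases mirror the voice-command table; the action names, the first-match strategy, and the substring check are illustrative assumptions rather than the example's actual implementation.

```javascript
// Sketch: map a transcript delivered by the audio worker to a game
// action. Action names and matching strategy are assumptions.
// Multi-word phrases are listed before entries containing their
// single-word substrings, so "stop music" is not swallowed by the
// "stop" (brake) rule.
const COMMANDS = [
  { phrases: ["lights on"], action: "lightsOn" },
  { phrases: ["lights off"], action: "lightsOff" },
  { phrases: ["stop music", "silence"], action: "musicOff" },
  { phrases: ["speed", "fast", "go"], action: "accelerate" },
  { phrases: ["slow", "stop", "brake"], action: "brake" },
  { phrases: ["music", "play"], action: "musicOn" },
];

function actionForTranscript(transcript) {
  const text = transcript.toLowerCase();
  // First matching command wins.
  for (const { phrases, action } of COMMANDS) {
    if (phrases.some((p) => text.includes(p))) return action;
  }
  return null; // no recognized command in this utterance
}
```

On the main thread this would typically run inside the worker's message handler, e.g. `worker.onmessage = (e) => apply(actionForTranscript(e.data.transcript))`, keeping all game-state changes off the audio thread.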
+ + + From bf1375e99232b6425e402fb2c56cb0d88addf044 Mon Sep 17 00:00:00 2001 From: Paulescu Date: Wed, 4 Mar 2026 00:02:40 +0100 Subject: [PATCH 2/2] Reorganize examples index into sections with 3 missing web cards Add Laptop, Android, Web, and Model Customization sections to the examples library page, and add the previously missing Hand & Voice Racer, Audio WebGPU Demo, and Real-Time Video Captioning cards. Co-Authored-By: Claude Sonnet 4.6 --- examples/index.mdx | 112 +++++++++++++++++++++++++++++---------------- 1 file changed, 73 insertions(+), 39 deletions(-) diff --git a/examples/index.mdx b/examples/index.mdx index dae2e4c..00f07cd 100644 --- a/examples/index.mdx +++ b/examples/index.mdx @@ -2,58 +2,92 @@ title: "Examples Library" --- +## Laptop + - - Turn invoices into structured JSON using a lightweight Vision Language Model. 100% local, no API costs. - - - Build a real-time audio transcription CLI using LFM2-Audio-1.5B with llama.cpp. 100% local processing without internet connection. - + + Turn invoices into structured JSON using a lightweight Vision Language Model. 100% local, no API costs. + - - Fine-tune LFM2-VL to identify car makers from images. Learn structured generation with Outlines and parameter-efficient fine-tuning with LoRA. - + + Build a real-time audio transcription CLI using LFM2-Audio-1.5B with llama.cpp. 100% local processing without internet connection. + - - Efficient bidirectional translation system powered by LFM2 1.2B fine-tuned for Korean-English translation with automatic language detection. - + + Efficient bidirectional translation system powered by LFM2 1.2B fine-tuned for Korean-English translation with automatic language detection. + - - Python CLI leveraging LFM2.5-1.2B-Thinking for multi-step reasoning and tool calling to find and book flights. - + + Python CLI leveraging LFM2.5-1.2B-Thinking for multi-step reasoning and tool calling to find and book flights. 
+ - - Voice-controlled car cockpit interface combining LFM2.5-Audio-1.5B in TTS/STT modes with LFM2-1.2B-Tool. Real-time local processing. - + + Voice-controlled car cockpit interface combining LFM2.5-Audio-1.5B in TTS/STT modes with LFM2-1.2B-Tool. Real-time local processing. + - - 100% local meeting summarization tool using LFM2-2.6B-Transcript and llama.cpp. No cloud services or API keys required. - + + 100% local meeting summarization tool using LFM2-2.6B-Transcript and llama.cpp. No cloud services or API keys required. + - - Train language models for web automation using reinforcement learning. Demonstrates GRPO fine-tuning with BrowserGym environments. - + + Train language models for web automation using reinforcement learning. Demonstrates GRPO fine-tuning with BrowserGym environments. + - - Android app for single-turn generation of creative product slogans using local AI models. Built with traditional Android Views. - + - - Share web pages from any browser to this Android app for instant AI-powered summarization. Complete privacy with local processing. - +## Android - - Generate recipes with guaranteed JSON structure using constrained generation. Demonstrates automatic model downloading with LeapSDK. - + - - Analyze images and answer visual questions on Android using Vision Language Models. Built with Jetpack Compose and Coil. - + + Android app for single-turn generation of creative product slogans using local AI models. Built with traditional Android Views. + + + + Share web pages from any browser to this Android app for instant AI-powered summarization. Complete privacy with local processing. + + + + Generate recipes with guaranteed JSON structure using constrained generation. Demonstrates automatic model downloading with LeapSDK. + + + + Analyze images and answer visual questions on Android using Vision Language Models. Built with Jetpack Compose and Coil. + + + + Build intelligent AI agents on Android with the Koog framework. 
Demonstrates tool invocation, context management, and MCP integration. + + + + +## Web + + + + + A browser driving game controlled with your hands and voice. MediaPipe tracks hand gestures for steering while LFM2.5-Audio-1.5B transcribes voice commands. Fully local, no server round-trips. + + + + Run LFM2.5-Audio-1.5B entirely in the browser with WebGPU. Supports ASR, TTS, and interleaved audio-text conversations. No data sent to external servers. + + + + Real-time video captioning with LFM2.5-VL-1.6B running fully client-side via WebGPU and ONNX Runtime Web. No cloud inference required. + + + + +## Model Customization + + + + + Fine-tune LFM2-VL to identify car makers from images. Learn structured generation with Outlines and parameter-efficient fine-tuning with LoRA. + - - Build intelligent AI agents on Android with the Koog framework. Demonstrates tool invocation, context management, and MCP integration. - ## Cannot find the example you need?