Commit 5babdfb

Paulescu and claude authored
Add Hand & Voice Racer web example (#75)
## Summary

- Adds `examples/web/hand-voice-racer.mdx` with content from the [cookbook README](https://github.com/Liquid4All/cookbook/tree/main/examples/hand-voice-racer)
- Embeds the YouTube demo video at the top of the page
- Registers the new page under Web Examples in `docs.json`

## Test plan

- [ ] Verify the page renders correctly at `/examples/web/hand-voice-racer`
- [ ] Confirm the YouTube embed plays
- [ ] Confirm the GitHub source card and Discord CTA render correctly
- [ ] Check the page appears in the Web Examples nav group

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 3a3e67d commit 5babdfb

3 files changed

Lines changed: 160 additions & 40 deletions

docs.json

Lines changed: 2 additions & 1 deletion
@@ -253,7 +253,8 @@
 "icon": "globe",
 "pages": [
 "examples/web/vl-webgpu-demo",
-"examples/web/audio-webgpu-demo"
+"examples/web/audio-webgpu-demo",
+"examples/web/hand-voice-racer"
 ]
 },
 {

examples/index.mdx

Lines changed: 73 additions & 39 deletions
@@ -2,58 +2,92 @@
 title: "Examples Library"
 ---
 
+## Laptop
+
 <CardGroup cols={2}>
-<Card title="Invoice Extractor Tool" icon="file-invoice" href="/examples/laptop-examples/invoice-extractor-tool-with-liquid-nanos">
-Turn invoices into structured JSON using a lightweight Vision Language Model. 100% local, no API costs.
-</Card>
 
-<Card title="Audio Transcription in Real-Time" icon="microphone" href="/examples/laptop-examples/audio-to-text-in-real-time">
-Build a real-time audio transcription CLI using LFM2-Audio-1.5B with llama.cpp. 100% local processing without internet connection.
-</Card>
+<Card title="Invoice Extractor Tool" icon="file-invoice" href="/examples/laptop-examples/invoice-extractor-tool-with-liquid-nanos">
+Turn invoices into structured JSON using a lightweight Vision Language Model. 100% local, no API costs.
+</Card>
 
-<Card title="Car Maker Identification" icon="car" href="/examples/customize-models/car-maker-identification">
-Fine-tune LFM2-VL to identify car makers from images. Learn structured generation with Outlines and parameter-efficient fine-tuning with LoRA.
-</Card>
+<Card title="Audio Transcription in Real-Time" icon="microphone" href="/examples/laptop-examples/audio-to-text-in-real-time">
+Build a real-time audio transcription CLI using LFM2-Audio-1.5B with llama.cpp. 100% local processing without internet connection.
+</Card>
 
-<Card title="English-Korean Translation" icon="globe" href="/examples/laptop-examples/lfm2-english-to-korean">
-Efficient bidirectional translation system powered by LFM2 1.2B fine-tuned for Korean-English translation with automatic language detection.
-</Card>
+<Card title="English-Korean Translation" icon="globe" href="/examples/laptop-examples/lfm2-english-to-korean">
+Efficient bidirectional translation system powered by LFM2 1.2B fine-tuned for Korean-English translation with automatic language detection.
+</Card>
 
-<Card title="Flight Search Assistant" icon="plane-departure" href="/examples/laptop-examples/flight-search-assistant">
-Python CLI leveraging LFM2.5-1.2B-Thinking for multi-step reasoning and tool calling to find and book flights.
-</Card>
+<Card title="Flight Search Assistant" icon="plane-departure" href="/examples/laptop-examples/flight-search-assistant">
+Python CLI leveraging LFM2.5-1.2B-Thinking for multi-step reasoning and tool calling to find and book flights.
+</Card>
 
-<Card title="Audio Car Cockpit Demo" icon="car" href="/examples/laptop-examples/audio-car-cockpit">
-Voice-controlled car cockpit interface combining LFM2.5-Audio-1.5B in TTS/STT modes with LFM2-1.2B-Tool. Real-time local processing.
-</Card>
+<Card title="Audio Car Cockpit Demo" icon="car" href="/examples/laptop-examples/audio-car-cockpit">
+Voice-controlled car cockpit interface combining LFM2.5-Audio-1.5B in TTS/STT modes with LFM2-1.2B-Tool. Real-time local processing.
+</Card>
 
-<Card title="Meeting Summarization CLI" icon="users" href="/examples/laptop-examples/meeting-summarization">
-100% local meeting summarization tool using LFM2-2.6B-Transcript and llama.cpp. No cloud services or API keys required.
-</Card>
+<Card title="Meeting Summarization CLI" icon="users" href="/examples/laptop-examples/meeting-summarization">
+100% local meeting summarization tool using LFM2-2.6B-Transcript and llama.cpp. No cloud services or API keys required.
+</Card>
 
-<Card title="Browser Control with GRPO" icon="browser" href="/examples/laptop-examples/browser-control">
-Train language models for web automation using reinforcement learning. Demonstrates GRPO fine-tuning with BrowserGym environments.
-</Card>
+<Card title="Browser Control with GRPO" icon="browser" href="/examples/laptop-examples/browser-control">
+Train language models for web automation using reinforcement learning. Demonstrates GRPO fine-tuning with BrowserGym environments.
+</Card>
 
-<Card title="Product Slogan Generator" icon="sparkles" href="/examples/android/slogan-generator">
-Android app for single-turn generation of creative product slogans using local AI models. Built with traditional Android Views.
-</Card>
+</CardGroup>
 
-<Card title="Web Content Summarizer" icon="newspaper" href="/examples/android/web-content-summarizer">
-Share web pages from any browser to this Android app for instant AI-powered summarization. Complete privacy with local processing.
-</Card>
+## Android
 
-<Card title="Structured Recipe Generator" icon="utensils" href="/examples/android/recipe-generator-constrained-output">
-Generate recipes with guaranteed JSON structure using constrained generation. Demonstrates automatic model downloading with LeapSDK.
-</Card>
+<CardGroup cols={2}>
 
-<Card title="Vision Language Model Demo" icon="eye" href="/examples/android/vision-language-model-example">
-Analyze images and answer visual questions on Android using Vision Language Models. Built with Jetpack Compose and Coil.
-</Card>
+<Card title="Product Slogan Generator" icon="sparkles" href="/examples/android/slogan-generator">
+Android app for single-turn generation of creative product slogans using local AI models. Built with traditional Android Views.
+</Card>
+
+<Card title="Web Content Summarizer" icon="newspaper" href="/examples/android/web-content-summarizer">
+Share web pages from any browser to this Android app for instant AI-powered summarization. Complete privacy with local processing.
+</Card>
+
+<Card title="Structured Recipe Generator" icon="utensils" href="/examples/android/recipe-generator-constrained-output">
+Generate recipes with guaranteed JSON structure using constrained generation. Demonstrates automatic model downloading with LeapSDK.
+</Card>
+
+<Card title="Vision Language Model Demo" icon="eye" href="/examples/android/vision-language-model-example">
+Analyze images and answer visual questions on Android using Vision Language Models. Built with Jetpack Compose and Coil.
+</Card>
+
+<Card title="AI Agents with Koog" icon="robot" href="/examples/android/leap-koog-agent">
+Build intelligent AI agents on Android with the Koog framework. Demonstrates tool invocation, context management, and MCP integration.
+</Card>
+
+</CardGroup>
+
+## Web
+
+<CardGroup cols={2}>
+
+<Card title="Hand & Voice Racer" icon="gamepad" href="/examples/web/hand-voice-racer">
+A browser driving game controlled with your hands and voice. MediaPipe tracks hand gestures for steering while LFM2.5-Audio-1.5B transcribes voice commands. Fully local, no server round-trips.
+</Card>
+
+<Card title="Audio Browser Demo" icon="waveform-lines" href="/examples/web/audio-webgpu-demo">
+Run LFM2.5-Audio-1.5B entirely in the browser with WebGPU. Supports ASR, TTS, and interleaved audio-text conversations. No data sent to external servers.
+</Card>
+
+<Card title="Real-Time Video Captioning" icon="video" href="/examples/web/vl-webgpu-demo">
+Real-time video captioning with LFM2.5-VL-1.6B running fully client-side via WebGPU and ONNX Runtime Web. No cloud inference required.
+</Card>
+
+</CardGroup>
+
+## Model Customization
+
+<CardGroup cols={2}>
+
+<Card title="Car Maker Identification" icon="car" href="/examples/customize-models/car-maker-identification">
+Fine-tune LFM2-VL to identify car makers from images. Learn structured generation with Outlines and parameter-efficient fine-tuning with LoRA.
+</Card>
 
-<Card title="AI Agents with Koog" icon="robot" href="/examples/android/leap-koog-agent">
-Build intelligent AI agents on Android with the Koog framework. Demonstrates tool invocation, context management, and MCP integration.
-</Card>
 </CardGroup>
 
 ## Cannot find the example you need?

examples/web/hand-voice-racer.mdx

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+---
+title: "Hand & Voice Racer"
+---
+
+<Card title="View Source Code" icon="github" href="https://github.com/Liquid4All/cookbook/tree/main/examples/hand-voice-racer">
+Browse the complete example on GitHub
+</Card>
+
+<iframe
+className="w-full aspect-video rounded-xl"
+src="https://www.youtube.com/embed/PdmTeDNMP2s"
+title="Hand & Voice Racer demo"
+allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
+allowFullScreen
+></iframe>
+
+**A browser driving game you control with your hands and voice, powered by models running fully local.**
+
+Steer by holding both hands up like a steering wheel. Speak commands to accelerate, brake, toggle headlights, and play music. No cloud calls, no server round-trips. Everything runs in your browser tab.
+
+## How it works
+
+Two models run in parallel, entirely client-side:
+
+- **[MediaPipe Hand Landmarker](https://ai.google.dev/edge/mediapipe/solutions/vision/hand_landmarker)** tracks your hand positions via webcam at ~30 fps. The angle between your two wrists drives the steering.
+- **[LFM2.5-Audio-1.5B](https://docs.liquid.ai/lfm/models/lfm25-audio-1.5b)** runs in a Web Worker with ONNX Runtime Web. It listens for speech via the [Silero VAD](https://github.com/snakers4/silero-vad) and transcribes each utterance on-device. Matched keywords control game state.
+
+The audio model loads from Hugging Face and is cached in IndexedDB after the first run, so subsequent starts are instant.
+
+## Voice commands
+
+| Say | Effect |
+|-----|--------|
+| `speed` / `fast` / `go` | Accelerate to 120 km/h |
+| `slow` / `stop` / `brake` | Decelerate to 0 km/h |
+| `lights on` | Enable headlights |
+| `lights off` | Disable headlights |
+| `music` / `play` | Start the techno beat |
+| `stop music` / `silence` | Stop the beat |
+
+## Prerequisites
+
+<Note>
+**Browser Requirements**
+
+- Chrome 113+ or Edge 113+ (WebGPU required for fast audio inference; falls back to WASM)
+- Webcam and microphone access
+- Node.js 18+
+</Note>
+
+## Run locally
+
+```bash
+npm install
+npm run dev
+```
+
+Then open [http://localhost:3001](http://localhost:3001).
+
+On first load the audio model (~900 MB at Q4 quantization) downloads from Hugging Face and is cached in your browser. Hand detection assets load from CDN and MediaPipe's model storage.
+
+## Architecture
+
+```
+Browser tab
+├── main thread
+│   ├── MediaPipe HandLandmarker (webcam → hand angles → steering)
+│   ├── Canvas 2D renderer (road, scenery, dashboard, HUD)
+│   └── Web Audio API (procedural techno synthesizer)
+└── audio-worker.js (Web Worker)
+├── Silero VAD (mic → speech segments)
+└── LFM2.5-Audio-1.5B ONNX (speech segment → transcript → keyword)
+```
+
+The game loop runs on `requestAnimationFrame`. Hand detection is throttled to ~30 fps so it does not block rendering. Voice processing happens off the main thread and delivers results via `postMessage`.
+
+## Need help?
+
+<CardGroup cols={1}>
+
+<Card title="Join our Discord" icon="discord" iconType="brands" href="https://discord.gg/DFU3WQeaYD">
+Connect with the community and ask questions about this example.
+</Card>
+
+</CardGroup>
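
Reviewer note on the "Voice commands" table in `hand-voice-racer.mdx`: the table lists `stop` as a brake keyword and `stop music` as a separate command, so any keyword matcher has to test the longer phrase first. The sketch below is not taken from the cookbook source; it is one way the mapping could be written, and the `Command` names, the `RULES` list, and the `matchCommand` function are all hypothetical. Only the keywords themselves come from the table.

```typescript
// Hypothetical command identifiers; only the spoken keywords come from the table.
type Command =
  | "accelerate"
  | "brake"
  | "lightsOn"
  | "lightsOff"
  | "musicOn"
  | "musicOff";

// Order matters: multi-word phrases come first so that "stop music"
// is not captured by the single-word "stop" (brake) rule.
const RULES: ReadonlyArray<readonly [string, Command]> = [
  ["stop music", "musicOff"],
  ["lights on", "lightsOn"],
  ["lights off", "lightsOff"],
  ["silence", "musicOff"],
  ["speed", "accelerate"],
  ["fast", "accelerate"],
  ["go", "accelerate"],
  ["slow", "brake"],
  ["stop", "brake"],
  ["brake", "brake"],
  ["music", "musicOn"],
  ["play", "musicOn"],
];

// Returns the first matching command for a transcript, or null if none matches.
function matchCommand(transcript: string): Command | null {
  // Lowercase, replace anything that is not a letter or whitespace with a space,
  // and collapse runs of whitespace.
  const normalized = transcript
    .toLowerCase()
    .replace(/[^a-z\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  // Pad with spaces so phrases only match on whole-word boundaries
  // ("go" must not match inside "good").
  const padded = " " + normalized + " ";
  for (let i = 0; i < RULES.length; i++) {
    const phrase = RULES[i][0];
    if (padded.indexOf(" " + phrase + " ") !== -1) {
      return RULES[i][1];
    }
  }
  return null;
}
```

With this ordering, `matchCommand("Stop music!")` resolves to the music-off command rather than braking, and a transcript with no listed keyword returns `null` so the game state is left unchanged.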

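Reviewer note on "The angle between your two wrists drives the steering": the page does not say how the angle becomes a steering value, so the following is only a sketch under stated assumptions. It assumes each wrist arrives as a normalized image coordinate (MediaPipe reports the wrist as landmark index 0, with `y` growing downward), and that a tilt of 45 degrees maps to full lock. The `Point` type, the `steeringFromWrists` name, and the `maxTiltRad` default are hypothetical and not taken from the example's source.

```typescript
// A 2D point in normalized image coordinates (y grows downward).
interface Point {
  x: number;
  y: number;
}

// Maps the tilt of the line between two wrists to a steering value in [-1, 1].
// maxTiltRad is an assumed tuning parameter: the tilt that counts as full lock.
function steeringFromWrists(
  a: Point,
  b: Point,
  maxTiltRad: number = Math.PI / 4
): number {
  // Order the wrists by x so the result does not depend on detection order.
  const left = a.x <= b.x ? a : b;
  const right = a.x <= b.x ? b : a;
  // Tilt is 0 when the wrists are level and positive when the
  // image-right wrist is lower than the image-left wrist.
  const tilt = Math.atan2(right.y - left.y, right.x - left.x);
  // Scale to [-1, 1] and clamp so extreme poses cannot oversteer.
  return Math.max(-1, Math.min(1, tilt / maxTiltRad));
}
```

Whether a positive value should mean a right or a left turn depends on whether the webcam image is mirrored before landmark detection, which the page does not state; a caller would flip the sign accordingly.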