Problem
The current VibeApps desktop environment only supports text-based interaction through the AI Agent ChatPanel. For a system that aims to deliver a complete, immersive "desktop experience," the lack of voice wake-up and command capabilities (e.g., "Play some jazz," "Open my diary") limits natural interaction and accessibility.
Proposed Solution
I propose integrating the Web Speech API directly into the existing ChatPanel to enable voice interactions:
- Speech-to-Text (Voice Input)
  - Add a microphone button next to the chat input field in apps/webuiapps/src/components/ChatPanel/index.tsx.
  - Use the browser's native SpeechRecognition API (falling back to webkitSpeechRecognition for cross-browser support).
  - When the user speaks, transcribe the audio to text and send it as a standard message payload to the AI Agent.
- Text-to-Speech (Voice Output) [Optional]
  - Use the speechSynthesis API to read the Agent's text responses aloud, creating a more convincing "virtual assistant" experience without relying on external TTS services.
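A minimal sketch of how this could look, kept separate from React for clarity. The helper names (resolveSpeechRecognition, startVoiceInput, speakReply) and the sendMessage callback are assumptions for illustration, not existing ChatPanel code; the actual wiring would live inside the component.

```typescript
// Sketch of the proposed voice layer. Helper names are hypothetical;
// only the SpeechRecognition / speechSynthesis calls are standard Web APIs.

type SpeechRecognitionCtor = new () => any;

// Resolve the standard or WebKit-prefixed constructor, if either exists.
export function resolveSpeechRecognition(
  scope: any = globalThis
): SpeechRecognitionCtor | null {
  return scope.SpeechRecognition ?? scope.webkitSpeechRecognition ?? null;
}

// Start a one-shot transcription and forward the final transcript to the
// existing text-message send path.
export function startVoiceInput(
  sendMessage: (text: string) => void,
  scope: any = globalThis
): boolean {
  const Ctor = resolveSpeechRecognition(scope);
  if (!Ctor) return false; // browser has no Web Speech support

  const recognition = new Ctor();
  recognition.continuous = false;     // stop after one utterance
  recognition.interimResults = false; // deliver only final results
  recognition.onresult = (event: any) => {
    const transcript = event.results[0][0].transcript;
    sendMessage(transcript); // reuse the standard message payload
  };
  recognition.start();
  return true;
}

// Optional voice output: read the Agent's reply aloud via speechSynthesis.
export function speakReply(text: string, scope: any = globalThis): void {
  if (!scope.speechSynthesis || !scope.SpeechSynthesisUtterance) return;
  scope.speechSynthesis.speak(new scope.SpeechSynthesisUtterance(text));
}
```

Passing the global scope as a parameter keeps the helpers testable outside a browser and makes the webkit fallback explicit in one place.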
Alternatives Considered
Integrating third-party commercial APIs (such as OpenAI Whisper or other cloud ASR/TTS services) was considered. However, the browser's native Web Speech API is a better fit for OpenRoom's architectural philosophy of being "zero-backend" and working "out-of-the-box" entirely within the client environment.
Additional Context
This would likely require adding a permission prompt flow for microphone access upon the first click.
A visual indicator (like a pulsing animation or waveform) during active listening would greatly improve UX.
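The permission flow and listening indicator could be wired roughly as below. The names (requestMicPermission, bindListeningIndicator, ListeningState) are hypothetical, and pre-flighting via navigator.mediaDevices.getUserMedia is just one option: calling recognition.start() also triggers the browser's microphone prompt, but a pre-flight lets the UI distinguish "permission denied" from "no speech support".

```typescript
// Hypothetical sketch of the microphone-permission and indicator flow;
// these helpers do not exist in ChatPanel yet.

export type ListeningState = "idle" | "requesting" | "listening" | "denied";

// Pre-flight the microphone permission before starting recognition.
export async function requestMicPermission(
  mediaDevices: { getUserMedia(c: { audio: boolean }): Promise<unknown> }
): Promise<boolean> {
  try {
    await mediaDevices.getUserMedia({ audio: true }); // shows the prompt once
    return true;
  } catch {
    return false; // user dismissed or blocked the prompt
  }
}

// Wire recognition lifecycle events to a UI state setter, e.g. to drive
// a pulsing-microphone animation while actively listening.
export function bindListeningIndicator(
  recognition: any,
  setState: (s: ListeningState) => void
): void {
  recognition.onstart = () => setState("listening");
  recognition.onend = () => setState("idle");
  recognition.onerror = () => setState("idle");
}
```

In the component, setState would map onto React state that toggles the pulsing animation or waveform CSS class.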