[Feature]: Add Voice Input/Output support to AI Agent ChatPanel #15

@key88cb

Description

Problem

The VibeApps desktop environment currently supports only text-based interaction through the AI Agent ChatPanel. For a system that aims to provide a complete, immersive "desktop experience," the lack of voice wake-up and voice commands (e.g., "Play some jazz," "Open my diary") makes interaction less natural and limits accessibility.

Proposed Solution

I propose integrating the Web Speech API directly into the existing ChatPanel to enable voice interactions:

  1. Speech-to-Text (Voice Input)
    Add a microphone button next to the chat input field in apps/webuiapps/src/components/ChatPanel/index.tsx.
    Use the browser's native SpeechRecognition API, falling back to the prefixed webkitSpeechRecognition constructor in browsers that expose only that name.
    Once the user finishes speaking, transcribe the speech to text and send it to the AI Agent as a standard message payload.
  2. Text-to-Speech (Voice Output) [Optional]
    Use the speechSynthesis API to read aloud the Agent's text responses, creating a more realistic "virtual assistant" experience without relying on external TTS services.
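
To make the two items above concrete, here is a minimal sketch, not the actual ChatPanel code. The helper names (getRecognitionCtor, startVoiceInput, speakReply), the structural "Like" types, the SpeechScope/SynthScope parameters, and the "en-US" default are all assumptions made for illustration. The browser globals are passed in as a scope object so the logic can be exercised outside a browser; in the app the scope would presumably be window.

```typescript
// Minimal structural types (hypothetical names) so the sketch compiles
// even where DOM speech typings are unavailable.
interface RecognitionResultEventLike {
  results: ArrayLike<ArrayLike<{ transcript: string }> & { isFinal: boolean }>;
}

interface RecognitionLike {
  lang: string;
  interimResults: boolean;
  continuous: boolean;
  onresult: ((event: RecognitionResultEventLike) => void) | null;
  onerror: ((event: { error: string }) => void) | null;
  start(): void;
  stop(): void;
}

type RecognitionCtor = new () => RecognitionLike;

interface SpeechScope {
  SpeechRecognition?: RecognitionCtor;
  webkitSpeechRecognition?: RecognitionCtor;
}

// Prefer the standard constructor, fall back to the webkit-prefixed one.
function getRecognitionCtor(scope: SpeechScope): RecognitionCtor | null {
  return scope.SpeechRecognition ?? scope.webkitSpeechRecognition ?? null;
}

// Starts one recognition session. Returns null when the API is missing,
// so the caller can hide or disable the microphone button.
function startVoiceInput(
  scope: SpeechScope,
  onTranscript: (text: string) => void,
  onError: (code: string) => void,
  lang: string = "en-US", // assumed default, not taken from the project
): RecognitionLike | null {
  const Ctor = getRecognitionCtor(scope);
  if (!Ctor) return null;

  const recognition = new Ctor();
  recognition.lang = lang;
  recognition.interimResults = false; // deliver final results only
  recognition.continuous = false; // stop after a single utterance
  recognition.onresult = (event) => {
    const last = event.results[event.results.length - 1];
    if (!last || !last.isFinal) return;
    const text = last[0]?.transcript.trim();
    if (text) onTranscript(text); // hand off as a normal chat message
  };
  recognition.onerror = (event) => onError(event.error);
  recognition.start();
  return recognition;
}

// Optional voice output through the browser's speechSynthesis API.
interface UtteranceLike {
  text: string;
  lang: string;
}

interface SynthScope {
  speechSynthesis?: { cancel(): void; speak(utterance: UtteranceLike): void };
  SpeechSynthesisUtterance?: new (text: string) => UtteranceLike;
}

// Reads a reply aloud. Returns false when TTS is unavailable or the text is empty.
function speakReply(scope: SynthScope, text: string, lang: string = "en-US"): boolean {
  const synth = scope.speechSynthesis;
  const Utterance = scope.SpeechSynthesisUtterance;
  if (!synth || !Utterance || !text.trim()) return false;

  synth.cancel(); // drop any reply that is still being read
  const utterance = new Utterance(text);
  utterance.lang = lang;
  synth.speak(utterance);
  return true;
}
```

In the ChatPanel, the microphone button's click handler could call startVoiceInput with window (cast to SpeechScope) and the panel's existing send function as onTranscript, while speakReply could be called whenever an Agent response arrives and voice output is enabled. Both wiring points are assumptions about the component, not a description of its current code.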

Alternatives Considered

Integrating third-party commercial APIs (like OpenAI Whisper or other cloud ASR/TTS services) was considered. However, prioritizing the browser's native Web Speech API aligns perfectly with OpenRoom's current architectural philosophy of being "zero-backend" and working "out-of-the-box" entirely within the client environment.

Additional Context

This would likely require adding a permission prompt flow for microphone access upon the first click.
A visual indicator (like a pulsing animation or waveform) during active listening would greatly improve UX.
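
One way to tie the permission flow and the listening indicator together is a small state machine driven by the recognizer's events. The sketch below is only an illustration: the state names, event names, and the nextMicState and isPulsing helpers are hypothetical. It assumes the browser shows its own microphone prompt when recognition starts and reports a refusal through the recognizer's error event with the code "not-allowed" or "service-not-allowed".

```typescript
// Hypothetical UI states for the microphone button.
type MicState = "idle" | "requesting" | "listening" | "denied";

// Events the ChatPanel would feed in: a button click, plus the recognizer's
// start, end, and error callbacks.
type MicEvent =
  | { type: "click" }
  | { type: "start" }
  | { type: "end" }
  | { type: "error"; error: string };

function nextMicState(state: MicState, event: MicEvent): MicState {
  switch (event.type) {
    case "click":
      // Clicking while listening stops; otherwise request a new session,
      // which is when the browser may show its permission prompt.
      return state === "listening" ? "idle" : "requesting";
    case "start":
      return "listening";
    case "error":
      // Permission-related error codes keep the button in a denied state.
      return event.error === "not-allowed" || event.error === "service-not-allowed"
        ? "denied"
        : "idle";
    case "end":
      // The end event also fires after an error, so keep "denied" sticky.
      return state === "denied" ? "denied" : "idle";
  }
}

// The pulsing animation or waveform is shown only while actively listening.
function isPulsing(state: MicState): boolean {
  return state === "listening";
}
```

Keeping this logic as a pure function would let the component store a single MicState value (for example in React state) and derive both the button style and any "microphone blocked" hint from it.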

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
