Google's cookbook examples (Get_started_LiveAPI.py, Get_started_LiveAPI_NativeAudio.py) explicitly state:
Important: Use headphones. This script uses the system default audio input and output, which often won't include echo cancellation. So to prevent the model from interrupting itself it is important that you use headphones.
The root cause: VAD (Voice Activity Detection) on the server can't distinguish between the user speaking and the model's own audio leaking back through the mic. It treats echo as a user interruption and cancels the ongoing generation.
- iOS AEC via `.voiceChat` mode: Use `AVAudioSession` with `.voiceChat` mode + `.defaultToSpeaker`. Check `isEchoCancelledInputAvailable` at runtime. This is the native platform solution.
- Client-side mic suppression: Stop sending audio frames to the WebSocket while playback is active. Resume ~200-500ms after playback stops. Simple half-duplex approach, but prevents user barge-in.
- `NO_INTERRUPTION` activity handling: Set `activityHandling: NO_INTERRUPTION` in the setup config. Model continues speaking even if VAD fires. Downside: user can't interrupt at all.
- Disable auto-VAD + manual control: Set `automaticActivityDetection.disabled: true`, then send `ActivityStart`/`ActivityEnd` manually. Since we know when playback is happening, we can suppress activity signals during echo.
- Tune VAD sensitivity: Set `startOfSpeechSensitivity: LOW` to raise the trigger threshold. Reduces false positives, but community reports say this alone is insufficient for speakerphone.
- Proactive Audio (preview): New feature where the model distinguishes speech directed at the device from background audio. Could help ignore echo, but unconfirmed and in preview.
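
The client-side mic suppression option can be sketched as a small gate that sits in front of the WebSocket writer. This is a minimal illustration, not a tested integration: the `MicGate` class, its method names, and the 0.3 s hangover (picked from the ~200-500ms range above) are all hypothetical, and the playback callbacks are assumed to be wired to whatever audio player the client uses.

```python
import time
from typing import Callable, Optional


class MicGate:
    """Half-duplex gate: drop mic frames during playback and for a short hangover after it."""

    def __init__(
        self,
        hangover_s: float = 0.3,  # assumed value inside the 200-500 ms window
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.hangover_s = hangover_s
        self._clock = clock
        self._playing = False
        self._stopped_at: Optional[float] = None

    def playback_started(self) -> None:
        # Call when the first model audio chunk is handed to the speaker.
        self._playing = True
        self._stopped_at = None

    def playback_stopped(self) -> None:
        # Call when the playback buffer has fully drained.
        self._playing = False
        self._stopped_at = self._clock()

    def should_send(self) -> bool:
        """Return True if a mic frame may be forwarded to the server right now."""
        if self._playing:
            return False
        if self._stopped_at is None:
            return True  # nothing has played yet
        # Keep the mic closed until the echo tail has had time to die out.
        return self._clock() - self._stopped_at >= self.hangover_s
```

The send loop would check `gate.should_send()` before each WebSocket write and discard the frame otherwise. As noted above, this blocks barge-in for as long as the model is speaking.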
- google-gemini/live-api-web-console#117 — model stopping mid-sentence
- https://discuss.ai.google.dev/t/disable-interruptions-for-audio-streaming-for-multimodal-live-api/61689
- https://discuss.ai.google.dev/t/how-do-i-prevent-the-live-api-from-discarding-audio-when-its-given-audio-while-it-speaks/73795
- https://community.openai.com/t/realtime-api-starts-to-answer-itself-with-mic-speaker-setup/977801
- Ensure we're using `.voiceChat` audio session mode (enables hardware AEC)
- Tune VAD sensitivity to `LOW` as a baseline
- Consider disabling auto-VAD and implementing echo-aware manual turn detection, since we already know when playback is active
- Fall back to mic suppression during playback if AEC proves insufficient on speakerphone
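
The echo-aware manual turn detection idea could look roughly like the sketch below. It is built on assumptions that should be checked against the Live API reference: the JSON nesting (`setup.realtimeInputConfig.automaticActivityDetection.disabled`, `realtimeInput.activityStart` / `activityEnd`) is my reading of the raw WebSocket protocol. `build_setup`, `EchoAwareTurnDetector`, and the `speech_detected` input (from some local VAD) are hypothetical names.

```python
from typing import Dict, List


def build_setup(model: str) -> Dict:
    """Setup message with server-side auto-VAD disabled (field nesting is an assumption)."""
    return {
        "setup": {
            "model": model,
            "realtimeInputConfig": {
                "automaticActivityDetection": {"disabled": True},
            },
        }
    }


class EchoAwareTurnDetector:
    """Turn local speech detection plus playback state into manual activity signals."""

    def __init__(self) -> None:
        self._in_turn = False

    def update(self, speech_detected: bool, playback_active: bool) -> List[Dict]:
        """Return the messages to send for this audio frame (possibly none)."""
        # Speech seen while model audio is playing is treated as echo, not as the user.
        user_speaking = speech_detected and not playback_active
        if user_speaking and not self._in_turn:
            self._in_turn = True
            return [{"realtimeInput": {"activityStart": {}}}]
        if not user_speaking and self._in_turn:
            self._in_turn = False
            return [{"realtimeInput": {"activityEnd": {}}}]
        return []
```

In this form the detector ignores all speech during playback, so it has the same no-barge-in trade-off as mic suppression. A more selective version would need some way to tell echo from real speech, for example by comparing the mic signal against the playback reference.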