feat(voice): add console audio IO and SessionHost audio routing#1694

Open
toubatbrian wants to merge 2 commits into main from brian/node-console-audio

toubatbrian (Contributor) commented Jun 2, 2026

Summary

Second PR in the series porting Python's TCP console session support to agents-js (follows #1693, the transport + updateIo plumbing). This adds the audio IO that lets a console-mode session exchange audio with a local broker (the LiveKit CLI lk session daemon).

  • TcpAudioInput (agents/src/voice/console_io.ts) — resamples inbound audio_input frames from the 48 kHz wire rate to the 24 kHz agent rate and feeds them into the base AudioInput stream the STT pipeline reads from.
  • TcpAudioOutput — resamples the agent's TTS frames back up to the wire rate, streams them as audio_output messages, and drives the flush/clear playout handshake: a flush blocks the agent turn until the broker reports audio_playback_finished, or reports an interruption (with a clamped playback position) when the buffer is cleared.
  • SessionHost now accepts optional audioInput/audioOutput and routes inbound audio_input / audio_playback_finished messages to them in recvLoop.
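
For readers unfamiliar with the flush/clear handshake, here is a minimal sketch of the idea, not the code in console_io.ts. `PlayoutHandshakeSketch`, `PlaybackFinishedSketch`, `captureFrame`, and the injected `now` clock are hypothetical names made up for illustration. It assumes the played position on interruption is estimated from wall-clock time since the first frame and clamped to the amount of audio actually pushed; the PR may derive it differently.

```typescript
// Hypothetical sketch of a flush/clear playout handshake; names are illustrative only.
interface PlaybackFinishedSketch {
  playbackPosition: number; // seconds, as the base output contract expects
  interrupted: boolean;
}

class PlayoutHandshakeSketch {
  private pushedMs = 0; // total audio pushed in this segment, in milliseconds
  private startedAtMs: number | undefined = undefined;
  private pending: ((ev: PlaybackFinishedSketch) => void) | undefined = undefined;
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now; // injectable clock so the behaviour is deterministic in tests
  }

  /** Account for one outbound frame of the given duration. */
  captureFrame(durationMs: number): void {
    if (this.startedAtMs === undefined) this.startedAtMs = this.now();
    this.pushedMs += durationMs;
  }

  /** Resolves once playout finishes or the buffer is cleared. */
  flush(): Promise<PlaybackFinishedSketch> {
    if (this.startedAtMs === undefined) {
      // Nothing was pushed, so there is nothing to wait for.
      return Promise.resolve({ playbackPosition: 0, interrupted: false });
    }
    return new Promise((resolve) => {
      this.pending = resolve;
    });
  }

  /** The broker reported audio_playback_finished: the whole segment was played. */
  notifyPlayoutFinished(): void {
    if (this.startedAtMs === undefined) return;
    this.finish(this.pushedMs, false);
  }

  /** The buffer was cleared mid-playout: report an interruption. */
  clearBuffer(): void {
    if (this.startedAtMs === undefined) return;
    const elapsedMs = this.now() - this.startedAtMs;
    // Clamp so the reported position never exceeds the audio actually pushed.
    const playedMs = Math.min(Math.max(elapsedMs, 0), this.pushedMs);
    this.finish(playedMs, true);
  }

  private finish(positionMs: number, interrupted: boolean): void {
    const resolve = this.pending;
    this.pending = undefined;
    this.pushedMs = 0;
    this.startedAtMs = undefined;
    // Time is tracked in milliseconds internally and converted to seconds here.
    if (resolve) resolve({ playbackPosition: positionMs / 1000, interrupted });
  }
}
```

In this sketch, awaiting `flush()` is what would hold the agent turn open: the promise settles only when `notifyPlayoutFinished()` or `clearBuffer()` runs.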

Notes / divergences from the Python port

  • Python's TcpAudioInput uses a stdlib queue + run_in_executor to bridge the producer and consumer event loops under JobExecutorType.THREAD. The JS console job runs in-process on a single event loop, so a StreamChannel is sufficient — no cross-thread queue.
  • Time is tracked in milliseconds per the JS conventions; PlaybackFinishedEvent.playbackPosition is reported in seconds to match the base AudioOutput contract.
  • The SessionHost audio fields are typed via import type from console_io.ts (the TS equivalent of Python's TYPE_CHECKING import) so there's no runtime import cycle.
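
To illustrate why no cross-thread queue is needed on a single event loop, here is a rough sketch of an in-process channel. `FrameChannelSketch` is a hypothetical stand-in and not the agents-js StreamChannel, whose actual API is not shown in this description. The producer and consumer only hand values over through promise callbacks, so no locking or executor hop is involved.

```typescript
// Hypothetical single-event-loop channel; NOT the real StreamChannel API.
class FrameChannelSketch<T> {
  private readonly buffered: T[] = [];
  private readonly waiters: Array<(value: T | undefined) => void> = [];
  private closed = false;

  /** Hands a value to a waiting reader, or buffers it. Returns false once closed. */
  write(value: T): boolean {
    if (this.closed) return false; // frames pushed after close are dropped
    const waiter = this.waiters.shift();
    if (waiter) waiter(value);
    else this.buffered.push(value);
    return true;
  }

  /** Stops accepting writes and wakes pending readers with `undefined`. */
  close(): void {
    this.closed = true;
    for (const waiter of this.waiters.splice(0)) waiter(undefined);
  }

  /** Resolves with the next value, or `undefined` once closed and drained. */
  read(): Promise<T | undefined> {
    if (this.buffered.length > 0) return Promise.resolve(this.buffered.shift());
    if (this.closed) return Promise.resolve(undefined);
    return new Promise((resolve) => this.waiters.push(resolve));
  }
}
```

A consumer would loop on `await channel.read()` until it receives `undefined`. Values buffered before `close()` are still delivered, while later writes are rejected.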

Reference

Ports from Python livekit-agents cli/tcp_console.py (TcpAudioInput/TcpAudioOutput) and voice/remote_session.py (SessionHost._dispatch_transport_message).

Test plan

New agents/src/voice/console_io.test.ts (5 cases, all green):

  • TcpAudioInput resamples 48 kHz wire frames to 24 kHz and exposes them on the stream
  • TcpAudioInput drops frames pushed after close
  • TcpAudioOutput streams resampled audio_output + audio_playback_flush, and the flush handshake completes (uninterrupted) on notifyPlayoutFinished
  • TcpAudioOutput reports interruption (clamped position) when the buffer is cleared mid-playout
  • SessionHost routes audio_input -> pushFrame and audio_playback_finished -> notifyPlayoutFinished
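
As a rough picture of what the first and last cases exercise, the sketch below pairs a naive 2:1 downsampler with a message dispatcher. Everything here is assumed for illustration: the message shapes, `dispatchTransportMessageSketch`, and `downsampleByTwoSketch` are hypothetical, and averaging sample pairs is a stand-in for whatever resampler the PR actually uses (it has no proper anti-aliasing filter).

```typescript
// Hypothetical message shapes; the real wire schema is not shown in this PR description.
type TransportMessageSketch =
  | { type: 'audio_input'; samples: Int16Array }
  | { type: 'audio_playback_finished' }
  | { type: 'unknown' };

interface AudioInputSinkSketch {
  pushFrame(samples: Int16Array): void;
}

interface AudioOutputSinkSketch {
  notifyPlayoutFinished(): void;
}

/** Naive 48 kHz -> 24 kHz decimation: average each pair of samples. */
function downsampleByTwoSketch(samples: Int16Array): Int16Array {
  const out = new Int16Array(Math.floor(samples.length / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = Math.round((samples[2 * i] + samples[2 * i + 1]) / 2);
  }
  return out;
}

/** Routes one inbound message; returns true if a handler consumed it. */
function dispatchTransportMessageSketch(
  msg: TransportMessageSketch,
  audioInput?: AudioInputSinkSketch,
  audioOutput?: AudioOutputSinkSketch,
): boolean {
  switch (msg.type) {
    case 'audio_input':
      if (!audioInput) return false; // audio IO is optional
      audioInput.pushFrame(msg.samples);
      return true;
    case 'audio_playback_finished':
      if (!audioOutput) return false;
      audioOutput.notifyPlayoutFinished();
      return true;
    default:
      return false;
  }
}
```

With this shape, a 10 ms wire frame of 480 samples at 48 kHz becomes 240 samples at 24 kHz, and a host constructed without audio IO simply ignores both message types.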

Also verified: pnpm build:agents, ESLint, and Prettier clean on the changed files. The existing room-based path is untouched.

Follow-up (next stacked PR)

  • PR3: unregistered/console run path on the worker + JobContext fake-job support + CLI console subcommand wiring the transport + audio IO + SessionHost together.

Port the Python tcp_console audio IO to agents-js. TcpAudioInput resamples
inbound audio_input frames from the 48 kHz wire rate to the 24 kHz agent
rate and feeds them to the STT pipeline; TcpAudioOutput resamples the
agent's TTS frames back up, streams them as audio_output messages, and
drives the flush/clear playout handshake (blocking the agent turn until the
broker reports audio_playback_finished, or reporting an interruption when
the buffer is cleared). SessionHost now accepts optional audio IO and routes
inbound audio_input/audio_playback_finished messages to it.

Co-authored-by: Cursor <cursoragent@cursor.com>

changeset-bot Bot commented Jun 2, 2026

🦋 Changeset detected

Latest commit: 6bdfc5e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 34 packages
| Name | Type |
| --- | --- |
| @livekit/agents | Patch |
| @livekit/agents-plugin-anam | Patch |
| @livekit/agents-plugin-assemblyai | Patch |
| @livekit/agents-plugin-baseten | Patch |
| @livekit/agents-plugin-bey | Patch |
| @livekit/agents-plugin-cartesia | Patch |
| @livekit/agents-plugin-cerebras | Patch |
| @livekit/agents-plugin-deepgram | Patch |
| @livekit/agents-plugin-elevenlabs | Patch |
| @livekit/agents-plugin-fishaudio | Patch |
| @livekit/agents-plugin-google | Patch |
| @livekit/agents-plugin-hedra | Patch |
| @livekit/agents-plugin-hume | Patch |
| @livekit/agents-plugin-inworld | Patch |
| @livekit/agents-plugin-lemonslice | Patch |
| @livekit/agents-plugin-liveavatar | Patch |
| @livekit/agents-plugin-livekit | Patch |
| @livekit/agents-plugin-minimax | Patch |
| @livekit/agents-plugin-mistral | Patch |
| @livekit/agents-plugin-mistralai | Patch |
| @livekit/agents-plugin-neuphonic | Patch |
| @livekit/agents-plugin-openai | Patch |
| @livekit/agents-plugin-perplexity | Patch |
| @livekit/agents-plugin-phonic | Patch |
| @livekit/agents-plugin-resemble | Patch |
| @livekit/agents-plugin-rime | Patch |
| @livekit/agents-plugin-runway | Patch |
| @livekit/agents-plugin-sarvam | Patch |
| @livekit/agents-plugin-silero | Patch |
| @livekit/agents-plugin-soniox | Patch |
| @livekit/agents-plugin-tavus | Patch |
| @livekit/agents-plugins-test | Patch |
| @livekit/agents-plugin-trugen | Patch |
| @livekit/agents-plugin-xai | Patch |

chatgpt-codex-connector[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.
