feat(voice): add console audio IO and SessionHost audio routing #1694
Open
toubatbrian wants to merge 2 commits into
Port the Python tcp_console audio IO to agents-js.

TcpAudioInput resamples inbound audio_input frames from the 48 kHz wire rate to the 24 kHz agent rate and feeds them to the STT pipeline. TcpAudioOutput resamples the agent's TTS frames back up to the wire rate, streams them as audio_output messages, and drives the flush/clear playout handshake: it blocks the agent turn until the broker reports audio_playback_finished, or reports an interruption when the buffer is cleared. SessionHost now accepts optional audio IO and routes inbound audio_input/audio_playback_finished messages to it.

Co-authored-by: Cursor <cursoragent@cursor.com>
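For reviewers who want a concrete picture of the 48 kHz to 24 kHz conversion, here is a minimal sketch of an exact 2:1 rate change on mono 16-bit PCM. It is illustrative only: the resampler actually used in `console_io.ts` is not shown in this description, and the function names `downsample2to1` and `upsample1to2` are hypothetical. Pair averaging and linear interpolation are crude stand-ins for a proper resampling filter.

```typescript
// Illustrative sketch, NOT the PR's resampler. Assumes mono Int16 PCM and an
// exact 2:1 ratio between the wire rate and the agent rate.
const WIRE_RATE = 48000; // Hz, from the PR description
const AGENT_RATE = 24000; // Hz, from the PR description

// 48 kHz -> 24 kHz: average each pair of samples (crude low-pass), then keep one.
function downsample2to1(wire: Int16Array): Int16Array {
  const out = new Int16Array(Math.floor(wire.length / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = Math.round((wire[2 * i]! + wire[2 * i + 1]!) / 2);
  }
  return out;
}

// 24 kHz -> 48 kHz: keep each sample and insert the midpoint to the next one.
// The final inserted sample repeats the last input sample.
function upsample1to2(agent: Int16Array): Int16Array {
  const out = new Int16Array(agent.length * 2);
  for (let i = 0; i < agent.length; i++) {
    const cur = agent[i]!;
    const next = i + 1 < agent.length ? agent[i + 1]! : cur;
    out[2 * i] = cur;
    out[2 * i + 1] = Math.round((cur + next) / 2);
  }
  return out;
}
```

Under these assumptions, a 10 ms wire frame of 480 samples becomes a 240-sample agent frame, and the reverse path doubles the sample count again.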
🦋 Changeset detected. Latest commit: 6bdfc5e. The changes in this PR will be included in the next version bump. This PR includes changesets to release 34 packages.
Summary

Second PR in the series porting Python's TCP console session support to `agents-js` (follows #1693, the transport + `updateIo` plumbing). This adds the audio IO that lets a console-mode session exchange audio with a local broker (the LiveKit CLI `lk session` daemon).

- `TcpAudioInput` (`agents/src/voice/console_io.ts`): resamples inbound `audio_input` frames from the 48 kHz wire rate to the 24 kHz agent rate and feeds them into the base `AudioInput` stream the STT pipeline reads from.
- `TcpAudioOutput`: resamples the agent's TTS frames back up to the wire rate, streams them as `audio_output` messages, and drives the flush/clear playout handshake: a flush blocks the agent turn until the broker reports `audio_playback_finished`, or reports an interruption (with a clamped playback position) when the buffer is cleared.
- `SessionHost` now accepts optional `audioInput` / `audioOutput` and routes inbound `audio_input` / `audio_playback_finished` messages to them in `recvLoop`.

Notes / divergences from the Python port

- In Python, `TcpAudioInput` uses a stdlib queue + `run_in_executor` to bridge the producer and consumer event loops under `JobExecutorType.THREAD`. The JS console job runs in-process on a single event loop, so a `StreamChannel` is sufficient and no cross-thread queue is needed.
- `PlaybackFinishedEvent.playbackPosition` is reported in seconds to match the base `AudioOutput` contract.
- `SessionHost` audio fields are typed via `import type` from `console_io.ts` (the TS equivalent of Python's `TYPE_CHECKING` import), so there's no runtime import cycle.

Reference

Ports from Python `livekit-agents`: `cli/tcp_console.py` (`TcpAudioInput` / `TcpAudioOutput`) and `voice/remote_session.py` (`SessionHost._dispatch_transport_message`).

Test plan

New `agents/src/voice/console_io.test.ts` (5 cases, all green):

- `TcpAudioInput` resamples 48 kHz wire frames to 24 kHz and exposes them on the stream
- `TcpAudioInput` drops frames pushed after close
- `TcpAudioOutput` streams resampled `audio_output` + `audio_playback_flush`, and the flush handshake completes (uninterrupted) on `notifyPlayoutFinished`
- `TcpAudioOutput` reports interruption (clamped position) when the buffer is cleared mid-playout
- `SessionHost` routes `audio_input` -> `pushFrame` and `audio_playback_finished` -> `notifyPlayoutFinished`

Also verified:

- `pnpm build:agents`, ESLint, and Prettier are clean on the changed files.
- The existing room-based path is untouched.

Follow-up (next stacked PR)

`JobContext` fake-job support + CLI `console` subcommand wiring the transport + audio IO + `SessionHost` together.
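
For reference, the flush/clear playout handshake described in the Summary can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in `console_io.ts`: the class name `PlayoutHandshake`, the injected `send` / `onFinished` / `now` callbacks, and the `audio_playback_clear` message name are hypothetical, and the interruption position is assumed to be the wall-clock time since the first frame, clamped to the duration of audio pushed so far.

```typescript
interface PlaybackFinishedEvent {
  playbackPosition: number; // seconds, per the base AudioOutput contract
  interrupted: boolean;
}

// 'audio_playback_clear' is an assumed name; only the flush message is named in the PR.
type ControlMessage = 'audio_playback_flush' | 'audio_playback_clear';

class PlayoutHandshake {
  private pushedSeconds = 0;
  private startedAt: number | undefined;
  private resolveFlush: ((ev: PlaybackFinishedEvent) => void) | undefined;
  private readonly send: (msg: ControlMessage) => void;
  private readonly onFinished: (ev: PlaybackFinishedEvent) => void;
  private readonly now: () => number;

  constructor(
    send: (msg: ControlMessage) => void,
    onFinished: (ev: PlaybackFinishedEvent) => void,
    now: () => number = () => Date.now() / 1000,
  ) {
    this.send = send;
    this.onFinished = onFinished;
    this.now = now;
  }

  // Track how much audio has been handed to the broker in this segment.
  captureFrame(samplesPerChannel: number, sampleRate: number): void {
    if (this.startedAt === undefined) this.startedAt = this.now();
    this.pushedSeconds += samplesPerChannel / sampleRate;
  }

  // Ask the broker to play out everything; the returned promise settles once
  // the broker reports completion or the buffer is cleared.
  flush(): Promise<PlaybackFinishedEvent> {
    this.send('audio_playback_flush');
    return new Promise((resolve) => {
      this.resolveFlush = resolve;
    });
  }

  // Called when an inbound audio_playback_finished message arrives.
  notifyPlayoutFinished(): void {
    if (this.startedAt === undefined) return; // nothing in flight
    this.finish({ playbackPosition: this.pushedSeconds, interrupted: false });
  }

  // Interrupt playback: report elapsed time clamped to [0, pushedSeconds].
  clearBuffer(): void {
    if (this.startedAt === undefined) return; // nothing in flight
    this.send('audio_playback_clear');
    const elapsed = this.now() - this.startedAt;
    const position = Math.min(Math.max(elapsed, 0), this.pushedSeconds);
    this.finish({ playbackPosition: position, interrupted: true });
  }

  private finish(ev: PlaybackFinishedEvent): void {
    this.startedAt = undefined;
    this.pushedSeconds = 0;
    this.onFinished(ev);
    this.resolveFlush?.(ev);
    this.resolveFlush = undefined;
  }
}
```

In this model the agent turn would `await handshake.flush()`, which is what makes the turn block until the broker confirms playout or an interruption clears the buffer. Injecting `now` keeps the clamping logic testable without real timers.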