feat(smallestai): word_timestamps for TTS, v4 STT endpoints, eou_timeout fix by harshitajain165 · Pull Request #5953 · livekit/agents

harshitajain165 · 2026-06-03T10:07:57Z

Summary

Three improvements to the livekit-plugins-smallestai plugin:

1. Word-level timestamps for TTS (`smallestai.TTS`)

Adds opt-in word_timestamps parameter to the Smallest AI WebSocket TTS integration, matching the feature shipped in Lightning v3.1 and v3.1 Pro.

New word_timestamps: bool = False constructor parameter
Sets aligned_transcript=word_timestamps on TTSCapabilities
Sends word_timestamps: true in the WebSocket payload when enabled
Handles word_timestamp status events by calling output_emitter.push_timed_transcript(TimedString(...))
Supported on base-queue English + Hindi voices (meher, devansh, kartik, maithili, liam, avery); other voices silently emit no word events

tts = smallestai.TTS(
    word_timestamps=True,  # opt in to per-word timed transcript entries
)
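As a rough illustration of the "Sends word_timestamps: true in the WebSocket payload when enabled" item above, the sketch below adds the flag to an outgoing payload only when the caller opted in. The helper name `build_tts_payload` and the `text` and `voice_id` keys are hypothetical placeholders, not the plugin's actual code; only the `word_timestamps` key is taken from this description.

```python
from typing import Any


def build_tts_payload(
    text: str, voice_id: str, word_timestamps: bool = False
) -> dict[str, Any]:
    """Hypothetical helper: build a TTS request payload.

    The "text" and "voice_id" keys are assumptions for illustration.
    """
    payload: dict[str, Any] = {"text": text, "voice_id": voice_id}
    # Only include the flag when the caller opted in, so the default
    # request stays identical to the pre-change behavior.
    if word_timestamps:
        payload["word_timestamps"] = True
    return payload
```

With the default of `False`, the payload carries no `word_timestamps` key at all, which is what makes the feature opt-in.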

2. STT endpoints updated to v4 API format (`smallestai.STT`)

The Smallest AI API moved from /{model}/get_text (path-based model) to model as a query parameter:

Streaming: wss://api.smallest.ai/waves/v1/stt/live?model=pulse
Batch: https://api.smallest.ai/waves/v1/stt/?model=pulse
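A minimal sketch of how the two URLs above could be assembled with the model as a query parameter. The function name `build_stt_url` is hypothetical and not part of the plugin; the host and paths are the ones listed above.

```python
from urllib.parse import urlencode

_HOST = "api.smallest.ai"
_STT_PATH = "/waves/v1/stt/"


def build_stt_url(model: str, *, streaming: bool) -> str:
    """Hypothetical helper: build a v4-style STT URL (model as query param)."""
    scheme = "wss" if streaming else "https"
    # Streaming uses /stt/live, batch uses /stt/
    path = _STT_PATH + ("live" if streaming else "")
    return f"{scheme}://{_HOST}{path}?{urlencode({'model': model})}"
```

Using `urlencode` rather than string concatenation keeps the model name correctly escaped if it ever contains reserved characters.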

3. Fix `eou_timeout_ms` bug (`smallestai.STT`)

The old > 0 guard silently omitted eou_timeout_ms when set to 0, causing the server to apply its 800ms default EOU detection — which conflicts with LiveKit's own VAD-based turn detection.

The fix always sends eou_timeout_ms, so the default of 0 explicitly disables server-side EOU and lets LiveKit's VAD control turn detection entirely. Users who want server-side EOU can pass 100–10000.
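The difference between the old guard and the fix can be sketched as follows. Both function names are hypothetical, and treating `eou_timeout_ms` as a string-valued request parameter is an assumption made for illustration.

```python
def build_params_old(eou_timeout_ms: int = 0) -> dict[str, str]:
    """Old behavior (sketch): the > 0 guard drops the parameter when it is 0."""
    params: dict[str, str] = {}
    if eou_timeout_ms > 0:
        params["eou_timeout_ms"] = str(eou_timeout_ms)
    return params


def build_params_fixed(eou_timeout_ms: int = 0) -> dict[str, str]:
    """Fixed behavior (sketch): always send the parameter, including 0."""
    return {"eou_timeout_ms": str(eou_timeout_ms)}
```

With the old guard, a value of 0 is indistinguishable from "not set", so the server falls back to its own default; the fixed version makes 0 an explicit request.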

Test plan

Verify TTS word timestamps fire push_timed_transcript events for supported voices (meher, devansh, etc.)
Verify unsupported voices work normally with word_timestamps=True (no errors, just no transcript events)
Verify STT streaming connects successfully to new endpoint
Verify STT batch transcription works with new endpoint
Verify eou_timeout_ms=0 disables server EOU (no server-triggered finals without LiveKit VAD triggering first)
Verify eou_timeout_ms=500 enables server EOU at 500ms

Adds opt-in (default on) per-word timing events to the Smallest AI WebSocket TTS integration, mirroring the pipecat implementation.

- Add word_timestamps: bool = True to _TTSOptions, TTS.__init__, and update_options()
- Set aligned_transcript=word_timestamps on TTSCapabilities so the framework knows word-level timing is available
- Send word_timestamps: true in the WebSocket payload when enabled
- Handle word_timestamp status events by calling output_emitter.push_timed_transcript(TimedString(...))

Supported on base-queue English + Hindi voices (meher, devansh, kartik, maithili, liam, avery); other voices emit no word events so the default-on is safe for all voices.

Old format used /{model}/get_text as the path segment. New API uses /stt/ (batch) and /stt/live (streaming) with model as a query parameter instead.

- Batch: https://api.smallest.ai/waves/v1/stt/?model={model}
- Streaming: wss://api.smallest.ai/waves/v1/stt/live?model={model}

Aligns with the raw API behavior — word timestamps are opt-in, matching docs.smallest.ai which requires passing word_timestamps=true explicitly to enable the feature.

The old > 0 guard silently omitted the parameter when 0, causing the server to apply its 800ms default EOU detection — conflicting with LiveKit's own VAD-based turn detection. Always send eou_timeout_ms so that the default of 0 explicitly disables server-side EOU. Users who want server EOU can pass 100–10000.

harshitajain165 · 2026-06-03T12:51:45Z

Hey @tinalenguyen
Requesting a review whenever you get a chance

devin-ai-integration

Devin Review found 1 potential issue.

View 3 additional findings in Devin Review.

devin-ai-integration · 2026-06-03T12:54:13Z

🔴 STTCapabilities.streaming is hardcoded to True even for pulse-pro which doesn't support streaming

When model="pulse-pro" is passed to STT.__init__(), the capabilities are still constructed with streaming=True (line 148). However, stream() raises ValueError for pulse-pro (lines 245-248). The agent framework checks capabilities.streaming at livekit-agents/livekit/agents/voice/agent.py:423 to decide whether to call stream() directly or wrap with a StreamAdapter. Because streaming=True, the framework will skip the StreamAdapter wrapping and call stream() directly at agent.py:433, which crashes with ValueError("pulse-pro does not support streaming..."). The streaming capability should be conditional on the model.

(Refers to line 148)
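One possible shape for the suggested fix, as a sketch only: derive the streaming capability from the model name instead of hardcoding it. The `Capabilities` dataclass is a stand-in for `STTCapabilities`, and `NON_STREAMING_MODELS` and `capabilities_for` are hypothetical names, not code from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Stand-in for STTCapabilities; only the field relevant here."""

    streaming: bool


# Assumption: pulse-pro is the only non-streaming model, per the review comment.
NON_STREAMING_MODELS = frozenset({"pulse-pro"})


def capabilities_for(model: str) -> Capabilities:
    """Report streaming=False for models whose stream() would raise."""
    return Capabilities(streaming=model not in NON_STREAMING_MODELS)
```

With `streaming=False` reported for pulse-pro, a framework that checks the capability before calling `stream()` would take its adapter path instead of hitting the `ValueError`.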


Seems like a legit comment?

harshitajain165 added 4 commits June 3, 2026 14:33

fix(smallestai): change word_timestamps default to False in TTS plugin

8591f8a

Aligns with the raw API behavior — word timestamps are opt-in, matching docs.smallest.ai which requires passing word_timestamps=true explicitly to enable the feature.

harshitajain165 force-pushed the feat/smallest-tts-word-timestamps branch from 3a5854c to 86d15fe June 3, 2026 10:28

harshitajain165 marked this pull request as ready for review June 3, 2026 12:51

devin-ai-integration Bot reviewed Jun 3, 2026

harshitajain165 wants to merge 4 commits into livekit:main from harshitajain165:feat/smallest-tts-word-timestamps


theomonnom Jun 4, 2026
