30 changes: 23 additions & 7 deletions livekit-plugins/livekit-plugins-speechmatics/README.md
@@ -19,6 +19,20 @@ Speechmatics STT engine can be configured to emit information about individual s

You should adjust your system instructions to inform the LLM of this format for speaker identification.

To control how speaker labels are formatted, set the `speaker_active_format` and `speaker_passive_format` parameters. Both are templates that use the `{speaker_id}` and `{text}` placeholders. You can also extend your agent instructions so that the LLM knows how to interpret the labels, for example:

```plain
Speakers will be identified by `<speaker_id>` tags, either `Sn` for newly identified speakers or `Name` for previously known named speakers. For example, `<S1>Hello</S1>` for an active speaker or `<passive><Bob>Hello</Bob></passive>` for a speaker in the background. Use the conversation context to determine the name of the speaker.
```

```python
stt=speechmatics.STT(
    enable_diarization=True,
    speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
    speaker_passive_format="<passive><{speaker_id}>{text}</{speaker_id}></passive>",
),
```
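
As a rough illustration of what these templates produce, the sketch below fills the two placeholders with Python's `str.format`. It only mimics the substitution to show the resulting labels; `render_segment` is a hypothetical helper, not part of the plugin, and the plugin's internal rendering may differ.

```python
# Illustrative only: reproduces the placeholder substitution with str.format.
# render_segment is a hypothetical helper, not a Speechmatics plugin API.
ACTIVE_FORMAT = "<{speaker_id}>{text}</{speaker_id}>"
PASSIVE_FORMAT = "<passive><{speaker_id}>{text}</{speaker_id}></passive>"


def render_segment(speaker_id: str, text: str, is_active: bool = True) -> str:
    """Fill the active or passive template for one transcript segment."""
    template = ACTIVE_FORMAT if is_active else PASSIVE_FORMAT
    return template.format(speaker_id=speaker_id, text=text)


print(render_segment("S1", "Hello"))                     # <S1>Hello</S1>
print(render_segment("Bob", "Hello", is_active=False))   # <passive><Bob>Hello</Bob></passive>
```

The two printed strings match the examples given in the sample instructions above, which is a quick way to check that a custom template and the prompt text agree.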

## Turn detection modes

The `turn_detection_mode` parameter controls how end-of-turn is detected:
@@ -37,20 +51,21 @@ The `end_of_utterance_silence_trigger` parameter controls the amount of silence
Usage:

```python
from livekit.agents import AgentSession
from livekit.agents import AgentSession, EndpointingOptions, TurnHandlingOptions
from livekit.plugins.turn_detector.multilingual import MultilingualModel
from livekit.plugins import speechmatics, silero

agent = AgentSession(
    stt=speechmatics.STT(
        end_of_utterance_silence_trigger=0.2,
        enable_diarization=True,
        speaker_active_format="[Speaker {speaker_id}] {text}",
        speaker_passive_format="[Speaker {speaker_id} *PASSIVE*] {text}",
    ),
    vad=silero.VAD.load(),
    turn_detection=MultilingualModel(),
    min_endpointing_delay=0.3,
    max_endpointing_delay=5.0,
    turn_handling=TurnHandlingOptions(
        endpointing=EndpointingOptions(min_delay=0.3, max_delay=5.0),
        turn_detection=MultilingualModel(),
    ),
    ...
)
```
@@ -60,12 +75,13 @@ agent = AgentSession(
To delegate end-of-turn detection to Speechmatics, set `turn_detection_mode=TurnDetectionMode.ADAPTIVE` (or `SMART_TURN` / `FIXED`) and pair it with `turn_detection="stt"` on the `AgentSession`.

```python
from livekit.agents import AgentSession
from livekit.agents import AgentSession, TurnHandlingOptions
from livekit.plugins import speechmatics

agent = AgentSession(
    stt=speechmatics.STT(
        turn_detection_mode=speechmatics.TurnDetectionMode.ADAPTIVE,
        enable_diarization=True,
        speaker_active_format="[Speaker {speaker_id}] {text}",
        speaker_passive_format="[Speaker {speaker_id} *PASSIVE*] {text}",
        additional_vocab=[
@@ -75,7 +91,7 @@ agent = AgentSession(
            ),
        ],
    ),
    turn_detection="stt",
    turn_handling=TurnHandlingOptions(turn_detection="stt"),
    ...
)
```
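
If downstream code needs to recover the speaker from text labelled with the bracketed formats used in these examples, a small parser along the following lines may help. This is a sketch under the assumption that the two `[Speaker ...]` templates shown above are used unchanged; `LabeledSegment` and `parse_labeled` are hypothetical names, not plugin APIs.

```python
import re
from typing import NamedTuple, Optional

# Matches "[Speaker S1] text" and "[Speaker S1 *PASSIVE*] text".
# Assumes the bracketed templates from the examples above are used as-is.
LABEL_RE = re.compile(
    r"^\[Speaker (?P<speaker_id>[^\]\s]+)(?P<passive> \*PASSIVE\*)?\] (?P<text>.*)$",
    re.DOTALL,
)


class LabeledSegment(NamedTuple):
    speaker_id: str
    is_passive: bool
    text: str


def parse_labeled(line: str) -> Optional[LabeledSegment]:
    """Split a labelled transcript line into speaker, passive flag, and text."""
    match = LABEL_RE.match(line)
    if match is None:
        return None  # the line does not carry a speaker label
    return LabeledSegment(
        speaker_id=match["speaker_id"],
        is_passive=match["passive"] is not None,
        text=match["text"],
    )


print(parse_labeled("[Speaker S1] Hello"))
print(parse_labeled("[Speaker S2 *PASSIVE*] Hi there"))
```

A speaker id containing spaces or `]` would not match this pattern, so the expression needs adjusting if the templates or ids differ.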