Add speaker ID function #48

base: main · Changes from 2 commits
New file (`@@ -0,0 +1,10 @@`):

```markdown
# Live Real-Time Speaker ID Example

This example demonstrates how to use the Speechmatics Python SDK to perform speaker ID in real-time.

The SDK requires an API key to be set as an environment variable before it can be used. You can obtain an API key by signing up for a Speechmatics account at https://portal.speechmatics.com/dashboard

## Prerequisites

- Install Speechmatics RT SDK: `pip install speechmatics-rt`
- Export Speechmatics API key: `export SPEECHMATICS_API_KEY=YOUR-API-KEY`
```
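The two prerequisite steps above, as shell commands ready to paste (the key value is a placeholder to replace with your own):

```shell
# Install the Speechmatics real-time SDK
pip install speechmatics-rt

# Make the API key available to the example script
export SPEECHMATICS_API_KEY=YOUR-API-KEY
```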
New file (`@@ -0,0 +1,50 @@`):

```python
import asyncio
import logging

from speechmatics.rt import (
    AsyncClient,
    AudioEncoding,
    AudioFormat,
    OperatingPoint,
    ServerMessageType,
    TranscriptionConfig,
)

logging.basicConfig(level=logging.INFO)


async def main() -> None:
    """Run async transcription example."""
    transcription_config = TranscriptionConfig(
        max_delay=0.8,
        enable_partials=True,
        operating_point=OperatingPoint.ENHANCED,
        diarization="speaker",
    )

    # Initialize client with API key from environment
    async with AsyncClient() as client:
        try:
            @client.on(ServerMessageType.ADD_TRANSCRIPT)
            def handle_finals(msg):
                print(f"Final: {msg['metadata']['transcript']}")

            @client.on(ServerMessageType.SPEAKERS_RESULT)
            def handle_speakers_result(msg):
                print(msg)

            # Transcribe audio file
            with open("./examples/example.wav", "rb") as audio_file:
                await client.transcribe(
                    audio_file,
                    transcription_config=transcription_config,
                    get_speakers=True,
                )
        except Exception as e:
            print(f"Transcription error: {e}")


if __name__ == "__main__":
    asyncio.run(main())
```
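The `SPEAKERS_RESULT` handler in the example only prints the raw message. As a hedged sketch of what a caller might do next (the `"speakers"`/`"label"` payload keys are assumptions for illustration, not a schema confirmed by this PR), extracting speaker labels from a dict-shaped message could look like:

```python
# Hypothetical helper: pull speaker labels out of a SpeakersResult-style
# message. The "speakers" and "label" keys are assumed payload shapes;
# adjust them to match the real server message.
def speaker_labels(msg: dict) -> list:
    return [s.get("label") for s in msg.get("speakers", [])]


# Illustrative payload only, not captured from a real session.
sample = {
    "message": "SpeakersResult",
    "speakers": [{"label": "S1"}, {"label": "S2"}],
}
print(speaker_labels(sample))  # ['S1', 'S2']
```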
Change to `transcribe` in the RT client:

```diff
@@ -171,6 +171,7 @@ async def transcribe(
         audio_events_config: Optional[AudioEventsConfig] = None,
         ws_headers: Optional[dict] = None,
         timeout: Optional[float] = None,
+        get_speakers: Optional[bool] = False,
```
**Collaborator:** I think `get_speakers` should be enabled by default when speaker diarization is requested rather than having an additional parameter to enable it. The user who doesn't need it can easily ignore it, and the user who needs it can capture it just like any other message using an event handler. This makes …

**Contributor:** It depends what type of diarization is requested, as it must be set to "speaker" or "channel_and_speaker" for the speaker ID to work. I am in the process of adding the …

**Collaborator:** It's possible that I'm missing some context here, but when requested … But since you're updating the transcriber to be enabled via …

**Contributor (author):** I would say we don't want to auto-enable, as speaker IDs are a form of biometrics, and storing and handling them has legal consequences beyond our usual transcripts. There's a risk, if it's auto-enabled, that customers would receive and even store biometrics without being aware of it, and that could possibly place us in an awkward position? My instinct would be in general that we should err on the side of forcing people to request biometrics rather than assume they want them. Separately, @dln22, when will the get_speakers flag be added to speaker_diarization_config? Just wondering if we want to merge this in the meantime or whether it's better to hold off on the additional flag until it's in the transcriber?

**Contributor:** @TudorCRL My changes have been merged already, but not yet officially released. Will let you know once that happens.

**Contributor:** @TudorCRL the …
```diff
     ) -> None:
         """
         Transcribe a single audio stream in real-time.
@@ -193,6 +194,7 @@ async def transcribe(
             ws_headers: Additional headers to include in the WebSocket handshake.
             timeout: Maximum time in seconds to wait for transcription completion.
                 Default None.
+            get_speakers: Send a speaker identifier event at the end of the session.

         Raises:
             AudioError: If source is invalid or cannot be read.
@@ -233,6 +235,9 @@ async def transcribe(
             ws_headers=ws_headers,
         )

+        if get_speakers:
+            await self.send_message({"message": "GetSpeakers", "final": True})
+
         try:
             await asyncio.wait_for(
                 self._audio_producer(source, audio_format.chunk_size),
```
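To illustrate the opt-in behaviour discussed in the review thread, here is a minimal standalone sketch of the guard from the diff above, with the asynchronous `send_message` call replaced by a plain list so it runs on its own; nothing is sent unless the caller explicitly sets the flag:

```python
# Standalone sketch of the get_speakers guard: the GetSpeakers control
# message is queued only when the caller explicitly opts in, mirroring
# the biometrics discussion above. The list stands in for the websocket.
sent_messages = []


def maybe_request_speakers(get_speakers: bool) -> None:
    if get_speakers:
        sent_messages.append({"message": "GetSpeakers", "final": True})


maybe_request_speakers(False)  # default: nothing is sent
maybe_request_speakers(True)   # explicit opt-in queues the request
print(sent_messages)  # [{'message': 'GetSpeakers', 'final': True}]
```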