Enhance Google Cloud speech recognition with streaming and asyncio #14

Merged
74th merged 5 commits into main from feat/recognition-library
Mar 9, 2026

74th (Owner) commented Mar 9, 2026

This pull request refactors the speech recognition system to support both streaming and non-streaming recognition through a new protocol-based abstraction. The changes decouple the code from a direct dependency on Google Cloud's speech client, introduce new handler classes for audio streaming and recognition, and improve error handling and extensibility. The most important changes are grouped below.

Speech Recognition Abstraction and Implementation:

  • Introduced protocol-based interfaces (SpeechRecognizer, StreamingSpeechRecognizer, StreamingSpeechSession) in stackchan_server/types.py to standardize the speech recognition API and enable pluggable backends.
  • Added a new Google Cloud-based speech recognizer implementation in stackchan_server/speech_recognition/google_cloud.py that supports both non-streaming and streaming recognition.
  • Provided a factory function create_speech_recognizer in stackchan_server/speech_recognition/__init__.py for instantiating the default speech recognizer.
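
The protocol-based design described above can be sketched roughly as follows. This is a minimal illustration, not the contents of stackchan_server/types.py: the three protocol names and create_speech_recognizer come from the PR description, while every method name and parameter (transcribe, start_stream, push_audio, finish, sample_rate_hz) and the FakeRecognizer class are assumptions made for the example.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class SpeechRecognizer(Protocol):
    # Hypothetical signature: one-shot recognition of a complete PCM buffer.
    async def transcribe(self, pcm: bytes, sample_rate_hz: int) -> str: ...


@runtime_checkable
class StreamingSpeechSession(Protocol):
    # Hypothetical methods: feed chunks, then finish to obtain the transcript.
    async def push_audio(self, chunk: bytes) -> None: ...
    async def finish(self) -> str: ...


@runtime_checkable
class StreamingSpeechRecognizer(Protocol):
    # Hypothetical method: open a session that accepts audio incrementally.
    async def start_stream(self, sample_rate_hz: int) -> StreamingSpeechSession: ...


class FakeRecognizer:
    """Stand-in backend. It satisfies SpeechRecognizer structurally, without inheritance."""

    async def transcribe(self, pcm: bytes, sample_rate_hz: int) -> str:
        return f"{len(pcm)} bytes at {sample_rate_hz} Hz"


def create_speech_recognizer() -> SpeechRecognizer:
    # Assumption: the real factory returns the Google Cloud backend instead.
    return FakeRecognizer()


if __name__ == "__main__":
    recognizer = create_speech_recognizer()
    print(asyncio.run(recognizer.transcribe(b"\x00" * 4, 16000)))
```

Because typing.Protocol uses structural typing, a backend only needs to provide matching methods to be accepted, which is what makes the backends pluggable.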

Refactoring and Decoupling:

  • Refactored StackChanApp and WsProxy to use the new SpeechRecognizer abstraction instead of directly depending on Google Cloud's client, improving modularity and testability.
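
One way to picture the testability gain is constructor injection, sketched below. The App class is a toy stand-in and does not reproduce StackChanApp or WsProxy; the transcribe method, its parameters, handle_utterance, and CannedRecognizer are all hypothetical names introduced for the example.

```python
import asyncio
from typing import Protocol


class SpeechRecognizer(Protocol):
    # Hypothetical method name and parameters.
    async def transcribe(self, pcm: bytes, sample_rate_hz: int) -> str: ...


class App:
    """Toy stand-in for an application that depends only on the abstraction."""

    def __init__(self, recognizer: SpeechRecognizer) -> None:
        self._recognizer = recognizer

    async def handle_utterance(self, pcm: bytes) -> str:
        # The app never imports a vendor client; it only calls the protocol method.
        text = await self._recognizer.transcribe(pcm, 16000)
        return text.strip()


class CannedRecognizer:
    """Test double: returns a fixed reply and records what it was asked."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.calls = []

    async def transcribe(self, pcm: bytes, sample_rate_hz: int) -> str:
        self.calls.append((pcm, sample_rate_hz))
        return self.reply


if __name__ == "__main__":
    app = App(CannedRecognizer("  hello  "))
    print(asyncio.run(app.handle_utterance(b"abc")))
```

With this shape, unit tests can run without network access or cloud credentials by passing in a test double.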

Streaming Audio Handling and Error Management:

  • Introduced ListenHandler in stackchan_server/listen.py to manage streaming audio input, buffering, timeout handling, and integration with the speech recognizer, along with custom error types (TimeoutError, EmptyTranscriptError).
  • Updated WsProxy to delegate audio listening and error handling to ListenHandler, removing redundant internal logic and ensuring proper resource cleanup.
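
The buffering, timeout, and empty-transcript behavior described above might look something like the sketch below. It is not the code in stackchan_server/listen.py. The names ListenHandler and EmptyTranscriptError come from the PR description, but the methods (feed, listen), the None end-of-audio marker, the chunk_timeout_s parameter, and the recognizer's transcribe call are assumptions. The timeout error is named ListenTimeoutError here, rather than TimeoutError as in the PR, purely to avoid shadowing Python's builtin in the example.

```python
import asyncio


class ListenTimeoutError(Exception):
    """No audio chunk arrived within the allowed time."""


class EmptyTranscriptError(Exception):
    """Recognition finished but produced no text."""


class ListenHandler:
    def __init__(self, recognizer, chunk_timeout_s: float = 1.0) -> None:
        self._recognizer = recognizer
        self._chunk_timeout_s = chunk_timeout_s
        # Create the handler inside a running event loop on older Python versions.
        self._queue = asyncio.Queue()

    def feed(self, chunk) -> None:
        """Queue one audio chunk; None marks the end of the utterance."""
        self._queue.put_nowait(chunk)

    async def listen(self) -> str:
        buffer = bytearray()
        while True:
            try:
                chunk = await asyncio.wait_for(
                    self._queue.get(), self._chunk_timeout_s
                )
            except asyncio.TimeoutError as exc:
                raise ListenTimeoutError("no audio received in time") from exc
            if chunk is None:
                break
            buffer.extend(chunk)
        # Hypothetical recognizer call; the real interface may differ.
        text = await self._recognizer.transcribe(bytes(buffer), 16000)
        if not text.strip():
            raise EmptyTranscriptError("recognizer returned no text")
        return text
```

Keeping this logic in one handler means the WebSocket layer only has to forward chunks and react to two well-defined exceptions.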

Application Code Update:

  • Updated example_apps/echo.py to handle EmptyTranscriptError during speech recognition, ensuring graceful session termination on empty transcripts.
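
As a rough illustration of ending a session gracefully on an empty transcript, consider the sketch below. It does not reproduce example_apps/echo.py: echo_session, listen, and speak are hypothetical names, and EmptyTranscriptError is redefined locally as a stand-in for the project's error type.

```python
import asyncio


class EmptyTranscriptError(Exception):
    """Stand-in for the error type named in the PR."""


async def echo_session(listen, speak) -> int:
    """Echo each recognized utterance back; return how many were echoed.

    `listen` and `speak` are hypothetical async callables, not the project's API.
    """
    echoed = 0
    while True:
        try:
            text = await listen()
        except EmptyTranscriptError:
            # Nothing was recognized: end the session quietly instead of crashing.
            return echoed
        await speak(text)
        echoed += 1
```

Treating the empty transcript as a normal end-of-session signal keeps the example app free of special-case checks on the returned text.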

74th merged commit 47ce452 into main Mar 9, 2026
1 check passed
74th deleted the feat/recognition-library branch March 9, 2026 11:21