Enhance server-based speech recognition and configuration options#17
Merged
This pull request refactors the audio format and language handling for speech recognition throughout the codebase to centralize configuration and simplify interfaces. It also adds support for using a Whisper server as a speech recognition backend and introduces a startup script for the Whisper server. The most important changes are summarized below:
Centralization of Audio Format and Language Configuration
Replaces the per-call `sample_rate_hz`, `channels`, `sample_width`, and `language_code` parameters with the centralized `LISTEN_AUDIO_FORMAT` and `LISTEN_LANGUAGE_CODE` settings in `stackchan_server/listen.py`, `stackchan_server/speech_recognition/google_cloud.py`, and `stackchan_server/speech_recognition/whisper_cpp.py`. All speech recognition classes and methods now use these shared settings, reducing code duplication and the potential for misconfiguration.
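As a rough illustration, the centralized settings could be grouped like the sketch below. The `AudioFormat` container, the helper function, and all concrete values are illustrative assumptions; only the names `LISTEN_AUDIO_FORMAT` and `LISTEN_LANGUAGE_CODE` come from the PR.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AudioFormat:
    """Shared capture settings used by every speech recognition backend."""
    sample_rate_hz: int
    channels: int
    sample_width: int  # bytes per sample

# Illustrative values only; 16 kHz mono 16-bit PCM is a common choice for speech.
LISTEN_AUDIO_FORMAT = AudioFormat(sample_rate_hz=16000, channels=1, sample_width=2)
LISTEN_LANGUAGE_CODE = "ja-JP"  # assumed default, not confirmed by the PR

def bytes_per_second(fmt: AudioFormat) -> int:
    # Derived value a backend might use for buffer sizing.
    return fmt.sample_rate_hz * fmt.channels * fmt.sample_width
```

With a single shared instance, a backend that needs the byte rate derives it from `LISTEN_AUDIO_FORMAT` instead of threading three separate parameters through every call.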
Whisper Server Integration
Introduces `WhisperServerSpeechToText` as a new speech recognition backend, integrated into the app selection logic in `example_apps/echo.py`. The app can now use a remote Whisper server if the appropriate environment variables are set.
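A minimal sketch of what that selection logic could look like. The `STACKCHAN_WHISPER_SERVER_URL` variable name and the placeholder class bodies are assumptions; only `WhisperServerSpeechToText` and `STACKCHAN_WHISPER_MODEL` appear in the PR itself.

```python
import os

class WhisperServerSpeechToText:
    """Placeholder for the new remote Whisper server backend."""
    def __init__(self, base_url):
        self.base_url = base_url

class WhisperCppSpeechToText:
    """Placeholder for the existing local whisper.cpp backend."""
    def __init__(self, model_path=None):
        self.model_path = model_path

def select_speech_to_text(env=os.environ):
    # Prefer the remote Whisper server when its URL is configured,
    # otherwise fall back to the local whisper.cpp backend.
    url = env.get("STACKCHAN_WHISPER_SERVER_URL")
    if url:
        return WhisperServerSpeechToText(url)
    return WhisperCppSpeechToText(env.get("STACKCHAN_WHISPER_MODEL"))
```

Passing the environment mapping as a parameter keeps the selection logic easy to exercise in tests without mutating `os.environ`.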
Startup Script for Whisper Server
Adds `misc/whisper-server/run-whisper-server.sh` to simplify running the Whisper server with parameters sourced from environment variables.
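Such a script might look roughly like the following sketch. The binary name, flag names, variable names, and default values are all assumptions, not the actual contents of the script.

```shell
#!/usr/bin/env sh
# Hypothetical sketch of misc/whisper-server/run-whisper-server.sh.
# Parameters come from environment variables, with fallback defaults.
WHISPER_MODEL="${WHISPER_MODEL:-models/ggml-base.bin}"
WHISPER_HOST="${WHISPER_HOST:-127.0.0.1}"
WHISPER_PORT="${WHISPER_PORT:-8080}"

CMD="whisper-server --model $WHISPER_MODEL --host $WHISPER_HOST --port $WHISPER_PORT"
echo "$CMD"
# exec $CMD  # launch the server once the binary is on PATH
```

The `${VAR:-default}` expansions let a deployment override any parameter by exporting the variable before invoking the script.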
Improvements to WhisperCppSpeechToText
Makes `model_path` optional, falling back to the `STACKCHAN_WHISPER_MODEL` environment variable, which improves configuration flexibility and error handling.

These changes collectively make the audio handling more robust and consistent, and enable easier deployment and configuration of speech recognition services.
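The `model_path` fallback described above could be implemented along these lines; the exact checks and error message are assumptions.

```python
import os

class WhisperCppSpeechToText:
    """Sketch of the optional model_path with env-var fallback (assumed logic)."""

    def __init__(self, model_path=None):
        # Fall back to STACKCHAN_WHISPER_MODEL when no explicit path is given.
        if model_path is None:
            model_path = os.environ.get("STACKCHAN_WHISPER_MODEL")
        if not model_path:
            raise ValueError(
                "No Whisper model specified: pass model_path or set "
                "the STACKCHAN_WHISPER_MODEL environment variable"
            )
        self.model_path = model_path
```

Raising early with an explicit message turns a silent misconfiguration into an actionable error at construction time.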