Enhance server-based speech recognition and configuration options#17
Merged
This pull request refactors the audio format and language handling for speech recognition throughout the codebase to centralize configuration and simplify interfaces. It also adds support for using a Whisper server as a speech recognition backend and introduces a startup script for the Whisper server. The most important changes are summarized below:
Centralization of Audio Format and Language Configuration
Replaces the per-call `sample_rate_hz`, `channels`, `sample_width`, and `language_code` parameters with the centralized `LISTEN_AUDIO_FORMAT` and `LISTEN_LANGUAGE_CODE` settings in `stackchan_server/listen.py`, `stackchan_server/speech_recognition/google_cloud.py`, and `stackchan_server/speech_recognition/whisper_cpp.py`. All speech recognition classes and methods now use these shared settings, reducing code duplication and the potential for misconfiguration.
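As a rough illustration, the centralized settings could be grouped like the sketch below. The `AudioFormat` container, the helper function, and all concrete values are illustrative assumptions; only the names `LISTEN_AUDIO_FORMAT` and `LISTEN_LANGUAGE_CODE` come from the PR.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AudioFormat:
    """Shared capture settings used by every speech recognition backend."""
    sample_rate_hz: int
    channels: int
    sample_width: int  # bytes per sample

# Illustrative values only; 16 kHz mono 16-bit PCM is a common choice for speech.
LISTEN_AUDIO_FORMAT = AudioFormat(sample_rate_hz=16000, channels=1, sample_width=2)
LISTEN_LANGUAGE_CODE = "ja-JP"  # assumed default, not confirmed by the PR

def bytes_per_second(fmt: AudioFormat) -> int:
    # Derived value a backend might use for buffer sizing.
    return fmt.sample_rate_hz * fmt.channels * fmt.sample_width
```

With a single shared instance, a backend that needs the byte rate derives it from `LISTEN_AUDIO_FORMAT` instead of threading three separate parameters through every call.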
Whisper Server Integration
Introduces `WhisperServerSpeechToText` as a new speech recognition backend, integrated into the app selection logic in `example_apps/echo.py`. The app can now use a remote Whisper server if the appropriate environment variables are set.
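A minimal sketch of what that selection logic could look like. The `STACKCHAN_WHISPER_SERVER_URL` variable name and the placeholder class bodies are assumptions; only `WhisperServerSpeechToText` and `STACKCHAN_WHISPER_MODEL` appear in the PR itself.

```python
import os

class WhisperServerSpeechToText:
    """Placeholder for the new remote Whisper server backend."""
    def __init__(self, base_url):
        self.base_url = base_url

class WhisperCppSpeechToText:
    """Placeholder for the existing local whisper.cpp backend."""
    def __init__(self, model_path=None):
        self.model_path = model_path

def select_speech_to_text(env=os.environ):
    # Prefer the remote Whisper server when its URL is configured,
    # otherwise fall back to the local whisper.cpp backend.
    url = env.get("STACKCHAN_WHISPER_SERVER_URL")
    if url:
        return WhisperServerSpeechToText(url)
    return WhisperCppSpeechToText(env.get("STACKCHAN_WHISPER_MODEL"))
```

Passing the environment mapping as a parameter keeps the selection logic easy to exercise in tests without mutating `os.environ`.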
Startup Script for Whisper Server
Adds `misc/whisper-server/run-whisper-server.sh` to simplify running the Whisper server with parameters sourced from environment variables.
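Such a script might look roughly like the following sketch. The binary name, flag names, variable names, and default values are all assumptions, not the actual contents of the script.

```shell
#!/usr/bin/env sh
# Hypothetical sketch of misc/whisper-server/run-whisper-server.sh.
# Parameters come from environment variables, with fallback defaults.
WHISPER_MODEL="${WHISPER_MODEL:-models/ggml-base.bin}"
WHISPER_HOST="${WHISPER_HOST:-127.0.0.1}"
WHISPER_PORT="${WHISPER_PORT:-8080}"

CMD="whisper-server --model $WHISPER_MODEL --host $WHISPER_HOST --port $WHISPER_PORT"
echo "$CMD"
# exec $CMD  # launch the server once the binary is on PATH
```

The `${VAR:-default}` expansions let a deployment override any parameter by exporting the variable before invoking the script.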
Improvements to WhisperCppSpeechToText
Makes `model_path` optional, falling back to the `STACKCHAN_WHISPER_MODEL` environment variable, which improves configuration flexibility and error handling.

These changes collectively make the audio handling more robust and consistent, and enable easier deployment and configuration of speech recognition services.
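The `model_path` fallback described above could be implemented along these lines; the exact checks and error message are assumptions.

```python
import os

class WhisperCppSpeechToText:
    """Sketch of the optional model_path with env-var fallback (assumed logic)."""

    def __init__(self, model_path=None):
        # Fall back to STACKCHAN_WHISPER_MODEL when no explicit path is given.
        if model_path is None:
            model_path = os.environ.get("STACKCHAN_WHISPER_MODEL")
        if not model_path:
            raise ValueError(
                "No Whisper model specified: pass model_path or set "
                "the STACKCHAN_WHISPER_MODEL environment variable"
            )
        self.model_path = model_path
```

Raising early with an explicit message turns a silent misconfiguration into an actionable error at construction time.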