GitHub - daryljones/speech-to-text

speech-to-text — Flask UI for faster-whisper transcription

This project provides a small Flask web UI to upload audio files (mp3, wav, flac) and transcribe them using faster-whisper.

Key files

app.py — Flask application and transcription handler (uses faster-whisper)
templates/ — Jinja2 templates (index.html, result.html)
static/ — styles, images and client JS
pyproject.toml — project manifest (dependencies)

Quick start (development)

Sync dependencies and create the virtual environment with uv:

uv sync

Start the app (development):

uv run python -m app
# or use the venv python directly
./.venv/bin/python -m app

Open http://127.0.0.1:8000/ in your browser.

Running in production with Gunicorn

Ensure dependencies are installed (see above):

uv sync

Start Gunicorn using the included config:

gunicorn -c gunicorn_conf.py app:app

Gunicorn will bind to 0.0.0.0:8000 by default; adjust gunicorn_conf.py as needed.
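
For orientation, a Gunicorn config file is a plain Python module whose top-level variables are read as settings. The sketch below is illustrative only: the bind address matches the default described above, but the worker count and timeout are assumptions, and the gunicorn_conf.py shipped in the repository may differ.

```python
# gunicorn_conf.py (illustrative sketch, not the project's actual file)
import multiprocessing

# Address and port to listen on; matches the default described above.
bind = "0.0.0.0:8000"

# Each worker process would typically hold its own copy of the model,
# so the count is kept small. Capping at 2 is an assumption.
workers = min(2, multiprocessing.cpu_count())

# Seconds a worker may stay silent before Gunicorn restarts it.
# Transcription can be slow, hence a value above Gunicorn's 30 s default.
timeout = 300

# "-" sends the access log to stdout and the error log to stderr.
accesslog = "-"
errorlog = "-"
```

Raising timeout matters most for long recordings, since a synchronous worker that exceeds it is killed mid-request.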

What the UI does

Upload an audio file (mp3, wav, flac, au).
The UI shows upload progress and an indeterminate "Transcribing..." state while the server runs faster-whisper.
When transcription finishes, the transcript is displayed; you can copy it, download it as a .txt file, or go back and upload another file.
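
The server-side transcription step could look roughly like the sketch below. It uses the faster-whisper WhisperModel class; the helper names join_segments and transcribe_path, and the "base" model size, are assumptions made for illustration rather than details taken from app.py.

```python
from typing import Iterable


def join_segments(segments: Iterable) -> str:
    """Concatenate the text of each segment into one transcript string."""
    return " ".join(seg.text.strip() for seg in segments)


def transcribe_path(path: str, device: str = "cpu",
                    compute_type: str = "float32") -> str:
    """Transcribe the audio file at `path` and return plain text."""
    # Imported lazily so this module loads even where faster-whisper is absent.
    from faster_whisper import WhisperModel

    # "base" is an assumed model size; the app may use a different one.
    model = WhisperModel("base", device=device, compute_type=compute_type)
    # transcribe() returns a lazy generator of segments plus metadata;
    # the actual decoding happens while the generator is consumed.
    segments, _info = model.transcribe(path)
    return join_segments(segments)
```

Because the segments arrive lazily, the request stays busy until the last segment is decoded, which is why the UI can only show an indeterminate progress state.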

Where files are stored

Uploaded audio files are saved to a temporary file on disk (OS temp directory, e.g. /tmp) only for the duration of transcription and are deleted immediately after processing.
Transcribed text is not persisted to disk by default; it is rendered into the result page and kept in memory only for the request.
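
The save, transcribe, delete cycle described above can be sketched as follows. This is a minimal illustration, not the code in app.py: the function name transcribe_upload and the transcribe callback are hypothetical, and the callback stands in for whatever function wraps faster-whisper.

```python
import os
import tempfile
from typing import BinaryIO, Callable


def transcribe_upload(stream: BinaryIO, suffix: str,
                      transcribe: Callable[[str], str]) -> str:
    """Copy an uploaded stream to a temp file, transcribe it, then delete it."""
    # delete=False so the file can be closed and reopened by path on any OS.
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        with tmp:
            # Copy in chunks so large uploads are not held in memory at once.
            for chunk in iter(lambda: stream.read(64 * 1024), b""):
                tmp.write(chunk)
        # The transcript is returned to the caller and never written to disk.
        return transcribe(tmp.name)
    finally:
        # Runs on success and on error, so the audio never outlives the request.
        os.remove(tmp.name)
```

In a Flask handler this might be called as transcribe_upload(request.files["audio"].stream, ".wav", my_transcribe), where the form field name audio and the function my_transcribe are placeholders.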

Environment variables

FLASK_SECRET — Flask secret key used for session/flashing (default: change-me)
WHISPER_DEVICE — optional override for model device (cpu, cuda, mps)
WHISPER_COMPUTE_TYPE — optional override for compute type (float32, float16)

These env vars can be set before running the app, for example:

export WHISPER_DEVICE=cuda
export WHISPER_COMPUTE_TYPE=float16
export FLASK_SECRET='a-secret'
uv run python -m app
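
On the Python side, the three variables could be read along these lines. The Settings class and load_settings function are hypothetical names used for illustration; only the variable names and the change-me default come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    secret_key: str
    device: Optional[str]        # None means "let the app decide"
    compute_type: Optional[str]  # None means "let the app decide"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the three documented variables, treating empty strings as unset."""
    return Settings(
        secret_key=env.get("FLASK_SECRET", "change-me"),
        device=env.get("WHISPER_DEVICE") or None,
        compute_type=env.get("WHISPER_COMPUTE_TYPE") or None,
    )
```

Passing the environment as a parameter keeps the function easy to test with a plain dict, for example load_settings({"WHISPER_DEVICE": "cuda"}).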

Notes on performance & model files

The first run downloads the model weights to your machine (the Hugging Face cache). faster-whisper runs on CTranslate2 and other compiled backends; their cached files live in the usual Hugging Face/CTranslate2 cache locations on your system.
On CPU-only machines the app prefers float32 to avoid inefficient float16 conversions; on GPU-enabled machines the code attempts to select float16 for better throughput.
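
The selection rule above amounts to a small decision function. The sketch below is an assumption about how such a rule could be written, with a hypothetical name pick_compute_type; the actual logic in app.py may probe the hardware differently.

```python
from typing import Optional


def pick_compute_type(device: str, override: Optional[str] = None) -> str:
    """Return the compute type to request for a given device."""
    if override:
        # An explicit WHISPER_COMPUTE_TYPE always wins.
        return override
    # float32 on CPU avoids inefficient float16 conversions;
    # any other device is treated as a GPU and gets float16.
    return "float32" if device == "cpu" else "float16"
```

For example, pick_compute_type("cuda") selects float16, while pick_compute_type("cuda", "float32") honors the override.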

Security & privacy

Uploaded audio and generated transcripts are not retained by default. If you change the app to persist data, consider access controls and GDPR/privacy requirements.

Extending persistence (optional)

If you want to keep transcripts or uploads, two low-risk approaches are:
- Save files to an uploads/ folder and transcripts to transcripts/ as .txt files (simple, file-based). Add rotation/cleanup.
- Save transcripts to a small SQLite database with metadata (filename, timestamp, duration, language).
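
The SQLite option could be sketched as below, using only the standard library. The table layout, column names, and the helpers open_db and save_transcript are assumptions for illustration; they simply mirror the metadata fields listed above.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS transcripts (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    filename   TEXT NOT NULL,
    created_at TEXT NOT NULL,   -- ISO 8601 timestamp in UTC
    duration_s REAL,            -- audio length in seconds, if known
    language   TEXT,            -- detected language code, if known
    text       TEXT NOT NULL
)
"""


def open_db(path: str) -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_transcript(conn: sqlite3.Connection, filename: str, text: str,
                    duration_s: Optional[float] = None,
                    language: Optional[str] = None) -> int:
    """Insert one transcript row and return its id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO transcripts "
            "(filename, created_at, duration_s, language, text) "
            "VALUES (?, ?, ?, ?, ?)",
            (filename, datetime.now(timezone.utc).isoformat(),
             duration_s, language, text),
        )
    return cur.lastrowid
```

Parameterized queries keep user-supplied filenames from being interpreted as SQL, which matters once uploads come from untrusted clients.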

Troubleshooting

If you see a warning about float16 -> float32, either force WHISPER_COMPUTE_TYPE=float32 or run on a machine with appropriate GPU/backends.
If uv sync shows a TOML warning about [tool.uv.scripts], ignore it (a config quirk); the project's dependencies should still install.
If uploads through an nginx proxy return HTTP 413 (Request Entity Too Large), increase client_max_body_size in your nginx config. Example:

server {
	listen 80;
	server_name example.com;

	client_max_body_size 100M; # set to desired size

	location / {
		proxy_pass http://127.0.0.1:8000;
		proxy_set_header Host $host;
		proxy_set_header X-Real-IP $remote_addr;
		proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
		proxy_set_header X-Forwarded-Proto $scheme;
	}
}

Reload nginx after the change:

sudo nginx -t && sudo systemctl reload nginx

Feedback & next steps

Possible next features include persistent storage, SRT/VTT export formats, WebSocket/SSE partial updates, and admin pages that show model/device state. Feedback on which of these to prioritize is welcome.

Systemd unit (example)

Copy the example unit to /etc/systemd/system/ and an environment file to /etc/default/:

sudo cp deploy/speech-to-text.service /etc/systemd/system/speech-to-text.service
sudo cp deploy/speech-to-text.env.example /etc/default/speech-to-text
sudo systemctl daemon-reload
sudo systemctl enable --now speech-to-text.service

The unit uses the project's .venv gunicorn binary and the gunicorn_conf.py config. Adjust paths or the User/Group in the unit as needed.
