Disclaimer: This is an unofficial, community-maintained Docker wrapper around OpenAI Whisper. It is not affiliated with or endorsed by OpenAI. This project is maintained as a personal hobby — use it as-is, no support is guaranteed.
This Docker image provides a convenient environment for running OpenAI Whisper, a powerful automatic speech recognition (ASR) system. It is based on Ubuntu 24.04 with Python 3.12, with all necessary dependencies included.
- Docker
- NVIDIA Container Toolkit (for GPU mode only)
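A quick way to sanity-check these prerequisites from a shell (the `check_prereqs` helper is illustrative, not part of this project):

```bash
# Hypothetical helper: verify that docker is on PATH before trying the
# commands below. GPU mode additionally needs the NVIDIA Container
# Toolkit, which you can spot-check with: docker info | grep -i nvidia
check_prereqs() {
  if command -v docker >/dev/null 2>&1; then
    echo "docker ok"
  else
    echo "docker missing"
  fi
}

check_prereqs
```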
- Place your audio files in the `audio-files/` directory
- Run the transcription (see examples below)
- Find the output in the same `audio-files/` directory
**Docker Compose**

Build:

```bash
docker compose --profile gpu build
```

GPU mode:

```bash
docker compose run --rm whisper-gpu whisper audio-file.mp3 --device cuda --model turbo --language Italian --output_dir /app --output_format txt
```

CPU mode:

```bash
docker compose run --rm whisper-cpu whisper audio-file.mp3 --model turbo --language Italian --output_dir /app --output_format txt
```

**Plain Docker**

Build:

```bash
docker build -t openai-whisper .
```

GPU mode:

```bash
docker run --gpus all -it -v ${PWD}/models:/root/.cache/whisper -v ${PWD}/audio-files:/app openai-whisper whisper audio-file.mp3 --device cuda --model turbo --language Italian --output_dir /app --output_format txt
```

CPU mode:

```bash
docker run -it -v ${PWD}/models:/root/.cache/whisper -v ${PWD}/audio-files:/app openai-whisper whisper audio-file.mp3 --model turbo --language Italian --output_dir /app --output_format txt
```

| Local Path | Container Path | Description |
|---|---|---|
| `./models` | `/root/.cache/whisper` | Cached Whisper models (persisted between runs) |
| `./audio-files` | `/app` | Input audio files and output transcriptions |
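A sketch of the host-side layout these mounts expect, run from the directory you build in (the input file name is hypothetical):

```bash
# Create the two mounted directories; Whisper model downloads land in
# models/, inputs and outputs in audio-files/.
mkdir -p models audio-files

# Hypothetical input file:
touch audio-files/interview.mp3

# After a transcription run with --output_format txt, the result appears
# next to the input as audio-files/interview.txt.
```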
| Model | VRAM Required | Description |
|---|---|---|
| `large-v3` | 10-15 GB | Most accurate, recommended for powerful GPUs |
| `turbo` | ~8 GB | Memory-efficient, near-comparable accuracy |
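If you want to choose between the two models automatically, a small helper can key off the GPU's memory. The `pick_model` function and the 10 GiB cutoff are illustrative assumptions based on the table above:

```bash
# Hypothetical helper: map total VRAM (in MiB) to a model name from the
# table above. 10240 MiB is roughly 10 GB, the low end of large-v3's range.
pick_model() {
  if [ "$1" -ge 10240 ]; then
    echo "large-v3"
  else
    echo "turbo"
  fi
}

# On a machine with nvidia-smi available, feed it the real number:
# vram=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits)
# pick_model "$vram"
pick_model 16384
pick_model 8192
```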
Check GPU information:

```bash
docker run --gpus all -it openai-whisper nvidia-smi
```
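To transcribe many files at once, you can generate one `docker run` command per input. This dry-run helper (hypothetical, mirroring the plain-Docker CPU example above) prints the commands so you can review them before piping to `sh`:

```bash
# Print a docker run command for each .mp3 in the given directory.
# Flags and the openai-whisper image tag follow the CPU example above;
# the directory is mounted relative to the current working directory.
batch_cmds() {
  dir="$1"
  for f in "$dir"/*.mp3; do
    [ -e "$f" ] || continue
    printf 'docker run --rm -v "%s/models:/root/.cache/whisper" -v "%s/%s:/app" openai-whisper whisper "%s" --model turbo --output_dir /app --output_format txt\n' \
      "$PWD" "$PWD" "$dir" "$(basename "$f")"
  done
}

# Review first, then execute:
# batch_cmds audio-files          # inspect the generated commands
# batch_cmds audio-files | sh     # run them
```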