A voice dictation benchmarking tool that measures how long speech-to-text transcription takes. Built to test and compare the latency of voice dictation apps on macOS.
DictaBench automates the entire test cycle — triggers dictation via key press, optionally plays audio through a virtual microphone, and captures precise timing metrics on how quickly the transcript appears.
Voice dictation apps advertise "real-time" transcription, but how fast are they really? DictaBench gives you hard numbers:
- Time to first character — how long until the first letter appears after you stop speaking
- Time to last character — when the full transcription is complete
- Dictation duration — how long the app takes from first to last character
- End-to-end latency — total time from key press to final transcript
Run it once for a quick check, or use batch mode to run multiple tests and get average/min/max statistics.
- Single Run mode — trigger one dictation cycle and see timing metrics in real time
- Batch Run mode — run N tests in sequence with configurable intervals, get summary statistics (avg, min, max)
- Audio playback — play audio files (MP3, WAV, OGG, FLAC) simultaneously with key press
- Virtual microphone — route audio to dictation apps as mic input (macOS, no external drivers needed)
- GUI — Tkinter-based interface with live timing display
- CLI — simple command-line mode for basic key press automation
- You click the dictation text box (or batch trigger box)
- DictaBench presses and holds a configurable key (e.g. `fn` for macOS dictation)
- Optionally, it plays an audio file through a virtual microphone
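- When the key is released, it starts timing the first-character and last-character metrics (end-to-end latency is counted from the initial key press)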
- As text appears in the dictation box, it captures first-character and last-character timestamps
- Timing metrics are displayed in real time
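The first- and last-character capture in the steps above can be sketched as a polling loop. This is a minimal illustration, not DictaBench's actual implementation: `watch_for_text`, `get_text`, `settle_time`, and the other parameter names are hypothetical, and a Tkinter GUI might rely on widget events rather than polling. The clock and sleep functions are passed in so the logic can be exercised without a real text box.

```python
import time
from typing import Callable, Optional, Tuple


def watch_for_text(
    get_text: Callable[[], str],
    clock: Callable[[], float] = time.perf_counter,
    sleep: Callable[[float], None] = time.sleep,
    poll_interval: float = 0.01,
    settle_time: float = 1.0,
    timeout: float = 10.0,
) -> Tuple[Optional[float], Optional[float]]:
    """Poll get_text() and return (first_change, last_change).

    Both values are seconds since this function was called (hypothetically,
    right after the key is released). They are None if the text never changed
    before the timeout.
    """
    start = clock()
    previous = get_text()
    first: Optional[float] = None
    last: Optional[float] = None
    while True:
        elapsed = clock() - start
        current = get_text()
        if current != previous:
            if first is None:
                first = elapsed  # first character(s) appeared
            last = elapsed       # most recent change seen so far
            previous = current
        # Stop once the text has been stable for settle_time, or on timeout.
        if last is not None and elapsed - last >= settle_time:
            break
        if elapsed >= timeout:
            break
        sleep(poll_interval)
    return first, last
```

The resolution of both timestamps is bounded by `poll_interval`, so a shorter interval gives finer measurements at the cost of more polling.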
Key press start ──> Key release ──> First character ──> Last character
| | | |
|<── hold time ──>|<─ time to first ─>|<─ duration ─>|
|<──────────── end-to-end latency ────────────────────>|
Run multiple tests and get aggregate statistics:
============================================================
SUMMARY STATISTICS
============================================================
Time to first text:
Average: 0.342s
Min: 0.218s
Max: 0.501s
Time to last text:
Average: 1.847s
Min: 1.203s
Max: 2.441s
End-to-end time:
Average: 3.891s
Min: 3.244s
Max: 4.487s
Successful runs with dictation: 5/5
============================================================
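A summary like the one above could be computed along these lines. This is a sketch under stated assumptions: the `summarize` function, the `METRICS` tuple, and the per-run dictionary keys are hypothetical names, and runs where no text appeared are assumed to be recorded with `None` values and excluded from the statistics.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical metric keys, one float (seconds) per run, or None on failure.
METRICS = ("first", "last", "end_to_end")


def summarize(runs: List[Dict[str, Optional[float]]]) -> dict:
    """Aggregate avg/min/max per metric over the runs that produced text."""
    # A run counts as successful if a last-character time was recorded.
    ok = [r for r in runs if r.get("last") is not None]
    stats = {}
    for key in METRICS:
        values = [r[key] for r in ok]
        if values:
            stats[key] = {
                "avg": mean(values),
                "min": min(values),
                "max": max(values),
            }
    return {"successful": len(ok), "total": len(runs), "stats": stats}


if __name__ == "__main__":
    example = [
        {"first": 0.25, "last": 1.5, "end_to_end": 3.5},
        {"first": 0.75, "last": 2.5, "end_to_end": 4.5},
        {"first": None, "last": None, "end_to_end": None},  # no dictation
    ]
    summary = summarize(example)
    for key, s in summary["stats"].items():
        print(f"{key}: avg={s['avg']:.3f}s min={s['min']:.3f}s max={s['max']:.3f}s")
    print(f"Successful runs: {summary['successful']}/{summary['total']}")
```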
- Python 3.7+
- macOS (for virtual microphone and dictation features; basic key press works cross-platform)
git clone https://github.com/DevStrategist/DictaBench.git
cd DictaBench
pip install -r requirements.txt

DictaBench needs accessibility permissions to simulate key presses:
System Settings > Privacy & Security > Accessibility — add your Terminal app or Python IDE.
python run.py

- Configure the key to press (default: `fn` for macOS dictation)
- Set the hold duration
- Optionally select an audio file and enable virtual mic
- Single Run tab: Click the dictation text box to trigger one test
- Batch Run tab: Set run count and interval, click the trigger box to start
python key_presser.py

Basic key press automation without timing metrics. Follow the interactive prompts.
DictaBench can create a virtual microphone automatically on first launch — no BlackHole or external audio drivers needed. It uses macOS Audio MIDI Setup to create an aggregate device.
Three setup methods are available (automatic, guided, manual). See VIRTUAL_MIC_SOLUTIONS.md for details.
If you already have BlackHole installed (brew install blackhole-2ch), DictaBench will detect and use it automatically.
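One way such detection could work is to scan the list of audio devices for a matching name. The sketch below is an assumption about the general approach, not DictaBench's actual code: `find_input_device` is a hypothetical helper that takes a plain list of device dictionaries, so it runs without any audio hardware. With sounddevice installed, a list of this shape can be obtained from `sounddevice.query_devices()`, whose entries include `name` and `max_input_channels`.

```python
from typing import Iterable, Mapping, Optional


def find_input_device(
    devices: Iterable[Mapping[str, object]],
    name_fragment: str = "BlackHole",
) -> Optional[int]:
    """Return the index of the first input-capable device whose name
    contains name_fragment (case-insensitive), or None if there is none."""
    wanted = name_fragment.lower()
    for index, device in enumerate(devices):
        name = str(device.get("name", ""))
        channels = int(device.get("max_input_channels", 0) or 0)
        if wanted in name.lower() and channels > 0:
            return index
    return None


if __name__ == "__main__":
    # Hand-written sample data standing in for a real device list.
    sample = [
        {"name": "MacBook Pro Microphone", "max_input_channels": 1},
        {"name": "BlackHole 2ch", "max_input_channels": 2},
        {"name": "MacBook Pro Speakers", "max_input_channels": 0},
    ]
    print(find_input_device(sample))
```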
DictaBench/
├── run.py # GUI launcher
├── run.sh # Shell script launcher
├── key_presser.py # CLI implementation
├── key_presser_gui.py # GUI with timing metrics and batch mode
├── coreaudio_virtual_mic.py # macOS virtual microphone creation
├── requirements.txt # Python dependencies
├── VIRTUAL_MIC_SOLUTIONS.md # Virtual mic setup guide
├── LICENSE # MIT License
└── README.md
| Metric | Description |
|---|---|
| First Text Time | Time from key release to first character appearing in the dictation box |
| Last Text Time | Time from key release to last character appearing |
| Duration | Time from first character to last character (transcription spread) |
| End-to-End | Total time from key press start to last character (full latency) |
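The definitions in the table reduce to simple differences between four timestamps. The sketch below shows that arithmetic; the `RunTimestamps` class, its field names, and `compute_metrics` are hypothetical and are not taken from the DictaBench source.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RunTimestamps:
    """Four moments in one test run, in seconds on a common monotonic clock."""
    key_press: float
    key_release: float
    first_char: float
    last_char: float


def compute_metrics(ts: RunTimestamps) -> Dict[str, float]:
    """Derive the metrics from the table, plus the key hold time."""
    return {
        "hold_time": ts.key_release - ts.key_press,
        "first_text_time": ts.first_char - ts.key_release,
        "last_text_time": ts.last_char - ts.key_release,
        "duration": ts.last_char - ts.first_char,
        "end_to_end": ts.last_char - ts.key_press,
    }


if __name__ == "__main__":
    run = RunTimestamps(key_press=10.0, key_release=12.0,
                        first_char=12.25, last_char=13.5)
    for name, value in compute_metrics(run).items():
        print(f"{name}: {value:.3f}s")
```

Note that End-to-End equals the hold time plus Last Text Time, so a longer hold duration raises it even if the dictation app itself is no slower.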
- PyAutoGUI — keyboard automation
- Pygame — audio playback
- sounddevice — virtual microphone audio routing
- Tkinter — GUI
MIT — see LICENSE.