DictaBench

A voice dictation benchmarking tool that measures how long speech-to-text transcription takes. Built to test and compare the latency of voice dictation apps on macOS.

DictaBench automates the entire test cycle — triggers dictation via key press, optionally plays audio through a virtual microphone, and captures precise timing metrics on how quickly the transcript appears.

Why This Exists

Voice dictation apps advertise "real-time" transcription, but how fast are they really? DictaBench gives you hard numbers:

  • Time to first character: how long until the first letter appears after the dictation key is released
  • Time to last character: how long after key release the full transcription is complete
  • Dictation duration: how long the app takes from first to last character
  • End-to-end latency: total time from the initial key press to the final character of the transcript

Run it once for a quick check, or use batch mode to run multiple tests and get average/min/max statistics.

Features

  • Single Run mode — trigger one dictation cycle and see timing metrics in real time
  • Batch Run mode — run N tests in sequence with configurable intervals, get summary statistics (avg, min, max)
  • Audio playback — play audio files (MP3, WAV, OGG, FLAC) simultaneously with key press
  • Virtual microphone — route audio to dictation apps as mic input (macOS, no external drivers needed)
  • GUI — Tkinter-based interface with live timing display
  • CLI — simple command-line mode for basic key press automation

How It Works

  1. You click the dictation text box (or batch trigger box)
  2. DictaBench presses and holds a configurable key (e.g. fn for macOS dictation)
  3. Optionally, it plays an audio file through a virtual microphone
  4. When the key is released, it starts the latency timer (end-to-end latency is still counted from the initial key press)
  5. As text appears in the dictation box, it captures first-character and last-character timestamps
  6. Timing metrics are displayed in real time
Key press start ──> Key release ──> First character ──> Last character
       |                 |                   |                |
       |<── hold time ──>|<─ time to first ─>|<── duration ──>|
       |<──────────────── end-to-end latency ────────────────>|
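The capture step above (record when text first appears and when it last changes) can be sketched in a few lines. This is only an illustration, not DictaBench's actual code: `TextArrivalTracker`, its method names, and the injectable `clock` parameter are all hypothetical, and a real GUI would call `on_text_changed` from whatever change event its text widget provides.

```python
import time
from typing import Callable, Optional


class TextArrivalTracker:
    """Hypothetical helper: records first and last text-change times."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock  # injectable so the logic can be tested
        self.release_time: Optional[float] = None
        self.first_char_time: Optional[float] = None
        self.last_char_time: Optional[float] = None
        self._last_text = ""

    def mark_key_release(self) -> None:
        """Start timing; call this when the held key is released."""
        self.release_time = self._clock()
        self.first_char_time = None
        self.last_char_time = None
        self._last_text = ""

    def on_text_changed(self, text: str) -> None:
        """Call whenever the dictation box content may have changed."""
        # Ignore events before key release, empty text, and no-op updates.
        if self.release_time is None or not text or text == self._last_text:
            return
        now = self._clock()
        if self.first_char_time is None:
            self.first_char_time = now  # first visible character
        self.last_char_time = now  # most recent change so far
        self._last_text = text
```

After the last change, `first_char_time - release_time` gives the time to first character and `last_char_time - first_char_time` gives the dictation duration. Using a monotonic clock such as `time.perf_counter` avoids jumps from system clock adjustments.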

Batch Mode

Run multiple tests and get aggregate statistics:

============================================================
SUMMARY STATISTICS
============================================================
Time to first text:
  Average: 0.342s
  Min:     0.218s
  Max:     0.501s

Time to last text:
  Average: 1.847s
  Min:     1.203s
  Max:     2.441s

End-to-end time:
  Average: 3.891s
  Min:     3.244s
  Max:     4.487s

Successful runs with dictation: 5/5
============================================================
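A summary like the one above can be produced with the standard library alone. The sketch below assumes each run is stored as a dict with the hypothetical keys `first`, `last`, and `end_to_end` (seconds, or `None` when no text appeared); the real tool may store results differently.

```python
from statistics import mean
from typing import Dict, List, Optional

Run = Dict[str, Optional[float]]

# Hypothetical result keys mapped to the labels used in the report.
LABELS = [
    ("first", "Time to first text"),
    ("last", "Time to last text"),
    ("end_to_end", "End-to-end time"),
]


def summarize(values: List[Optional[float]]) -> Optional[Dict[str, float]]:
    """Return avg/min/max over the non-missing values, or None if empty."""
    done = [v for v in values if v is not None]
    if not done:
        return None
    return {"avg": mean(done), "min": min(done), "max": max(done)}


def format_summary(runs: List[Run]) -> str:
    """Render a plain-text report in the style of the batch output."""
    rule = "=" * 60
    lines = [rule, "SUMMARY STATISTICS", rule]
    for key, label in LABELS:
        stats = summarize([run.get(key) for run in runs])
        if stats is None:
            continue  # no run produced this metric
        lines += [
            f"{label}:",
            f"  Average: {stats['avg']:.3f}s",
            f"  Min:     {stats['min']:.3f}s",
            f"  Max:     {stats['max']:.3f}s",
            "",
        ]
    # A run counts as successful if any text arrived at all.
    ok = sum(1 for run in runs if run.get("first") is not None)
    lines += [f"Successful runs with dictation: {ok}/{len(runs)}", rule]
    return "\n".join(lines)
```

Runs where no text arrived are excluded from the averages but still counted in the denominator of the success line, so a flaky app shows up as, for example, 4/5 rather than silently skewing the statistics.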

Installation

Requirements

  • Python 3.7+
  • macOS (for virtual microphone and dictation features; basic key press works cross-platform)

Setup

git clone https://github.com/DevStrategist/DictaBench.git
cd DictaBench
pip install -r requirements.txt

macOS Permissions

DictaBench needs accessibility permissions to simulate key presses:

System Settings > Privacy & Security > Accessibility — add your Terminal app or Python IDE.

Usage

GUI (Recommended)

python run.py
  1. Configure the key to press (default: fn for macOS dictation)
  2. Set the hold duration
  3. Optionally select an audio file and enable virtual mic
  4. Single Run tab: Click the dictation text box to trigger one test
  5. Batch Run tab: Set run count and interval, click the trigger box to start
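The batch settings in step 5 (run count and interval) amount to a simple loop. The following is a minimal sketch under assumptions: `run_batch` and `run_once` are hypothetical names, and `run_once` stands in for whatever performs one full dictation cycle and returns its result.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_batch(
    run_once: Callable[[], T],
    count: int,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Run `run_once` `count` times, pausing `interval_s` between runs."""
    if count < 1:
        raise ValueError("count must be at least 1")
    results: List[T] = []
    for i in range(count):
        results.append(run_once())
        if i < count - 1:  # no pause needed after the final run
            sleep(interval_s)
    return results
```

The interval gives the dictation app time to settle between runs; passing `sleep` as a parameter keeps the loop easy to test without real delays.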

CLI

python key_presser.py

Basic key press automation without timing metrics. Follow the interactive prompts.

Virtual Microphone (macOS)

DictaBench can create a virtual microphone automatically on first launch — no BlackHole or external audio drivers needed. It uses macOS Audio MIDI Setup to create an aggregate device.

Three setup methods are available (automatic, guided, manual). See VIRTUAL_MIC_SOLUTIONS.md for details.

If you already have BlackHole installed (brew install blackhole-2ch), DictaBench will detect and use it automatically.

Project Structure

DictaBench/
├── run.py                    # GUI launcher
├── run.sh                    # Shell script launcher
├── key_presser.py            # CLI implementation
├── key_presser_gui.py        # GUI with timing metrics and batch mode
├── coreaudio_virtual_mic.py  # macOS virtual microphone creation
├── requirements.txt          # Python dependencies
├── VIRTUAL_MIC_SOLUTIONS.md  # Virtual mic setup guide
├── LICENSE                   # MIT License
└── README.md

Timing Metrics Explained

| Metric | Description |
| --- | --- |
| First Text Time | Time from key release to first character appearing in the dictation box |
| Last Text Time | Time from key release to last character appearing |
| Duration | Time from first character to last character (transcription spread) |
| End-to-End | Total time from key press start to last character (full latency) |
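The four metrics described above all derive from four timestamps. As a hedged illustration (the `RunTimestamps` class and its field names are assumptions, not taken from the DictaBench source), the arithmetic looks like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunTimestamps:
    """Hypothetical container for one run; all values in seconds."""

    key_press: float    # key press start
    key_release: float  # key released, latency timer starts
    first_char: float   # first character appears
    last_char: float    # last character appears

    @property
    def hold_time(self) -> float:
        return self.key_release - self.key_press

    @property
    def first_text_time(self) -> float:
        return self.first_char - self.key_release

    @property
    def last_text_time(self) -> float:
        return self.last_char - self.key_release

    @property
    def duration(self) -> float:
        return self.last_char - self.first_char

    @property
    def end_to_end(self) -> float:
        return self.last_char - self.key_press
```

Note that end-to-end time equals hold time plus Last Text Time, so a longer hold duration raises end-to-end figures without saying anything about the dictation app itself. Compare First Text Time or Last Text Time when the hold duration differs between runs.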

License

MIT — see LICENSE.
