OpenSTT

Local-first speech-to-text hub with multiple local engines.

A native macOS app that unifies Whisper and MLX models behind a single OpenAI-compatible API endpoint — plus system-wide dictation with a global hotkey.

Requirements: macOS on Apple Silicon (M1 / M2 / M3 / M4). Intel Macs are not supported.

Features

- Multiple engines: local Whisper (whisper.cpp with Metal), local MLX models, plus cloud options (ElevenLabs, Soniox)
- OpenAI-compatible API: POST /v1/audio/transcriptions on localhost, a drop-in replacement for the OpenAI transcription endpoint
- System-wide dictation: hold a global shortcut to record, release to transcribe, and the text is pasted automatically into the active app
  - Real-time tray timer showing the listening and processing phases
  - Support for real-time streaming models (ElevenLabs Scribe V2 Realtime, Soniox STT-RT v4)
- Model management: download, switch, and delete models from the GUI, with concurrent downloads
- Playground: built-in record-and-transcribe for quick testing
- Auto-updater: checks for updates on startup and installs with one click (uses Sparkle to preserve macOS permissions)
- Onboarding flow: first-launch setup wizard with a permissions check and automatic download of a standalone Python
- Real-time status: overview page with MLX runtime status, plus sidebar indicators
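
As a rough illustration of the OpenAI-compatible endpoint, the sketch below builds a multipart POST to /v1/audio/transcriptions using only the Python standard library. The port in `BASE_URL`, the helper names, and the model id passed by the caller are assumptions for illustration and are not taken from the app. The `file` and `model` form fields and the `text` response key follow the OpenAI transcription API, which this endpoint is described as compatible with.

```python
import json
import os
import urllib.request
import uuid

# Assumption: this port is hypothetical; use the address the app actually reports.
BASE_URL = "http://127.0.0.1:8000"


def build_transcription_request(audio: bytes, filename: str, model: str,
                                base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a multipart/form-data POST for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = model_part + file_header + audio + closing
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path: str, model: str) -> str:
    """Send an audio file and return the transcript (assumes a JSON body with "text")."""
    with open(path, "rb") as f:
        req = build_transcription_request(f.read(), os.path.basename(path), model)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]
```

Calling `transcribe("sample.wav", "<model id>")` requires the app to be running, with a model id that it actually exposes.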

Tech Stack

- Frontend: React + TypeScript + Vite
- Backend: Rust + Tauri v2
- STT: whisper.cpp (via whisper-rs with Metal), MLX Audio sidecar
- Updater: Sparkle (tauri-plugin-sparkle-updater) for atomic DMG-based updates that preserve the code signature
- Platform: macOS on Apple Silicon only

Development

```bash
# Install dependencies
npm install

# Run in development mode
npm run tauri dev

# Build for production
npm run tauri build
```

Supported Models

Local Models

- Whisper MLX: 12 models (tiny to large-v3) running locally with Metal acceleration
- Whisper.cpp: local Whisper implementation via whisper-rs

Cloud Models

- ElevenLabs: Scribe V2, Scribe V2 Flash, Scribe V2 Realtime
- Soniox: STT-RT v4 realtime dictation

Known Issues

ElevenLabs Scribe V2 Realtime — text injection unreliable (critical)

The elevenlabs:scribe_v2_realtime model streams partial_transcript and committed_transcript over WebSocket. The client must inject these into the focused app in real time.

Problem: macOS CGEvent keyboard events (backspace + set_string) are unreliable at high frequency. The OS and/or the active IME silently drop events, so characters are lost or garbled. The ElevenLabs API also frequently revises partial transcripts rather than only appending to them, which requires deleting text that has already been typed.

Approaches tried (all insufficient):

| Approach | Result |
| --- | --- |
| Full-draft rewrite (backspace N + type N) | Events dropped at high frequency → garbled text |
| Incremental diff (common-prefix, only touch suffix) | Helps for monotonic growth, but API revisions still cause large deletions |
| Clipboard paste (Cmd+V) for insertion | Insertion is reliable, but backspace deletion still drops events |
| Commit-only mode (skip partials, paste on commit) | Text is correct, but real-time output is lost; there is also a timing issue where the typing task exits before the final flush arrives |
Current state: Commit-only mode is implemented as a stopgap. Partials are skipped; only committed transcripts are pasted via clipboard. The real-time typing experience is lost.
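
A minimal sketch of commit-only handling follows, including one possible way to avoid the exit-before-flush race. It is written under assumptions, not taken from the codebase: the message shape (`type` and `text` keys), the `receive` and `typing_worker` names, and the `paste` callback are all hypothetical. The idea is that the typing task stops only when it sees a sentinel that the receiver enqueues after the last message, instead of stopping when recording ends.

```python
import queue
import threading
from typing import Callable, Iterable

_DONE = object()  # sentinel: the stream has ended and nothing more will arrive


def receive(messages: Iterable[dict], out: queue.Queue) -> None:
    """Forward committed transcripts; partials are skipped on purpose."""
    for msg in messages:
        # Hypothetical message shape: {"type": ..., "text": ...}
        if msg.get("type") == "committed_transcript" and msg.get("text"):
            out.put(msg["text"])
    out.put(_DONE)  # enqueue the sentinel only after the final message


def typing_worker(inbox: queue.Queue, paste: Callable[[str], None]) -> None:
    """Paste each committed transcript; exit only when the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is _DONE:
            break
        paste(item)
```

In use, `typing_worker` would run on its own thread with `paste` performing the clipboard paste, while `receive` consumes the WebSocket messages. Because the sentinel is queued behind the final committed transcript, the worker cannot exit before that transcript is pasted.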

Needs investigation: How other macOS dictation tools (Superwhisper, Wispr Flow, macOS built-in dictation, Talon Voice) achieve reliable text replacement. They likely use Input Method Kit (IMK) or the Accessibility API (AXUIElement) rather than raw CGEvent key injection.

Roadmap

- Auto-updater with Sparkle to preserve macOS permissions after update
- Real-time streaming dictation support (ElevenLabs Scribe V2 Realtime, Soniox STT-RT v4)
- First-launch onboarding with permissions check
- Standalone Python auto-download for MLX runtime
- Migrate macOS MLX inference from Python sidecar to mlx-audio-swift for native performance and zero Python dependency
- Fix real-time text injection for ElevenLabs Scribe V2 Realtime (see Known Issues)

License

MIT

