Blitztext Linux

Your local AI voice assistant for KDE Plasma & Wayland

License: MIT

Record speech via hotkey, transcribe locally or online, optionally rewrite it with an LLM, and paste it directly into the active application.

Important

Standalone Linux port: This repository contains exclusively the Linux port of Blitztext – a standalone Python 3/PyQt6 implementation optimized for Kubuntu/Ubuntu running KDE Plasma with Wayland. For the original macOS version, please visit the official main repository.


Features

  • Multilingual interface (EN/DE): Switch the app interface between German and English under Settings → General → "Interface language" (takes effect after restarting the app).
  • Compose window: Type or paste any text, select a workflow and writing style, and let the AI rewrite it — no microphone needed. Includes tone selector, custom preset, variant history, and signature support.
  • OpenRouter & custom LLM endpoints: Use OpenRouter or any OpenAI-compatible API as an alternative to OpenAI for all AI workflows.
  • Audio export: Save read-aloud output as an audio file directly from the Read Aloud window.
  • Custom names / terms: Extend the AI's vocabulary with your own terms, names, or technical words for perfect transcriptions.
  • Global hotkeys: Record from anywhere in the system at any time.
  • Auto-paste: Detects speech and pastes it right where your cursor is.
  • LLM-powered workflows: Let the AI rephrase your sentences professionally, filter them emotionally, or enrich them with fitting emojis.
  • Local processing: Optionally 100% offline for full privacy.

Installation

Quick install (recommended)

The easiest way to set up Blitztext on your system:

git clone https://github.com/TimInTech/blitztext-linux.git
cd blitztext-linux
bash scripts/install.sh

What does the script do? It is idempotent (safe to run repeatedly) and handles everything fully automatically:

  1. Checks your system (Ubuntu/Debian) & Python version.
  2. Installs missing system packages (incl. pipx).
  3. Sets up a .venv environment and installs openai-whisper/faster-whisper.
  4. Prepares ydotool.service and the systemd user service.

After installation

  1. Restart required (or log out/in) so the input group becomes active. Then verify:
    bash scripts/verify.sh
  2. Test manually:
    ./run.sh
    (Does the tray icon appear and do the hotkeys respond? Then everything went smoothly!)
  3. Enable autostart:
    systemctl --user enable --now blitztext-linux

Disable autostart again

systemctl --user stop blitztext-linux
systemctl --user disable blitztext-linux

Manual installation (diagnostics / experts)

If you want to install or debug step by step instead of using scripts/install.sh:

1. System packages (apt)

sudo apt install pulseaudio-utils wl-clipboard xclip ydotool ffmpeg python3-venv python3-evdev build-essential python3-dev socat pipx

| Package | Purpose |
|---|---|
| pulseaudio-utils | parec for audio recording via PulseAudio/PipeWire |
| wl-clipboard / xclip | Clipboard under Wayland (wl-copy) or X11 fallback |
| ydotool (≥ 1.0) | Simulates Ctrl+V for automatic pasting (auto-paste). From version 1.0 onward, raw keycodes are used. Ubuntu 25.10/26.04 ship ydotool ≥ 1.0 (1.0.4) directly via apt. Ubuntu 24.04 and 22.04 only ship 0.1.x via apt (e.g. 0.1.8), which does not support keycodes and therefore cannot auto-paste; build ydotool ≥ 1.0 from source there (see below). Auto-paste verified on 24.04, 25.10, and 26.04. |
| ffmpeg | Audio conversions |
| python3-evdev | Input device access for the system-wide hotkey daemon |
| socat | Optional socket communication |
| pipx | Isolated installation of Whisper engines |
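
The Wayland/X11 clipboard fallback listed above can be pictured as a small selection function. This is a minimal sketch, not the project's actual code: `pick_clipboard_command` is a hypothetical helper, and it assumes that a set `WAYLAND_DISPLAY` variable indicates a Wayland session.

```python
import os
import shutil


def pick_clipboard_command(env=None, which=shutil.which):
    """Return the argv for a clipboard-copy tool, or None if none is installed.

    Hypothetical helper: prefers wl-copy on Wayland, falls back to xclip.
    """
    env = os.environ if env is None else env
    # Assumption: WAYLAND_DISPLAY is only set inside a Wayland session.
    on_wayland = bool(env.get("WAYLAND_DISPLAY"))
    if on_wayland and which("wl-copy"):
        return ["wl-copy"]
    if which("xclip"):
        # "-selection clipboard" targets the regular Ctrl+V clipboard.
        return ["xclip", "-selection", "clipboard"]
    return None
```

The returned list could be passed to `subprocess.run(cmd, input=text.encode())`, since both tools read the text to copy from standard input.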

2. Grant evdev permissions

sudo usermod -aG input $USER

3. Virtual environment & Python packages

python3 -m venv .venv
source .venv/bin/activate
pip install PyQt6 evdev openai pytest openai-whisper faster-whisper

4. Whisper engine as an alternative via pipx

If you want to install openai-whisper decoupled from the venv (pinning it to Python 3.11 avoids version conflicts on newer Ubuntu setups):

pipx install --python "$(command -v python3.11)" openai-whisper
pipx inject openai-whisper faster-whisper   # optional, for accelerated execution

5. Check ydotool

systemctl --user start ydotool.service

If apt only provides ydotool 0.1.x (Ubuntu 24.04/22.04), build ydotool ≥ 1.0 from source:

sudo apt install cmake build-essential scdoc git
git clone --depth 1 --branch v1.0.4 https://github.com/ReimuNotMoe/ydotool.git
cd ydotool && cmake -B build -DCMAKE_BUILD_TYPE=Release && make -C build && sudo make -C build install
systemctl --user enable --now ydotool.service   # uses /usr/local/bin/ydotoold

6. Start the application

./run.sh

The 5 workflows and hotkeys

Blitztext registers global hotkeys via evdev. With these combinations you have full control:

| Workflow | Hotkey | LLM? | Description |
|---|---|---|---|
| Blitztext | Meta + H | No | Default: records, transcribes, and pastes the text. |
| Blitztext Local | Meta + Shift + H | No | Forces a pure offline transcription. |
| Blitztext+ | Meta + Shift + T | Yes | Rephrases your recording professionally via LLM. |
| Blitztext $%&! | Meta + Shift + D | Yes | Emotional release: turns frustration into a matter-of-fact message. |
| Blitztext :) | Meta + Shift + E | Yes | Enriches your message with fitting emojis. |
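
One way to picture the hotkey dispatch is a lookup from the exact set of held keys to a workflow. This is an illustrative sketch only: the key names (`"META"`, `"SHIFT"`, ...) and `match_workflow` are hypothetical and do not reflect how the evdev daemon actually names or matches keys.

```python
# Key combinations from the list above, expressed as frozensets so that
# the order in which keys are pressed does not matter.
HOTKEYS = {
    frozenset({"META", "H"}): "Blitztext",
    frozenset({"META", "SHIFT", "H"}): "Blitztext Local",
    frozenset({"META", "SHIFT", "T"}): "Blitztext+",
    frozenset({"META", "SHIFT", "D"}): "Blitztext $%&!",
    frozenset({"META", "SHIFT", "E"}): "Blitztext :)",
}


def match_workflow(pressed_keys):
    """Return the workflow for the exact set of held keys, or None."""
    return HOTKEYS.get(frozenset(pressed_keys))
```

Matching on the exact set keeps Meta + H and Meta + Shift + H apart, which a simple "is H pressed" check would not.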

Note

LLM workflows (Blitztext+, Blitztext $%&!, Blitztext :)) require a valid API key. The easiest way is to place it in ~/.config/blitztext-linux/secrets.env using the format NAME=VALUE (e.g. OPENAI_API_KEY set to your key). ./run.sh and the systemd service load this file automatically. Without a key, these functions are disabled in the menu and via hotkeys, or result in an error message.
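
The NAME=VALUE format of secrets.env is simple enough to read in a few lines. The following is a minimal sketch under assumptions: `parse_env_text` is a hypothetical helper, and the handling of `#` comments and surrounding quotes is assumed rather than taken from the project.

```python
def parse_env_text(text):
    """Parse NAME=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        name, _, value = line.partition("=")
        # Assumption: optional surrounding quotes are not part of the value.
        values[name.strip()] = value.strip().strip("\"'")
    return values
```

For example, `parse_env_text("OPENAI_API_KEY=abc123")` yields a dictionary with a single `OPENAI_API_KEY` entry, which could then be merged into the process environment.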


AI workflows

The AI workflows help with phrasing, tone, and emojis. You'll find the relevant settings under Settings → AI Workflows.

LLM providers

Blitztext supports three provider modes, selectable under Settings → AI Workflows → "LLM provider":

| Provider | When to use |
|---|---|
| OpenAI (default) | Standard OpenAI API with gpt-4o-mini or any other model. |
| OpenRouter | Access hundreds of models via a single API key (OPENROUTER_API_KEY). Base URL: https://openrouter.ai/api/v1. |
| Custom endpoint | Any OpenAI-compatible API: set "Base URL" and "LLM model" to match your provider. |

For OpenRouter, set base_url to https://openrouter.ai/api/v1 and choose your model (e.g. openai/gpt-4o). The API key environment variable name is configured under "API key environment".
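
How the three provider modes could translate into connection settings is sketched below. The config keys (`llm_provider`, `base_url`, `llm_model`, `openai_api_key_env`) come from the example configuration in this README; `resolve_llm_settings` itself, the automatic OpenRouter fallback, and the error for a custom provider without a base URL are assumptions for illustration.

```python
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def resolve_llm_settings(config, env):
    """Return (base_url, api_key, model) for the configured provider.

    base_url is None when the client's built-in default should be used.
    """
    provider = config.get("llm_provider", "openai")
    base_url = config.get("base_url") or None  # empty string means "default"
    if provider == "openrouter" and base_url is None:
        base_url = OPENROUTER_BASE_URL  # assumed convenience fallback
    if provider == "custom" and base_url is None:
        raise ValueError("custom provider requires a base_url")
    # The config stores only the variable NAME; the key itself stays in env.
    key_name = config.get("openai_api_key_env", "OPENAI_API_KEY")
    return base_url, env.get(key_name), config.get("llm_model", "gpt-4o-mini")
```

With the `openai` package installed, the result could be handed to `openai.OpenAI(api_key=..., base_url=...)`, which accepts any OpenAI-compatible endpoint.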

Writing-style presets

For the Blitztext+ workflow (text improver) there are ready-made writing-style presets that you select under Settings → AI Workflows → "Writing-style preset" or directly in the Compose window:

| Preset | Effect |
|---|---|
| Standard (improve text) | Previous behavior: cleanly formatted text, the selected tone applies. |
| Email – formal | Polite email using the formal form of address, with a clear structure. |
| Email – casual | Friendly email using the informal form of address. |
| Bullet points | Structures the content into concise bullet points. |
| Summary | Concise, factual summary of the key statements. |
| Personal (informal) | Clear text in a personal, informal tone. |
| Polite (formal) | Clear text in a polite, formal tone. |
| Short & precise | As concise as possible, without filler words and repetitions. |
| Custom preset… | A free-form system prompt you define yourself under Settings → General → "Custom preset (Compose)". |

With Standard, the configured tone (casual / neutral / professional) is additionally applied. Every other preset brings its own writing style and overrides the tone setting. Custom names/terms are preserved in all presets.
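
The rule that only the Standard preset honors the tone setting can be stated in a few lines. This is a sketch, not the project's implementation: `effective_tone` is hypothetical, `"standard"` is the preset id from the example configuration, and any other id (such as `"summary"`) is assumed for illustration.

```python
def effective_tone(writing_preset, configured_tone):
    """Return the tone to apply, or None if the preset brings its own style."""
    if writing_preset == "standard":
        return configured_tone
    # Every other preset overrides the tone setting.
    return None
```

A caller would add a tone instruction to the prompt only when the function returns a value.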


Compose window

The Compose window (✍ Compose… in the tray menu) lets you rewrite any text using the AI — without recording your voice. It is ideal for editing existing drafts, emails, or notes.


How to open: Click the tray icon → ✍ Compose…

What you can do in the Compose window:

| Element | Description |
|---|---|
| Draft (left pane) | Type or paste the text you want to rewrite. |
| Workflow | Choose between Blitztext+ (text improver), Blitztext $%&! (steam release), or Blitztext :) (emojis). |
| Writing-style preset | Select a preset or Custom preset… for a fully custom system prompt. |
| Tone | Choose casual, neutral, or professional. Active only when Standard preset + Blitztext+ is selected; grayed out for all other presets (a tooltip explains why). |
| Improve | Sends your draft to the AI and shows the result in the right pane. |
| Variant history | The last 10 generated results within the current session are kept as a scrollable list; click any entry to restore it. |
| Signature | Appends your saved signature (configured under Settings → General). Automatically replaces common AI-generated placeholders such as [Your Name], [Ihr Name], [Vorname Nachname], [Signature], and similar, so no stray placeholder is left behind. |
| Copy | Copies the result to the clipboard. |
| Insert & Close | Pastes the result directly into the active application and closes the window. |
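
The variant history described above (the last 10 results of a session) maps naturally onto a bounded queue. A minimal sketch, assuming a hypothetical `VariantHistory` class rather than the project's actual data structure:

```python
from collections import deque


class VariantHistory:
    """Keeps the most recent results of a session, newest last."""

    def __init__(self, limit=10):
        # deque drops the oldest entry automatically once the limit is hit.
        self._items = deque(maxlen=limit)

    def add(self, text):
        self._items.append(text)

    def restore(self, index):
        """Return the entry at the given position (0 = oldest kept)."""
        return self._items[index]

    def __len__(self):
        return len(self._items)
```

Because the bound lives in the container, no manual trimming is needed when the eleventh result arrives.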

Note

The signature and custom preset text are configured under Settings → General. Set "Signature for Compose window" and toggle "Automatically append after generation" if you want the signature added to every result.
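
The placeholder handling can be illustrated with a regular expression over the placeholders named above. This is a sketch under assumptions: `apply_signature` is hypothetical, the real placeholder list is likely longer, and replacing a placeholder instead of appending is an assumed behavior.

```python
import re

# Placeholders named in this README; the application's list may be longer.
_PLACEHOLDERS = re.compile(
    r"\[(?:Your Name|Ihr Name|Vorname Nachname|Signature)\]", re.IGNORECASE
)


def apply_signature(text, signature, auto_append=True):
    """Replace known placeholders with the signature, else append it."""
    if not signature:
        return text
    # A function replacement avoids backslash interpretation in the signature.
    replaced, count = _PLACEHOLDERS.subn(lambda _m: signature, text)
    if count == 0 and auto_append:
        return text.rstrip() + "\n\n" + signature
    return replaced
```

The `auto_append` flag mirrors the "Automatically append after generation" toggle: with it off, the signature only fills in placeholders the model produced.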


Tray icon and context menu

The microphone in the system tray is your indicator of the current state:



  • Green (IDLE): Ready, waiting for your action.
  • Red (RECORDING): Recording is actively running.
  • Orange (TRANSCRIBING): Magic in progress (transcription / AI rephrasing).
  • Gray (ERROR): Oops, something went wrong.

The tray context menu gives you quick access to all workflows, the compose window, writing-style presets, dictation mode, history, and settings.

Note

If no tray area is available in the desktop environment, the icon falls back to the system theme audio-input-microphone; the color coding may then not apply.


Main window

The main window is your graphical control center — useful when hotkeys are blocked or you prefer mouse control:


  • Workflow dropdown: Select from all 5 recording modes.
  • Writing-style preset: Visible when Blitztext+ is selected — pick your preset directly in the main window. Changes sync to the tray instantly.
  • Start/Stop button: Click to begin or end a recording.
  • Discard: Cancels the current recording without transcription.
  • Dictation / History: Quick access to dictation mode and the transcript history.
  • Read aloud / Settings: Open the read-aloud window or the settings dialog.

The window opens at startup and via the tray entry Show window or a click on the tray icon. Closing only hides the window — the app keeps running in the tray.


Dictation, history, and read-aloud

In addition to the workflows, the tool offers three convenience functions:


| Menu item | Description |
|---|---|
| Dictation mode | Toggle. When active, all transcripts are collected as dictation entries and each saved as a Markdown file. The history then shows a Merge button that combines all entries and copies them to the clipboard. |
| History… | Opens a window with the most recent transcripts. Per entry: copy to clipboard or delete. |
| Read aloud… | Reads any text aloud to you, locally via Piper TTS (default) or optionally via OpenAI Cloud TTS (including provider, voice, and model selection). Use the Export button to save the audio as a file. |

Note

Dictation notes are written exclusively into a folder inside the home directory (protection against path traversal), with permissions 0o600.
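
The two safeguards mentioned in this note, confinement to one folder and 0o600 permissions, can be sketched as follows. `write_note` is a hypothetical helper and not the project's code; it takes the target folder as a parameter instead of assuming a specific location in the home directory.

```python
import os
from pathlib import Path


def write_note(base_dir, filename, text):
    """Write text to base_dir/filename with mode 0o600, refusing escapes."""
    base = Path(base_dir).resolve()
    target = (base / filename).resolve()
    # resolve() collapses "..", so an escaping path is no longer under base.
    if not target.is_relative_to(base):  # requires Python 3.9+
        raise ValueError(f"refusing to write outside {base}")
    target.parent.mkdir(parents=True, exist_ok=True)
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    os.chmod(target, 0o600)  # also tightens a file that already existed
    return target
```

Creating the file through `os.open` with an explicit mode means it is never readable by others, not even briefly before a later `chmod`.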

Important

The read-aloud function requires Piper TTS and at least one installed voice:

.venv/bin/pip install piper-tts
# Place voices (.onnx + .onnx.json) into ~/.local/share/piper-voices/

If Piper or a voice is missing, the read-aloud window shows an installation hint; all other functions remain usable. Optional desktop notifications use notify-send (package libnotify-bin).

Note

OpenAI Cloud TTS is an optional alternative to Piper. Requirements: the openai package (.venv/bin/pip install openai) and a valid key in the environment variable OPENAI_API_KEY (see secrets.env below). When first switching to the "OpenAI Cloud" provider, the read-aloud window asks for confirmation once, because the entered text is sent to OpenAI's servers for synthesis. Piper remains the default and works entirely locally.


Configuration

Everything is stored locally and securely under ~/.config/blitztext-linux/config.json. The OpenAI key is not stored in this file but read from an environment variable. The configuration file can be opened directly from the settings: Settings → General → "Open configuration file".

The settings dialog has three tabs:

  • Speech Recognition: Whisper model, backend, language, hotkey mode, and recording key.
  • AI Workflows: LLM provider, API key, base URL, model, tone, and writing-style preset.
  • General: Auto-Paste, dictation folder, history size, interface language, and signature.

Important

The configuration file is automatically saved with restrictive file permissions (0o600 / chmod 600). The real OpenAI key instead lives in ~/.config/blitztext-linux/secrets.env or is provided as an environment variable.

Example configuration & field explanation

{
  "model": "base",
  "language": "de",
  "ui_language": "en",
  "backend": "openai-whisper",
  "hotkey_mode": "toggle",
  "openai_api_key_env": "OPENAI_API_KEY",
  "autopaste": true,
  "audio_device": "@DEFAULT_SOURCE@",
  "llm_provider": "openai",
  "base_url": "",
  "llm_model": "gpt-4o-mini",
  "compose_signature": "",
  "compose_signature_auto_append": false,
  "compose_custom_preset_text": "",
  "workflows": {
    "text_improver_tone": "neutral",
    "writing_preset": "standard",
    "emoji_density": "medium",
    "dampf_system_prompt": ""
  }
}
  • model: Whisper model size (tiny, base, small, medium, large, large-v2, large-v3, large-v3-turbo). Default: base.
  • language: Transcription language (de, en) or auto.
  • ui_language: Language of the app interface (de or en). Default: de. Changes take effect after a restart.
  • backend: openai-whisper or faster-whisper.
  • hotkey_mode:
    • toggle: press once to start, press again to stop.
    • hold: recording runs as long as the hotkey is held.
  • openai_api_key_env: Name of the environment variable for the API key. Default: OPENAI_API_KEY. For OpenRouter use OPENROUTER_API_KEY.
  • llm_provider: openai (default), openrouter, or custom.
  • base_url: Custom API base URL. Empty = OpenAI default. For OpenRouter: https://openrouter.ai/api/v1.
  • llm_model: Model name at the provider, e.g. gpt-4o-mini (OpenAI) or openai/gpt-4o (OpenRouter).
  • autopaste: Pastes via ydotool.
  • audio_device: Name of the audio source.
  • compose_signature: Signature text appended in the Compose window.
  • compose_signature_auto_append: Auto-append signature after every generation in Compose (true/false).
  • compose_custom_preset_text: Free-form system prompt for the "Custom preset…" option in the Compose window.
  • tts_provider: TTS provider for "Read aloud" — piper (local, default) or openai (cloud).
  • tts_openai_model / tts_openai_voice: Model and voice for OpenAI Cloud TTS (default: gpt-4o-mini-tts, nova).
  • tts_openai_consent: true once the one-time privacy confirmation for Cloud TTS has been granted.
  • workflows: Fine-tuning of tonality (text_improver_tone), writing-style preset (writing_preset), emojis (emoji_density), and the steam-release prompt (dampf_system_prompt).
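
The difference between the two `hotkey_mode` values in the list above comes down to how press and release events change the recording state. A minimal sketch with a hypothetical `RecordingSwitch` class, assuming the caller has already filtered out key-repeat events:

```python
class RecordingSwitch:
    """Tracks whether recording is active for 'toggle' or 'hold' mode."""

    def __init__(self, mode="toggle"):
        if mode not in ("toggle", "hold"):
            raise ValueError(f"unknown hotkey_mode: {mode!r}")
        self.mode = mode
        self.recording = False

    def on_press(self):
        if self.mode == "toggle":
            self.recording = not self.recording  # each press flips the state
        else:
            self.recording = True  # hold: record while the key is down
        return self.recording

    def on_release(self):
        if self.mode == "hold":
            self.recording = False  # releasing the key ends the recording
        return self.recording
```

In toggle mode the release event is ignored entirely, so a short tap starts a recording that runs until the next tap.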

Development and tests

We love stability! Run the tests locally:

pytest

With WHISPER_GUI_TESTS=1 QT_QPA_PLATFORM=offscreen pytest, the GUI tests (main window, compose window) run additionally.

Directory overview
.
├── app/
│   ├── __init__.py
│   ├── audio_recorder.py   # PulseAudio/PipeWire recording via parec
│   ├── blitztext_linux.py  # PyQt6 main application (system tray)
│   ├── compose_window.py   # Compose window for text-only AI rewriting
│   ├── config.py           # Configuration manager
│   ├── history_panel.py    # Transcript history panel
│   ├── hotkey_service.py   # evdev-based hotkey daemon
│   ├── i18n.py             # Interface translations (DE/EN)
│   ├── llm_service.py      # OpenAI / OpenRouter / custom endpoint interface
│   ├── main_window.py      # Main application window
│   ├── paste_service.py    # Wayland clipboard integration
│   ├── transcribe.py       # Whisper transcription
│   ├── tts_window.py       # Read Aloud window with audio export
│   ├── workflows.py        # Workflow definitions
│   └── writing_presets.py  # Writing-style preset definitions
├── tests/                  # Test suite
└── README.md               # This document (German version: README.de.md)

Important notes

  • Linux exclusive: For Linux systems only.
  • Wayland focus: Developed for Wayland (wl-clipboard, ydotool).
  • Privacy: Local workflows stay 100% on your machine. OpenAI or OpenRouter is only contacted when needed for LLM or Cloud TTS tasks.
  • Security (evdev & input group): The tool reads input globally via /dev/input/event*. Because this requires membership in the input group, every process running under your user account could read keyboard input as well (a trade-off under Wayland without XDG GlobalShortcuts). Only use Blitztext in environments you trust!
  • Developer note: This project was designed with the support of artificial intelligence (AI-assisted). Architecture, code, and tests were reviewed manually and verified locally for function/security.

Legal / Imprint & privacy (original project)

This project is a Linux port of the macOS application "Blitztext". For fairness and correct attribution, we refer to the legal information of the original project:

The original project is an experimental, non-commercial open-source project under the MIT license. The associated website (blitztext.de) is operated by Blackboat Internet GmbH.


Made with ❤️ (and a little AI help).
