Real-time live transcription for Nextcloud Talk video calls using Kyutai's streaming speech-to-text model deployed on Modal.com.
- Low latency: ~0.5 second first-token latency with streaming transcription
- GPU-accelerated: Runs on Modal.com's GPU infrastructure
- Automatic scaling: Scales from zero with Modal's serverless architecture
- Cost-effective: Pay only for what you use with per-second billing
- Multi-language: Supports English and French
This app must use the app ID live_transcription because Nextcloud Talk is hardcoded to look for an ExApp with exactly that ID when enabling the CC (closed captions) button. A unique, non-conflicting ID would be preferable, but Talk's LiveTranscriptionService specifically calls getExApp('live_transcription').
This means:
- This app cannot be installed alongside Nextcloud's official live_transcription app
- You must choose one or the other as your live transcription provider
- If you have the official app installed, unregister it first before installing this one
- Nextcloud 30+ with the following:
  - Talk app (spreed) 18+
  - High-Performance Backend (HPB) configured
  - AppAPI app installed
- Modal.com account with:
  - Kyutai STT service deployed (see kyutai_modal)
  - Proxy authentication token created
First, deploy the Kyutai STT service on Modal. See the kyutai_modal repository for instructions.

    git clone https://github.com/codemyriad/kyutai_modal.git
    cd kyutai_modal
    uvx modal deploy src/stt/modal_app.py

- Go to your Modal dashboard
- Note your workspace name from the URL (e.g., user-myworkspace)
- Go to Settings → Proxy Auth Tokens → Create Token
- Save the generated key and secret
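As a rough illustration of where the key and secret end up, the sketch below builds the two HTTP headers that Modal checks on proxy-auth-protected endpoints (`Modal-Key` and `Modal-Secret`). The `modal_endpoint_host` helper is hypothetical: the label after the workspace name depends on how kyutai_modal names its app and function, so treat the value used here as a placeholder and copy the real URL from your Modal dashboard.

```python
import os


def modal_auth_headers(env=None):
    """Build the headers Modal expects for a proxy-auth-protected endpoint."""
    env = os.environ if env is None else env
    return {
        "Modal-Key": env["MODAL_KEY"],
        "Modal-Secret": env["MODAL_SECRET"],
    }


def modal_endpoint_host(workspace, label="example-stt-endpoint"):
    """Hypothetical helper: Modal hostnames start with '<workspace>--'.

    The label is a placeholder, not the name used by kyutai_modal.
    """
    return f"{workspace}--{label}.modal.run"


if __name__ == "__main__":
    demo_env = {"MODAL_KEY": "key_demo", "MODAL_SECRET": "secret_demo"}
    print(modal_auth_headers(demo_env))
    print(modal_endpoint_host("user-myworkspace"))
```

Reading the values from the environment keeps the secret out of source control, which matches how the credentials are passed to the ExApp at registration time.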
Nextcloud AIO comes with a pre-configured Docker daemon called docker_aio. Register the ExApp from inside the Nextcloud container:

    docker exec --user www-data -it nextcloud-aio-nextcloud php occ app_api:app:register \
        live_transcription docker_aio \
        --info-xml https://raw.githubusercontent.com/codemyriad/live_transcription/main/appinfo/info.xml \
        --env "LT_HPB_URL=${NEXTCLOUD_URL}/standalone-signaling/spreed" \
        --env "LT_INTERNAL_SECRET=your-hpb-internal-secret" \
        --env "MODAL_WORKSPACE=your-modal-workspace" \
        --env "MODAL_KEY=your-modal-key" \
        --env "MODAL_SECRET=your-modal-secret" \
        --wait-finish

For other Nextcloud installations, first check whether you already have a Docker deploy daemon registered:

    occ app_api:daemon:list

If you see a daemon with type docker-install, note its name and use it in the registration command.
If no Docker daemon is configured, register one first:

    occ app_api:daemon:register docker_local "Docker Local" \
        docker-install http /var/run/docker.sock http://localhost

Then register the ExApp (replace docker_local with your daemon name if different):

    occ app_api:app:register live_transcription docker_local \
        --info-xml https://raw.githubusercontent.com/codemyriad/live_transcription/main/appinfo/info.xml \
        --env "LT_HPB_URL=${NEXTCLOUD_URL}/standalone-signaling/spreed" \
        --env "LT_INTERNAL_SECRET=your-hpb-internal-secret" \
        --env "MODAL_WORKSPACE=your-modal-workspace" \
        --env "MODAL_KEY=your-modal-key" \
        --env "MODAL_SECRET=your-modal-secret" \
        --wait-finish

Once this app is published to the Nextcloud App Store, you'll be able to install it via Settings → Apps → External Apps. However, you'll still need to configure the environment variables (Modal credentials, HPB settings) via the AppAPI settings page after installation.

Note: This app is not yet published to the Nextcloud App Store. For now, use the command-line installation above.
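The registration commands differ only in the daemon name and the `--env` values, so it can be convenient to generate them instead of editing a long shell line by hand. The following is a minimal sketch, not part of this app: `build_register_command` is a hypothetical helper, and it only uses the subcommand and flags that already appear in the commands above.

```python
import shlex

INFO_XML_URL = (
    "https://raw.githubusercontent.com/codemyriad/"
    "live_transcription/main/appinfo/info.xml"
)


def build_register_command(daemon, env, info_xml_url=INFO_XML_URL):
    """Assemble the occ registration command as an argument list."""
    args = [
        "occ", "app_api:app:register", "live_transcription", daemon,
        "--info-xml", info_xml_url,
    ]
    # One --env flag per variable, in the order given by the caller.
    for name, value in env.items():
        args += ["--env", f"{name}={value}"]
    args.append("--wait-finish")
    return args


if __name__ == "__main__":
    cmd = build_register_command(
        "docker_local",
        {
            "LT_HPB_URL": "wss://nextcloud.example.com/standalone-signaling/spreed",
            "LT_INTERNAL_SECRET": "your-hpb-internal-secret",
            "MODAL_WORKSPACE": "your-modal-workspace",
            "MODAL_KEY": "your-modal-key",
            "MODAL_SECRET": "your-modal-secret",
        },
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Building a list rather than a string avoids quoting mistakes when a secret contains spaces or shell metacharacters.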
The following variables are set when registering the app:

| Variable | Description | Example |
|---|---|---|
| LT_HPB_URL | WebSocket URL to HPB signaling server | wss://nextcloud.example.com/standalone-signaling/spreed |
| LT_INTERNAL_SECRET | HPB internal secret for authentication | your-24-char-secret |
| MODAL_WORKSPACE | Your Modal workspace name | user-myworkspace |
| MODAL_KEY | Modal proxy authentication key | key_... |
| MODAL_SECRET | Modal proxy authentication secret | secret_... |

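The five variables listed with examples have no documented default, so a startup check can treat them as required. The sketch below shows one way to report which ones are missing; `missing_required` is a hypothetical helper, and treating a blank value as missing is an assumption rather than documented behaviour of this app.

```python
import os

# Variables from the table above that have no documented default.
REQUIRED_VARS = (
    "LT_HPB_URL",
    "LT_INTERNAL_SECRET",
    "MODAL_WORKSPACE",
    "MODAL_KEY",
    "MODAL_SECRET",
)


def missing_required(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_required()
    if missing:
        print("Missing configuration: " + ", ".join(missing))
    else:
        print("All required variables are set")
```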

The following variables have defaults:

| Variable | Description | Default |
|---|---|---|
| APP_ID | Application identifier (must be live_transcription) | live_transcription |
| APP_VERSION | Application version | 1.0.0 |
| APP_PORT | Port to listen on | 23000 |
| SKIP_CERT_VERIFY | Skip SSL certificate verification | false |

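To make the defaults concrete, here is a small sketch of how such settings could be parsed into typed values. `OptionalSettings` and `load_optional` are hypothetical names, and the set of strings accepted as "true" is an assumption; the app itself may parse SKIP_CERT_VERIFY differently.

```python
from dataclasses import dataclass

# Assumption: which strings count as "true" is not documented in this README.
TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class OptionalSettings:
    app_id: str = "live_transcription"
    app_version: str = "1.0.0"
    app_port: int = 23000
    skip_cert_verify: bool = False


def load_optional(env):
    """Read the optional variables from a mapping, falling back to defaults."""
    return OptionalSettings(
        app_id=env.get("APP_ID", "live_transcription"),
        app_version=env.get("APP_VERSION", "1.0.0"),
        app_port=int(env.get("APP_PORT", "23000")),
        skip_cert_verify=env.get("SKIP_CERT_VERIFY", "false").strip().lower() in TRUTHY,
    )


if __name__ == "__main__":
    print(load_optional({"APP_PORT": "8080", "SKIP_CERT_VERIFY": "true"}))
```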
Once installed, the transcription feature will be available in Nextcloud Talk:
- Join a video call in Nextcloud Talk
- Click on the CC (closed captions) button in the call controls
- Select your preferred language
- Transcriptions will appear in real-time as participants speak
The components fit together as follows:

    ┌─────────────────────────────────────────────────────────────────┐
    │                        Nextcloud Talk UI                        │
    │                     (Enable transcription)                      │
    └────────────────────────┬────────────────────────────────────────┘
                             │ HTTP API
                             ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │                   Kyutai Transcription ExApp                    │
    │                                                                 │
    │   ┌─────────────────────────────────────────────────────────┐   │
    │   │                   FastAPI Application                   │   │
    │   │   /api/v1/call/transcribe   /api/v1/call/set-language   │   │
    │   └─────────────────────────────────────────────────────────┘   │
    │                        │                                        │
    │  ┌─────────────────────▼─────────────────────────────────────┐  │
    │  │                       SpreedClient                        │  │
    │  │  - Connects to HPB via WebSocket                          │  │
    │  │  - Receives audio via WebRTC                              │  │
    │  │  - Manages peer connections                               │  │
    │  └─────────────────────┬─────────────────────────────────────┘  │
    │                        │ Audio                                  │
    │  ┌─────────────────────▼─────────────────────────────────────┐  │
    │  │                     ModalTranscriber                      │  │
    │  │  - Resamples audio (48kHz → 24kHz)                        │  │
    │  │  - Encodes to Opus                                        │  │
    │  │  - Sends to Modal via WebSocket                           │  │
    │  └─────────────────────┬─────────────────────────────────────┘  │
    └────────────────────────┼────────────────────────────────────────┘
                             │ wss://
                             ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │                      Modal.com (GPU Cloud)                      │
    │                                                                 │
    │   ┌─────────────────────────────────────────────────────────┐   │
    │   │                   Kyutai STT Service                    │   │
    │   │  - Decodes Opus audio                                   │   │
    │   │  - Runs Moshi streaming inference                       │   │
    │   │  - Returns transcription tokens                         │   │
    │   └─────────────────────────────────────────────────────────┘   │
    └─────────────────────────────────────────────────────────────────┘

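The 48kHz → 24kHz step in the diagram is an exact 2:1 rate reduction. The sketch below illustrates the idea by averaging each pair of adjacent samples, which acts as a crude low-pass filter before dropping every second sample. It is only an illustration under that simplifying assumption: `downsample_by_two` is a hypothetical function, and the actual ModalTranscriber may well use a proper resampling library instead.

```python
def downsample_by_two(samples):
    """Halve the sample rate by averaging each adjacent pair of samples.

    A trailing unpaired sample is dropped.
    """
    usable = len(samples) - (len(samples) % 2)
    return [(samples[i] + samples[i + 1]) / 2 for i in range(0, usable, 2)]


if __name__ == "__main__":
    # A 20 ms mono frame at 48 kHz holds 960 samples; at 24 kHz it holds 480.
    frame_48k = [0.0] * 960
    frame_24k = downsample_by_two(frame_48k)
    print(len(frame_48k), "->", len(frame_24k))
```

Pair averaging attenuates high frequencies only mildly, so some aliasing remains; a production resampler would apply a sharper anti-aliasing filter.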

The app exposes the following endpoints:

| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /enabled | GET | Check if app is configured |
| /capabilities | GET | Get app capabilities |
| /api/v1/languages | GET | Get supported languages |
| /api/v1/call/transcribe | POST | Start/stop transcription |
| /api/v1/call/set-language | POST | Change transcription language |
| /api/v1/call/leave | POST | Leave a call |
| /api/v1/status | GET | Get service status |

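For a quick smoke test of the GET endpoints during local development, something like the following sketch can help. It uses only the Python standard library; `build_get` and `fetch_text` are hypothetical helpers, the base URL assumes the default port 23000, and the response format is not documented here, so the body is returned as plain text. Endpoints may also require AppAPI authentication headers when the app runs behind Nextcloud, which this sketch does not attempt to provide.

```python
import urllib.request


def build_get(base_url, path):
    """Create a GET request for one of the endpoints listed above."""
    return urllib.request.Request(base_url.rstrip("/") + path, method="GET")


def fetch_text(base_url, path, timeout=5.0):
    """Send the request and return (HTTP status, body as text)."""
    with urllib.request.urlopen(build_get(base_url, path), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")


if __name__ == "__main__":
    # Assumes the app is running locally on the default APP_PORT.
    status, body = fetch_text("http://localhost:23000", "/health")
    print(status, body)
```

The POST endpoints are left out on purpose because their request payloads are not described in this document.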
- Clone the repository:

      git clone https://github.com/codemyriad/live_transcription.git
      cd live_transcription

- Create a virtual environment:

      python -m venv venv
      source venv/bin/activate
      pip install -e ".[dev]"

- Set environment variables:

      export LT_HPB_URL=$NEXTCLOUD_URL/standalone-signaling/spreed
      export LT_INTERNAL_SECRET=your-secret
      export MODAL_WORKSPACE=your-workspace
      export MODAL_KEY=your-key
      export MODAL_SECRET=your-secret

- Run the application:

      cd ex_app/lib
      python -m uvicorn main:app --reload --port 23000

To run the tests, prefer uv, which isolates dependencies quickly:

    UV_CACHE_DIR=/tmp/uv-cache uv run pytest -v

If you already have a venv active:

    pytest tests/ -v

To build the Docker image:

    docker build -t live_transcription:dev .

If the app does not work as expected:

- Check that HPB is properly configured and accessible
- Verify Modal credentials are correct
- Check the container logs: docker logs nc_app_live_transcription
- Verify the HPB internal secret matches
- Ensure LT_HPB_URL is correct (should end with /spreed)
- Check that LT_INTERNAL_SECRET matches the HPB configuration
- If using self-signed certificates, set SKIP_CERT_VERIFY=true
- Ensure all three Modal environment variables are set: MODAL_WORKSPACE, MODAL_KEY, MODAL_SECRET
- Verify the Kyutai STT service is deployed on Modal
- Check Modal GPU selection (A10G or A100 recommended)
- Ensure good network connectivity to Modal
- Monitor Modal logs for any issues
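The LT_HPB_URL check from the list above is easy to automate. The sketch below flags two common mistakes: a URL with no scheme or host (which happens when NEXTCLOUD_URL was empty at registration time) and a path that does not end with /spreed. `hpb_url_problems` is a hypothetical helper and is not part of this app.

```python
from urllib.parse import urlparse


def hpb_url_problems(url):
    """Return a list of human-readable problems found in an LT_HPB_URL value."""
    problems = []
    parsed = urlparse(url)
    if not parsed.scheme or not parsed.netloc:
        problems.append("URL must include a scheme and host")
    if not parsed.path.endswith("/spreed"):
        problems.append("path should end with /spreed")
    return problems


if __name__ == "__main__":
    for candidate in (
        "wss://nextcloud.example.com/standalone-signaling/spreed",
        "/standalone-signaling/spreed",
        "wss://nextcloud.example.com/standalone-signaling",
    ):
        print(candidate, "->", hpb_url_problems(candidate) or "ok")
```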
AGPL-3.0-or-later