
fix(cli): improve TTS and model download reliability#1187

Open
miguel-heygen wants to merge 1 commit into main from worktree-fix+tts-reliability

Conversation

Collaborator

@miguel-heygen miguel-heygen commented Jun 4, 2026

Summary

  • Add retry with exponential backoff (3 retries, 1s/2s/4s) to the shared download utility, with socket timeouts (30s), response timeouts (60s), redirect loop protection, and specific HTTP 403/429 rate-limit error messages
  • Add file-level locking (.lock files) to TTS model and voice downloads to prevent concurrent processes from racing on the same temp file
  • Increase hasPythonPackage timeout from 10s to 30s to accommodate ONNX runtime cold-start
  • Set maxBuffer to 10 MB on the synthesis subprocess to prevent truncation from verbose ONNX warnings
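
The retry behavior in the first bullet can be sketched as follows. This is a minimal illustration, not the code in download.ts: the names `describeHttpFailure`, `backoffDelayMs`, and `withRetry` are hypothetical, and the injectable `sleep` parameter is an assumption added so the schedule can be checked without waiting.

```typescript
// Hypothetical helper names; the real download.ts may be structured differently.
type Sleep = (ms: number) => Promise<void>;

// Map an HTTP status to an error message. 403 and 429 get a rate-limit hint.
function describeHttpFailure(status: number): string {
  if (status === 403 || status === 429) {
    return `Download failed: HTTP ${status} (rate limited). Retry in a moment.`;
  }
  return `Download failed: HTTP ${status}`;
}

// Delay before retry number `retry` (0-based): 1000, 2000, 4000, ...
function backoffDelayMs(retry: number, baseMs = 1000): number {
  return baseMs * 2 ** retry;
}

// Run `attempt` once, then retry up to `maxRetries` times with exponential
// backoff. The last error is rethrown when every attempt has failed.
async function withRetry<T>(
  attempt: () => Promise<T>,
  maxRetries = 3,
  sleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i <= maxRetries; i++) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err;
      if (i === maxRetries) break; // no sleep after the final failure
      await sleep(backoffDelayMs(i));
    }
  }
  throw lastError;
}
```

With the defaults, a download that keeps failing is attempted four times in total (one initial attempt plus three retries), sleeping 1s, 2s, and 4s between attempts.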

Problem

The TTS command fails frequently at scale due to four compounding issues:

  1. GitHub rate limiting — the 311 MB model downloads from GitHub Releases have no retry logic. HTTP 403 responses are terminal.
  2. Download races — concurrent agent-spawned TTS calls write to the same .tmp file, causing corruption.
  3. False negative package checks — ONNX runtime cold import exceeds the 10s timeout on constrained machines, making hasPythonPackage("kokoro_onnx") report false even when it's installed.
  4. maxBuffer overflow — ONNX prints verbose warnings that exceed Node's 1 MB default, killing the subprocess.
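
Issues 3 and 4 both come down to subprocess options. The sketch below shows one way to pass a 30s timeout and a 10 MB `maxBuffer` to Node's `spawnSync`. The `probeCommand` helper, its return shape, and the `python3` default are assumptions for illustration, and this `hasPythonPackage` is a stand-in rather than the function in tts/synthesize.ts.

```typescript
import { spawnSync } from "node:child_process";

interface ProbeResult {
  ok: boolean;        // true only when the process exited with status 0
  errorCode?: string; // e.g. "ETIMEDOUT" when the timeout elapsed
}

// Run a command and report whether it exited cleanly within the limits.
// Defaults mirror the values in this PR: 30s timeout, 10 MB output buffer.
function probeCommand(
  command: string,
  args: string[],
  timeoutMs = 30_000,
  maxBuffer = 10 * 1024 * 1024,
): ProbeResult {
  const result = spawnSync(command, args, {
    timeout: timeoutMs,
    maxBuffer,
    encoding: "utf8",
  });
  const errorCode = (result.error as (Error & { code?: string }) | undefined)?.code;
  return { ok: result.status === 0, errorCode };
}

// Hypothetical stand-in for the package check. `name` must be a trusted
// module identifier because it is interpolated into Python source.
function hasPythonPackage(name: string, python = "python3"): boolean {
  return probeCommand(python, ["-c", `import ${name}`]).ok;
}
```

Returning the error code alongside the boolean lets a caller tell "import timed out" apart from "package missing", which is the distinction that produced the false negative described in issue 3.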

Reproduction

# 1. Rate-limit failure (download.ts has no retry):
# Clear the model cache to force a fresh download:
rm -rf ~/.cache/hyperframes/tts/models/

# Run tts — if GitHub rate-limits the 311 MB download, it fails immediately
# with no retry. In agent environments with 3+ concurrent tts calls,
# this is near-guaranteed at scale.
hyperframes tts "Hello world" -o /tmp/test.wav
# Error: Download failed: HTTP 403

# With fix: retries 3 times with 1s/2s/4s backoff, and prints:
# "Download failed: HTTP 403 (rate limited). GitHub throttles
#  unauthenticated release downloads. Retry in a moment."

# 2. Concurrent download race (no file locking):
# Run two tts commands simultaneously — both see model as missing,
# both start downloading to the same .tmp file:
rm -rf ~/.cache/hyperframes/tts/models/
hyperframes tts "Hello" -o /tmp/a.wav &
hyperframes tts "World" -o /tmp/b.wav &
wait
# One or both fail with a corrupted model file

# With fix: second process sees .lock file, waits for first to finish

# 3. ONNX cold-start false negative (10s timeout too short):
# On a cold machine or constrained sandbox:
pip install kokoro-onnx  # installed but slow to import
hyperframes tts "test" -o /tmp/test.wav
# Error: The kokoro-onnx package is not installed
# (false negative — import timed out at 10s)

# With fix: timeout increased to 30s

Scope

The download utility (packages/cli/src/utils/download.ts) is shared by TTS, Whisper, and background-removal commands — all three benefit from the retry and timeout improvements.

Test plan

  • Build passes (bun run build)
  • All pre-commit hooks pass (lint, format, typecheck, fallow)
  • Download retry logic handles HTTP 403 with clear rate-limit message
  • File locking prevents concurrent download races
  • hyperframes tts "Hello" -o test.wav works end-to-end
  • Verify TTS failure rate drops after deploy

Three changes to make TTS (and all model downloads) more reliable:

1. download.ts: add retry with exponential backoff (3 retries), socket
   and response timeouts, redirect loop protection, and specific HTTP
   403/429 (rate limit) error messages. This fixes the primary failure
   mode: GitHub Releases rate-limiting model downloads at scale.

2. tts/manager.ts: add file-level locking (.lock files) for model and
   voice downloads to prevent concurrent processes from racing on the
   same .tmp file. A lock older than 5 minutes is treated as stale and
   removed.

3. tts/synthesize.ts: increase hasPythonPackage timeout from 10s to
   30s (ONNX runtime cold import exceeds 10s on constrained machines),
   and set maxBuffer to 10 MB (ONNX prints verbose warnings that can
   exceed the 1 MB default, killing the subprocess).

The download retry also benefits the whisper (transcribe) and
background-removal commands, which share the same download utility.

PostHog data: TTS failure rate hit 61% on June 2 when usage spiked 3x.

const stat = statSync(lockPath);
if (Date.now() - stat.mtimeMs > LOCK_STALE_MS) {
  // The lock is older than the stale threshold: remove it and re-acquire.
  unlinkSync(lockPath);
  writeFileSync(lockPath, String(process.pid), { flag: "wx" });
}
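
The stale-lock fragment above can be expanded into a self-contained sketch of lock acquisition. This is an illustration under stated assumptions, not the code in tts/manager.ts: `tryAcquireLock`, `releaseLock`, the `now` parameter, and the demo path are hypothetical, and `LOCK_STALE_MS` is set to the 5 minutes mentioned in the commit message.

```typescript
import { mkdtempSync, statSync, unlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const LOCK_STALE_MS = 5 * 60 * 1000; // 5 minutes, per the commit message

// Try to take the lock without blocking. Returns true when this process
// now holds it. `now` is injectable so staleness can be exercised in tests.
function tryAcquireLock(lockPath: string, now = Date.now()): boolean {
  try {
    // "wx" creates the file exclusively and fails with EEXIST if it exists.
    writeFileSync(lockPath, String(process.pid), { flag: "wx" });
    return true;
  } catch (err) {
    if ((err as { code?: string }).code !== "EEXIST") throw err;
  }
  let mtimeMs: number;
  try {
    mtimeMs = statSync(lockPath).mtimeMs;
  } catch {
    return false; // holder released the lock in between; caller can retry
  }
  if (now - mtimeMs <= LOCK_STALE_MS) return false; // held and still fresh
  try {
    unlinkSync(lockPath);
    writeFileSync(lockPath, String(process.pid), { flag: "wx" });
    return true;
  } catch {
    return false; // another process took over the stale lock first
  }
}

// Release the lock, ignoring the case where it is already gone.
function releaseLock(lockPath: string): void {
  try {
    unlinkSync(lockPath);
  } catch {
    /* already removed */
  }
}

// Usage: the first caller wins, the second is refused until release.
const demoLock = join(mkdtempSync(join(tmpdir(), "tts-lock-")), "model.lock");
const firstAttempt = tryAcquireLock(demoLock);
const secondAttempt = tryAcquireLock(demoLock);
releaseLock(demoLock);
```

A caller that gets `false` would sleep and poll until the lock disappears or goes stale. The stale takeover is still not fully atomic: two processes that both observe a stale lock can interleave the unlink and the exclusive create, so this sketch narrows the race rather than eliminating it.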
