Add live transcription preview #571

Merged
basnijholt merged 11 commits into main from live-transcription-preview on Jun 5, 2026
@basnijholt
Owner

Summary

  • Add rolling live transcription previews for Wyoming ASR with JSONL and terminal output support.
  • Wire macOS menu bar app to display live preview text above the voice-level overlay while recording.
  • Add regression coverage for CLI preview output, app preview parsing, command wiring, and app self-test path.

Test Plan

  • uv run ruff check agent_cli/agents/transcribe.py agent_cli/services/asr.py agent_cli/opts.py tests/test_asr.py tests/agents/test_transcribe.py tests/agents/test_transcribe_recovery.py
  • uv run pytest tests/test_asr.py tests/agents/test_transcribe.py tests/agents/test_transcribe_recovery.py tests/test_docs_gen.py -q
  • swift build in macos/AgentCLI
  • swift run AgentCLI --agentcli-live-preview-self-test in macos/AgentCLI
  • Manual CLI smoke: uv run agent-cli transcribe --live-preview-stdout

@basnijholt force-pushed the live-transcription-preview branch from 50bd008 to 7f452c1 on June 4, 2026 21:58
@greptile-apps
greptile-apps Bot commented Jun 4, 2026

Greptile Summary

This PR adds rolling live transcription previews for Wyoming ASR: the Python side periodically re-transcribes a sliding audio window and writes JSONL events to a log file (and/or the terminal), while the macOS app's new LiveTranscriptionPreview singleton incrementally reads that log and publishes preview text above the voice-level overlay.
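The sliding audio window described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class name `RollingAudioWindow`, its parameters, and the 16 kHz / 16-bit defaults are assumptions made for the example.

```python
from collections import deque


class RollingAudioWindow:
    """Keep only the most recent `window_seconds` of raw PCM audio (hypothetical helper)."""

    def __init__(self, window_seconds: float, sample_rate: int = 16000, sample_width: int = 2) -> None:
        if window_seconds <= 0:
            raise ValueError("window_seconds must be positive")
        # Number of bytes that corresponds to the requested window length.
        self.max_bytes = int(window_seconds * sample_rate * sample_width)
        self._chunks: deque = deque()
        self._size = 0

    def append(self, chunk: bytes) -> None:
        self._chunks.append(chunk)
        self._size += len(chunk)
        # Drop whole chunks from the front while the remainder still covers the window.
        while self._chunks and self._size - len(self._chunks[0]) >= self.max_bytes:
            self._size -= len(self._chunks.popleft())

    def snapshot(self) -> bytes:
        # Trim to exactly the window length; the oldest chunk may be only partly needed.
        return b"".join(self._chunks)[-self.max_bytes:]
```

Each preview tick would then re-transcribe `snapshot()` rather than the whole recording, which keeps the cost of a partial roughly constant as the recording grows.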

  • Python layer (asr.py, transcribe.py, opts.py): adds LivePreviewConfig, LivePreviewStreamer with a cancel-safe loop, and four new CLI flags. live_preview_console_active now correctly gates Rich live-status suppression on both the Wyoming provider and non-file-transcription checks, fixing the previously reported regression where non-Wyoming users would lose the Rich status.
  • macOS layer: panel sizing is now dynamic (compact vs. preview), LiveTranscriptionPreview reads the JSONL file incrementally via FileHandle offset tracking, and a new self-test validates the full argument-wiring and parsing path.
  • Tests: cover JSONL write/read, unique-partial deduplication, stale-partial-after-stop guard, the cancel-race scenario, and Rich live-status suppression logic for all provider/file-path combinations.
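The deduplication and stale-partial behaviour that the tests cover could look something like the sketch below. The class `PreviewEventLog` and the event fields `type` and `text` are assumptions for illustration; the actual `LivePreviewStreamer` and its event schema may differ.

```python
import json
from pathlib import Path
from typing import Optional


class PreviewEventLog:
    """Append preview events to a JSONL file (hypothetical stand-in for the streamer's writer)."""

    def __init__(self, log_path: Path) -> None:
        self.log_path = log_path
        self._last_partial: Optional[str] = None
        self._stopped = False

    def _append(self, event_type: str, text: str) -> None:
        # One JSON object per line, always newline-terminated.
        with self.log_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"type": event_type, "text": text}) + "\n")

    def emit_partial(self, text: str) -> bool:
        # Skip empty text, repeats of the previous partial, and anything after stop().
        if self._stopped or not text or text == self._last_partial:
            return False
        self._last_partial = text
        self._append("partial", text)
        return True

    def stop(self, final_text: str) -> None:
        # Idempotent: only the first call writes the final event.
        if self._stopped:
            return
        self._stopped = True
        self._append("final", final_text)
```

Writing a trailing newline on every event is what lets a reader treat any unterminated tail as incomplete and wait for the rest.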

Confidence Score: 5/5

Safe to merge; the feature is well-scoped to Wyoming ASR live recording, the cancel-race ordering is correct, and the previously reported Rich live-status suppression for non-Wyoming providers is now fixed.
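As a rough illustration of the gate mentioned here, the condition might reduce to a predicate like the one below. The parameter names are assumptions; only the three conditions (console preview requested, Wyoming provider, not transcribing from a file) come from the review text.

```python
def live_preview_console_active(*, preview_to_console: bool, asr_provider: str, from_file: bool) -> bool:
    """Return True only when a console preview will actually be produced.

    Hypothetical signature: the real function in transcribe.py may take different arguments.
    """
    return preview_to_console and asr_provider == "wyoming" and not from_file


# The Rich live status would be suppressed only when the preview owns the console.
show_rich_status = not live_preview_console_active(
    preview_to_console=True, asr_provider="openai", from_file=False
)
```

With a non-Wyoming provider the predicate is false, so the Rich status stays visible even if the console preview flag was passed.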

The Python cancel-ordering (request_stop before task.cancel + await) correctly prevents stale partials from landing after the final event, confirmed by the dedicated race-condition test. The live_preview_console_active gate now includes both the Wyoming provider check and the file-transcription check, fixing the issue raised in an earlier review pass. The two P2 observations are both benign given current usage patterns.
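The ordering argument can be made concrete with a small asyncio sketch. Everything here (`PreviewLoop`, `record`, the sleep-based stand-ins for recording and ASR) is hypothetical; it only mirrors the sequence named in the review: request stop, cancel, await, then write the final event.

```python
import asyncio
import contextlib


class PreviewLoop:
    """Toy preview loop that records events in memory instead of writing JSONL."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self.events = []
        self._stop_requested = False

    def request_stop(self) -> None:
        self._stop_requested = True

    async def _transcribe_window(self) -> str:
        await asyncio.sleep(self.interval)  # stand-in for an ASR round trip
        return "partial text"

    async def run(self) -> None:
        while not self._stop_requested:
            text = await self._transcribe_window()
            # Re-check after the await: a stop may have been requested meanwhile.
            if self._stop_requested:
                return
            self.events.append(("partial", text))

    def stop(self, final_text: str) -> None:
        self._stop_requested = True
        self.events.append(("final", final_text))


async def record(final_text: str) -> list:
    loop = PreviewLoop(interval=0.01)
    task = asyncio.create_task(loop.run())
    try:
        await asyncio.sleep(0.035)  # stand-in for the recording session
    finally:
        loop.request_stop()  # 1. no further partials may be published
        task.cancel()  # 2. interrupt any in-flight transcription
        with contextlib.suppress(asyncio.CancelledError):
            await task  # 3. wait until the task has really finished
        loop.stop(final_text)  # 4. the final event is therefore last
    return loop.events
```

Calling `asyncio.run(record("done"))` returns a list whose last entry is the final event. Setting the stop flag first matters because a task that has already finished its await but not yet published would otherwise be free to append a partial after the final text.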

No files require special attention; the incremental FileHandle-based reading in LiveTranscriptionPreview.swift and the cancel-safe run loop in asr.py are the most complex new paths and both have direct test coverage.

Important Files Changed

| Filename | Overview |
| --- | --- |
| agent_cli/services/asr.py | Adds the LivePreviewConfig dataclass and LivePreviewStreamer with a periodic emit loop; wires live_preview_callback into _send_audio; adds finally-block cleanup in _transcribe_live_audio_wyoming. The race-condition ordering (request_stop before cancel + await) matches the test coverage, and _transcribe_recorded_audio_wyoming returns str (never None) on failure, so emit_partial is safe. |
| agent_cli/agents/transcribe.py | Wires four new live-preview parameters through _async_main and transcribe; live_preview_console_active now correctly gates Rich live status on both the Wyoming provider check and the file-transcription check, addressing the previous concern about non-Wyoming providers losing the Rich status. |
| macos/AgentCLI/Sources/AgentCLI/LiveTranscriptionPreview.swift | New singleton that incrementally reads the JSONL log with FileHandle (offset-tracked), parses events, and publishes text via @Published. The pendingLine double-parse edge case is low-risk given that the Python writer always appends newlines. |
| macos/AgentCLI/Sources/AgentCLI/RecordingIndicatorController.swift | begin(for:) checks both supportsLivePreviewOverlay and isLivePreviewOverlayEnabled before calling start(), but end(for:) calls stop() on supportsLivePreviewOverlay alone. This is harmless today because stop() is idempotent, but it is asymmetric. |
| macos/AgentCLI/Sources/AgentCLI/VoiceLevelOverlay.swift | Panel sizing is now dynamic (compactPanelSize vs. previewPanelSize) instead of always allocating the preview-space size; panel content is re-hosted only when showsPreviewSpace changes. |
| tests/test_asr.py | Comprehensive new test coverage: JSONL write, console print, unique-partial deduplication, the final event written on stop, the stale-partial-after-stop guard, and the cancel race. |
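The offset-tracked reading with a pending line, which the Swift app implements with FileHandle, can be illustrated in Python to stay consistent with the other sketches here. `IncrementalJsonlReader` and `poll` are hypothetical names, and this is not a port of LiveTranscriptionPreview.swift, only the same idea under the assumption that the writer appends newline-terminated JSON objects.

```python
import json
from pathlib import Path
from typing import Optional


class IncrementalJsonlReader:
    """Read only the bytes appended since the previous poll (hypothetical sketch)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._offset = 0  # bytes already consumed from the file
        self._pending = b""  # unterminated tail carried over to the next poll

    def poll(self) -> Optional[dict]:
        """Return the most recent complete, valid event since the last poll, if any."""
        try:
            with self.path.open("rb") as f:
                f.seek(self._offset)
                chunk = f.read()
        except FileNotFoundError:
            return None
        self._offset += len(chunk)
        # Everything before the last newline is complete; the remainder waits.
        *lines, self._pending = (self._pending + chunk).split(b"\n")
        latest = None
        for line in lines:
            if not line.strip():
                continue
            try:
                event = json.loads(line)
            except ValueError:  # malformed JSON or invalid encoding: skip the line
                continue
            if isinstance(event, dict):
                latest = event
        return latest
```

A timer (250 ms in the app, per the sequence diagram) would call `poll()` and update the displayed text whenever it returns an event. Keeping the unterminated tail in `_pending` is what avoids parsing a half-written line.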

Sequence Diagram

sequenceDiagram
    participant User
    participant RecordingIndicatorController
    participant LiveTranscriptionPreview
    participant VoiceLevelOverlayController
    participant CLI as agent-cli process
    participant LivePreviewStreamer
    participant WyomingASR

    User->>RecordingIndicatorController: begin(for: toggleTranscription)
    RecordingIndicatorController->>LiveTranscriptionPreview: start(logURL)
    Note over LiveTranscriptionPreview: Truncates JSONL, starts 250ms timer
    RecordingIndicatorController->>VoiceLevelOverlayController: show(showsPreviewSpace: true)
    RecordingIndicatorController->>CLI: spawn with --live-preview-log --live-preview-interval 1 --live-preview-window 10

    loop Every interval_seconds (1s)
        CLI->>LivePreviewStreamer: emit_partial()
        LivePreviewStreamer->>WyomingASR: _transcribe_recorded_audio_wyoming(rolling window)
        WyomingASR-->>LivePreviewStreamer: partial text
        LivePreviewStreamer->>LivePreviewStreamer: "_publish_event(type=partial)"
        LivePreviewStreamer-->>CLI: append to JSONL log
    end

    loop Every 250ms (Swift timer)
        LiveTranscriptionPreview->>LiveTranscriptionPreview: readNewContents(offset-tracked)
        LiveTranscriptionPreview->>LiveTranscriptionPreview: parse last valid JSON event
        LiveTranscriptionPreview-->>VoiceLevelOverlayController: "@Published text update"
    end

    User->>RecordingIndicatorController: end(for: toggleTranscription)
    CLI->>LivePreviewStreamer: finally: request_stop() then cancel task then stop(final_text)
    LivePreviewStreamer-->>CLI: append final event to JSONL
    RecordingIndicatorController->>VoiceLevelOverlayController: hide()
    RecordingIndicatorController->>LiveTranscriptionPreview: stop()
    Note over LiveTranscriptionPreview: Invalidates timer, clears state

Reviews (6): Last reviewed commit: "Signal live preview stop before cancella..."

Comment thread macos/AgentCLI/Sources/AgentCLI/VoiceLevelOverlay.swift Outdated
Comment thread macos/AgentCLI/Sources/AgentCLI/LiveTranscriptionPreview.swift
Comment thread macos/AgentCLI/Sources/AgentCLI/RecordingIndicatorController.swift Outdated
Comment thread agent_cli/services/asr.py
@basnijholt force-pushed the live-transcription-preview branch 3 times, most recently from 64faa0f to d6745a9, on June 4, 2026 22:33
@basnijholt force-pushed the live-transcription-preview branch from a39c536 to 5b2893b on June 4, 2026 22:58
@basnijholt merged commit 2aa8cff into main on Jun 5, 2026
12 checks passed
@basnijholt deleted the live-transcription-preview branch on June 5, 2026 22:54