Performance: fix hot-path latency, UI churn, and memory across dictation and meetings by r3dbars · Pull Request #1350 · r3dbars/transcripted

r3dbars · 2026-07-02T02:50:53Z

Why

A full performance review of the app (dictation hot path, meeting pipeline, UI/startup, observability) surfaced concrete latency, churn, and memory problems: dictation blocked on model load before the mic opened, stop-to-paste work sat behind 200ms polling, meeting transcription round-tripped the main actor per segment with redundant @Published fires, audio-level publishing invalidated the 5k-line settings view at buffer rate, the live sidecar did O(n²) file I/O, analytics did synchronous disk JSON on the main thread per event, and long meetings held ~1GB+ of samples in RAM. This PR fixes that set, one commit per fix (plus a few WIP snapshots from incremental pushes).

Product Impact

Affects: dictation / meetings
Lane: dictation reliability / meeting reliability
Why this matters: faster hotkey-to-recording and stop-to-paste, less CPU/memory pressure during and after long meetings, less UI jank while recording, and lower system-wide typing latency from the event tap.

What changed

Dictation records while the voice model loads: when model files are already local, the mic opens immediately and the CoreML load runs concurrently; the stop path (which already waited for the model) now fails fast on load failure, kicks the deduped init when nothing is loading, and shows post-stop copy. Model waits join the init task instead of polling at 200ms (poll survives only for download-progress display), bounded by the same 120s budget.
Dictation buffer sizing: audio buffers reserve for the 5-min session cap + headroom (360s ≈ 69MB at 48kHz) instead of 1800s (≈345MB retained for process lifetime); pendingSamples reserves up front to avoid tap-thread growth reallocations.
Device-scan caching: analytics route context is served from a cached selection refreshed on start/prewarm/route-change instead of a blocking CoreAudio enumeration on the main actor ~6× per dictation stop. Recording paths still do live lookups.
Meeting segment path: per-segment initialize() skipped when the model is already loaded (pipeline initializes once up front); refreshModelDownloadState() no longer reassigns the @Published state when unchanged, killing thousands of per-meeting objectWillChange cascades into menubar/warmup/settings subscribers.
Pipeline memory: whole-meeting channel buffers (~460MB each for 2h) are released right after their verified last use instead of living for the whole diarize→transcribe→merge run.
Audio-level publishing: levels publish through a 150ms monotonic time-gate (~6.7Hz) instead of per mic/system buffer (~25/s combined), with the history arrays updated via a single assignment so each gated buffer emits exactly one objectWillChange; MeetingSessionController republishes recordingDuration on whole-second boundaries; the meeting overlay skips full view pushes while the displayed mm:ss is unchanged; the live drawer throttles transcript rebuilds to ~5Hz with latest: true.
Live transcriber: buffer deep-copy deferred off the tap thread onto the channel's input queue; drawer feed ingest goes through one long-lived main-actor consumer instead of a task per partial hypothesis.
Live sidecar: workspace text mirrored in memory (was: full re-read of the growing transcript + preview per accepted entry — O(n²) over meeting length); preview rewrites amortized to 2s; on-disk formats byte-identical.
Analytics: enqueue is async with an in-memory buffer and debounced persistence (was: 3+ synchronous JSON file round-trips on the main thread per event); synchronous final persist on app termination; opt-out cancels pending writes; byte-cap no longer re-encodes the whole buffer per removal.
Event tap: hotkey bindings cached and rebuilt on .hotkeysDidChange instead of ~8 UserDefaults reads per system-wide keystroke on the main run loop.
Stats: refreshStats() runs its ~8 SQLite queries off the main actor and publishes back; streak scan uses a Set + cached formatter; 60s diagnostics heartbeat got 10s timer tolerance.
Speaker rename/merge: bounded frontmatter scan (64KB chunks, 1MB cap) pre-filters the library instead of reading every transcript fully; ambiguity falls back to the full read.
Custom dictionary: compiled regexes cached keyed on the entries value (was: re-sort + recompile per entry per dictation/meeting segment).
Dictation day files: O(1) FileHandle append instead of read-whole-file + atomic rewrite per dictation.
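The Analytics item above can be illustrated with a minimal sketch. It is not the app's implementation: `AnalyticsBuffer`, the string events, and the `persist` closure are hypothetical, and a serial `DispatchQueue` stands in for whatever isolation the real code uses. `enqueue` returns immediately, writes are debounced, `flushNow()` is the synchronous final persist used at termination, and `optOut()` cancels any pending write.

```swift
import Foundation

/// Hypothetical in-memory analytics buffer with debounced persistence.
/// All mutable state is confined to `queue`, so callers never block on disk I/O.
final class AnalyticsBuffer {
    private let queue = DispatchQueue(label: "analytics.buffer")
    private let debounce: TimeInterval
    private let persist: ([String]) -> Void      // e.g. encode + write JSON (assumed)
    private var events: [String] = []
    private var dirty = false
    private var optedOut = false
    private var pendingWrite: DispatchWorkItem?

    init(debounce: TimeInterval = 1.0, persist: @escaping ([String]) -> Void) {
        self.debounce = debounce
        self.persist = persist
    }

    /// Returns immediately; the append and the (re)scheduled write happen on `queue`.
    func enqueue(_ event: String) {
        queue.async {
            guard !self.optedOut else { return }
            self.events.append(event)
            self.dirty = true
            // Debounce: each new event pushes the write out by `debounce` seconds.
            self.pendingWrite?.cancel()
            let work = DispatchWorkItem { [weak self] in self?.persistIfDirty() }
            self.pendingWrite = work
            self.queue.asyncAfter(deadline: .now() + self.debounce, execute: work)
        }
    }

    /// Synchronous final persist, e.g. from applicationWillTerminate(_:).
    func flushNow() {
        queue.sync { persistIfDirty() }
    }

    /// Opting out cancels the pending write and drops buffered events.
    func optOut() {
        queue.sync {
            optedOut = true
            pendingWrite?.cancel()
            pendingWrite = nil
            events.removeAll()
            dirty = false
        }
    }

    /// Must run on `queue`.
    private func persistIfDirty() {
        pendingWrite = nil
        guard dirty else { return }
        dirty = false
        persist(events)
    }
}
```

A burst of events costs one disk write instead of one per event. The price is the window called out under Notes: a force-quit before the debounced write fires loses whatever is still only in memory.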

How I checked it

scripts/dev/agent-preflight.sh
python3 scripts/dev/check-build-source-lists.py (passed)
CI build-and-test (macOS) green on head 12bd72b. Two earlier failures were diagnosed from CI logs and fixed: double @Published emission per in-place history mutation (ed25fc6), and a signal-6 abort in AVAudioConverter from a bufferListNoCopy destination buffer, reverted to the proven allocate-then-copy path (12bd72b).
Hardware smokes + manual pass run by the maintainer before undrafting (per Maestro hold exit criteria)

New/updated tests: level publish gate, resampler round-trip, dictionary cache invalidation, analytics in-memory buffering, event-tap snapshot contract, speaker-rename frontmatter regression, constants pins.

Risk Review

Privacy / local-first behavior reviewed — no payload shape changes; analytics change is buffering-only
Storage path or migration impact reviewed — on-disk formats preserved byte-for-byte
Public-facing copy stays concrete — only new overlay copy is the post-stop "recording is captured…" loading text
Release/update impact reviewed — no version/appcast changes
Agent PR stayed draft until CI was green and the Maestro hold's exit criteria were met
No private transcripts, audio, tokens, personal paths, or customer data are included

Notes

Known behavior changes reviewers should weigh:

Day-file append is no longer atomic — a crash mid-append can truncate the trailing section (in-process writes still serialized).
With model files on disk, the dictation overlay goes straight to listening; model-loading UI appears only for true first-run downloads, or briefly after stop if the user beats the load.
Force-quit within the analytics 1s debounce window can drop unsent events (normal quit persists synchronously).
Meeting waveform meters update at ~6.7Hz instead of 12–24Hz; recordingDuration mirror can be up to ~0.9s stale for diagnostics readers (bucket-scale consumers only).
Speaker merge no longer rewrites db_id strings that appear only in body text (pinned by test).

Deliberately deferred (needs hardware/runtime validation or upstream work): FluidAudio 0.15.x / streaming ASR upgrade (docs/voices-model-upgrade-plan.md), promoting chunked decode into the live dictation path, Int16 capture WAV + direct-to-M4A playback encode, moving the event tap off the main run loop, nonisolated ASR inference entry (inference already suspends off-main via FluidAudio), and the resampler no-copy destination buffer (aborted inside AVAudioConverter on CI — needs a Mac to debug against; the allocate-then-copy path is back in place).

Agent handoff

🤖 Generated with Claude Code

https://claude.ai/code/session_01NvcdYAy2h7DXPfnMbwjcH1

appendSection read the whole day file and rewrote it atomically per dictation, so save cost grew with the day's transcript. Append via FileHandle instead, reading only the trailing two bytes to pick the blank-line separator. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01NvcdYAy2h7DXPfnMbwjcH1
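A minimal sketch of that append, assuming a UTF-8 Markdown day file whose sections are separated by exactly one blank line. The `appendSection(_:to:)` signature here is a stand-in, not necessarily the app's.

```swift
import Foundation

/// Appends `section` to the day file at `url` without reading or rewriting
/// earlier content. Only the trailing two bytes are read, to decide how many
/// newlines keep sections separated by exactly one blank line.
func appendSection(_ section: String, to url: URL) throws {
    guard FileManager.default.fileExists(atPath: url.path) else {
        // First entry of the day: create the file.
        try Data(section.utf8).write(to: url)
        return
    }
    let handle = try FileHandle(forUpdating: url)
    defer { try? handle.close() }

    let size = try handle.seekToEnd()
    var tail = Data()
    if size > 0 {
        try handle.seek(toOffset: size >= 2 ? size - 2 : 0)
        tail = try handle.read(upToCount: 2) ?? Data()
    }

    let newline = UInt8(ascii: "\n")
    let separator: String
    if tail.isEmpty || tail == Data([newline, newline]) {
        separator = ""          // empty file, or already ends in a blank line
    } else if tail.last == newline {
        separator = "\n"        // ends in a single newline: add one more
    } else {
        separator = "\n\n"      // no trailing newline at all
    }

    _ = try handle.seekToEnd()
    try handle.write(contentsOf: Data((separator + section).utf8))
}
```

Cost per save no longer depends on how much was dictated earlier in the day. Because there is no write-to-temp-and-rename step, a crash during `write(contentsOf:)` can leave a truncated trailing section, which is the atomicity trade-off listed under Notes.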

The CGEventTap callback ran bindingProvider on every system-wide keyDown/keyUp/flagsChanged, doing 4 binding lookups (~8+ UserDefaults reads plus migration fallbacks) per event on the main run loop. Replace the closure with a snapshot rebuilt on .hotkeysDidChange, which already routes through reRegisterHotkeys() -> configurePhysicalShortcutDetector(). All in-app binding writes post that notification.
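A sketch of the snapshot pattern, assuming the bindings can be produced by a loader closure (standing in for the UserDefaults reads) and that `.hotkeysDidChange` is a `Notification.Name`. `HotkeyBinding` and `HotkeyBindingCache` are hypothetical names. The event-tap callback only does a locked dictionary lookup; the expensive load runs once at startup and once per notification.

```swift
import Foundation

extension Notification.Name {
    // Assumed definition; the PR names `.hotkeysDidChange` but doesn't show it.
    static let hotkeysDidChange = Notification.Name("hotkeysDidChange")
}

/// Hypothetical stand-in for the app's binding type.
struct HotkeyBinding: Equatable {
    let keyCode: UInt16
    let modifierFlags: UInt64
}

final class HotkeyBindingCache {
    private let lock = NSLock()
    private let center: NotificationCenter
    private let load: () -> [String: HotkeyBinding]
    private var snapshot: [String: HotkeyBinding]
    private var observer: NSObjectProtocol?

    init(center: NotificationCenter = .default,
         load: @escaping () -> [String: HotkeyBinding]) {
        self.center = center
        self.load = load
        self.snapshot = load()
        // queue: nil delivers synchronously on the posting thread.
        observer = center.addObserver(forName: .hotkeysDidChange, object: nil, queue: nil) { [weak self] _ in
            self?.rebuild()
        }
    }

    deinit {
        if let observer = observer { center.removeObserver(observer) }
    }

    /// Called from the event-tap callback: a dictionary lookup, no UserDefaults I/O.
    func binding(for action: String) -> HotkeyBinding? {
        lock.lock()
        defer { lock.unlock() }
        return snapshot[action]
    }

    private func rebuild() {
        let fresh = load()          // the expensive reads happen here, once per change
        lock.lock()
        snapshot = fresh
        lock.unlock()
    }
}
```

The design relies on every binding write posting the notification; a write that skipped it would leave the snapshot stale until the next change.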

CustomDictionaryTextProcessor re-sorted the entries and compiled one NSRegularExpression per entry on every dictation and meeting segment. Cache the sorted+compiled form keyed on the parsed entries value (there is no change notification for the preference; entries change exactly when the raw text does), guarded by a lock for concurrent callers.
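A sketch of a value-keyed regex cache. The entry shape (`phrase`/`replacement`), whole-word case-insensitive matching, and longest-phrase-first ordering are assumptions for illustration; `DictionaryEntry` and `CompiledDictionaryCache` are hypothetical names. Because the key is the entries value itself, no change notification is needed: different entries simply miss the cache and recompile once.

```swift
import Foundation

struct DictionaryEntry: Hashable {
    let phrase: String
    let replacement: String
}

final class CompiledDictionaryCache {
    private let lock = NSLock()                     // concurrent dictation/meeting callers
    private var key: [DictionaryEntry]?
    private var compiled: [(NSRegularExpression, String)] = []
    private(set) var compileCount = 0

    /// Returns cached rules when `entries` equals the last key; otherwise sorts and compiles once.
    func rules(for entries: [DictionaryEntry]) -> [(NSRegularExpression, String)] {
        lock.lock()
        defer { lock.unlock() }
        if key == entries { return compiled }
        // Longest phrases first so a longer phrase wins over its prefix (assumed ordering).
        let sorted = entries.sorted { $0.phrase.count > $1.phrase.count }
        compiled = sorted.compactMap { entry -> (NSRegularExpression, String)? in
            let pattern = "\\b" + NSRegularExpression.escapedPattern(for: entry.phrase) + "\\b"
            guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
                return nil
            }
            return (regex, NSRegularExpression.escapedTemplate(for: entry.replacement))
        }
        key = entries
        compileCount += 1
        return compiled
    }

    func apply(_ entries: [DictionaryEntry], to text: String) -> String {
        var result = text
        for (regex, template) in rules(for: entries) {
            let range = NSRange(result.startIndex..., in: result)
            result = regex.stringByReplacingMatches(in: result, options: [], range: range, withTemplate: template)
        }
        return result
    }
}
```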

…entry
Every accepted transcript entry re-read the growing live_transcript.md and preview.html from disk just to compare before rewriting, so sidecar I/O grew quadratically with meeting length. Mirror last-written text per file in memory (one disk read per file per session as fallback), render the preview from the in-memory transcript, and amortize per-entry preview rewrites to every 2s. Lifecycle transitions still force a fresh snapshot, and the preview page polls live_transcript.md directly, which is still written synchronously per entry.
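A sketch of the in-memory mirror plus a preview throttle, under stated assumptions: `SidecarMirror` and `PreviewThrottle` are hypothetical names, the compare-before-write uses the mirror instead of the disk, and "time" is passed in explicitly so the throttle is easy to reason about. The transcript is still written on every accepted entry; only the re-reads go away, and preview renders are capped to one per interval unless forced.

```swift
import Foundation

/// Hypothetical mirror of the last text written to each sidecar file.
final class SidecarMirror {
    private var lastWritten: [URL: String] = [:]
    private(set) var diskReads = 0
    private(set) var diskWrites = 0

    /// Writes `text` only if it differs from what this session last wrote.
    /// The disk is read at most once per file, to seed the mirror.
    func writeIfChanged(_ text: String, to url: URL) throws {
        if lastWritten[url] == nil {
            diskReads += 1
            lastWritten[url] = (try? String(contentsOf: url, encoding: .utf8)) ?? ""
        }
        guard lastWritten[url] != text else { return }
        try Data(text.utf8).write(to: url, options: .atomic)
        lastWritten[url] = text
        diskWrites += 1
    }
}

/// Amortizes preview rewrites: at most one render per `interval`, unless forced
/// (e.g. on a lifecycle transition).
struct PreviewThrottle {
    let interval: TimeInterval
    private var lastRender: TimeInterval?

    init(interval: TimeInterval) { self.interval = interval }

    mutating func shouldRender(at now: TimeInterval, force: Bool = false) -> Bool {
        if !force, let last = lastRender, now - last < interval { return false }
        lastRender = now
        return true
    }
}
```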

StatsService.refreshStats() ran ~8 synchronous SQLite queries plus the streak scan on the main actor at settings-window open. Build the whole snapshot in a detached utility task (StatsDatabase already serializes on its own queue), publish back on the main actor, and serialize overlapping refreshes. Streak scan now uses a Set and a cached DateFormatter. Also give the 60s runtime-diagnostics heartbeat timer a 10s tolerance so the system can coalesce the wakeup; dirty-shutdown detection buckets heartbeat age far more coarsely than 10s, so the extra slack does not affect it.
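A sketch of the streak scan with a `Set` and a formatter built once. The input shape (a list of `yyyy-MM-dd` day keys, e.g. from a date-grouped query) and the UTC day boundary are assumptions; `StreakCalculator` is a hypothetical name. Membership checks are O(1) instead of a linear scan per day, and no `DateFormatter` is created per call.

```swift
import Foundation

enum StreakCalculator {
    /// Built once and reused; constructing a DateFormatter per call is the expensive part.
    static let dayFormatter: DateFormatter = {
        let formatter = DateFormatter()
        formatter.calendar = Calendar(identifier: .gregorian)
        formatter.locale = Locale(identifier: "en_US_POSIX")
        formatter.timeZone = TimeZone(identifier: "UTC")
        formatter.dateFormat = "yyyy-MM-dd"
        return formatter
    }()

    /// Counts consecutive days, ending at `today`, that appear in `activeDays`.
    static func currentStreak(activeDays: [String], today: Date) -> Int {
        let days = Set(activeDays)              // O(1) membership per day
        var calendar = Calendar(identifier: .gregorian)
        calendar.timeZone = TimeZone(identifier: "UTC")!
        var streak = 0
        var cursor = today
        while days.contains(dayFormatter.string(from: cursor)) {
            streak += 1
            guard let previous = calendar.date(byAdding: .day, value: -1, to: cursor) else { break }
            cursor = previous
        }
        return streak
    }
}
```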

Retroactive rename/merge read every transcript in the library fully just to check for a db_id that only ever lives in YAML frontmatter (written by TranscriptFormatter and writeFrontmatterSpeakerMetadata). Scan only up to the closing frontmatter delimiter in 64KB chunks (1MB cap) with byte-level search; any ambiguity (unopenable file, no leading ---, no close within cap) falls back to the historical full read so nothing is silently skipped. Merge no longer rewrites db_id strings that appear only in body text; this is pinned by a new test alongside a large-frontmatter regression test.
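A sketch of the bounded scan, assuming LF line endings and frontmatter delimited by a leading `---\n` and a later `\n---`. `FrontmatterScan` and `scanFrontmatter(of:for:)` are hypothetical names. Anything the scan cannot decide (unopenable file, no leading delimiter, no close within the cap, or CRLF files) returns `.ambiguous`, which the caller would treat as "fall back to the full read".

```swift
import Foundation

enum FrontmatterScan { case contains, absent, ambiguous }

/// Reads in `chunkSize` pieces up to `cap` bytes and searches only the frontmatter.
func scanFrontmatter(of url: URL, for needle: String,
                     chunkSize: Int = 64 * 1024, cap: Int = 1024 * 1024) -> FrontmatterScan {
    guard let handle = try? FileHandle(forReadingFrom: url) else { return .ambiguous }
    defer { try? handle.close() }

    let open = Data("---\n".utf8)
    let close = Data("\n---".utf8)
    var buffer = Data()

    while buffer.count < cap {
        guard let chunk = try? handle.read(upToCount: chunkSize), !chunk.isEmpty else { break }
        buffer.append(chunk)
        guard buffer.count >= open.count else { continue }
        guard buffer.starts(with: open) else { return .ambiguous }   // no leading ---
        // Search from the opening delimiter's newline; re-searching the whole buffer
        // also catches a closing delimiter split across a chunk boundary.
        if let end = buffer.range(of: close, in: (open.count - 1)..<buffer.count) {
            let frontmatter = buffer[..<end.lowerBound]
            return frontmatter.range(of: Data(needle.utf8)) != nil ? .contains : .absent
        }
    }
    return .ambiguous    // EOF or cap reached without a closing delimiter
}
```

A db_id that appears only in the body (like `spk_99` in the test below) is reported as `.absent`, which is the behavior change the PR pins with a test.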

…transcriber, resampler)
Intermediate snapshot of fixes still being authored; final reviewed versions land in follow-up commits. Not yet verified on macOS.

Covers the changes snapshotted in the previous commit: audio levels now publish through a 150ms monotonic time-gate instead of per mic/system buffer, the meeting overlay skips full view pushes while the displayed mm:ss is unchanged, the live drawer throttles transcript rebuilds to ~5Hz (latest: true), the live transcriber defers its buffer deep-copy off the tap thread and feeds the drawer through a single long-lived main-actor consumer, analytics buffering is in-memory with debounced persistence, and AudioResampler converts directly into the returned array's storage (no transient 2x peak).
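A sketch of a 150ms monotonic time-gate. `PublishGate` is a hypothetical name; it reads `DispatchTime.now().uptimeNanoseconds`, a monotonic clock, and accepts an injected timestamp so the behavior can be checked deterministically.

```swift
import Dispatch
import Foundation

/// Lets at most one publish through per `interval` of monotonic time.
struct PublishGate {
    private let intervalNanos: UInt64
    private var lastPublishNanos: UInt64?

    init(interval: TimeInterval) {
        intervalNanos = UInt64((interval * 1_000_000_000).rounded())
    }

    /// Call once per audio buffer; only publish levels when this returns true.
    mutating func shouldPublish(nowNanos: UInt64 = DispatchTime.now().uptimeNanoseconds) -> Bool {
        if let last = lastPublishNanos, nowNanos >= last, nowNanos - last < intervalNanos {
            return false
        }
        lastPublishNanos = nowNanos
        return true
    }
}
```

With buffers arriving every 40ms (~25/s combined), the gate opens on every fourth buffer, i.e. every 160ms (~6.25Hz); ~6.7Hz is the ceiling set by the 150ms interval.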

Intermediate snapshot; final reviewed version lands in follow-up commits. Not yet verified on macOS.


…ontroller)
Intermediate snapshot; final reviewed version lands in follow-up commits. Not yet verified on macOS.

…olling
Completes the start-path rework begun in the prior WIP snapshots:
- When the selected model's files are already local (bundled, prefetched, or cached), open the microphone immediately and initialize the model concurrently. The stop path already waits for the model before transcribing; it now fails fast on .failed, kicks the deduped initialization when nothing is loading, and shows post-stop copy ("recording is captured...") instead of the pre-recording text. True needs-download and Whisper starts keep the blocking path.
- Model waits join the engine's initialization task via joinModelInitialization() and resume the moment the load settles; the 200ms poll survives only for download-progress display. Both wait loops are bounded by the same 120s budget as before (modelLoadWaitBudget, test-pinned).
- Dictation audio buffers reserve for the 5-minute session cap plus headroom (360s, ~69MB at 48kHz) instead of 1800s (~345MB retained for the process lifetime); pendingSamples now reserves up front so the tap thread avoids growth reallocations. Session timeout derives from the same shared constant.
- Analytics route context is served from a cached device selection refreshed on start/prewarm/route-change instead of a blocking CoreAudio enumeration on the main actor ~6x per stop; route-change analytics loads the new selection detached. Recording paths still do live lookups.
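The buffer-sizing figures check out if samples are mono Float32 (4 bytes each), an assumption consistent with the PR's numbers: 360 s × 48,000 samples/s × 4 B = 69,120,000 B ≈ 69MB, while 1800 s gives 345,600,000 B ≈ 345MB. A sketch of deriving both the reservation and the session cap from one shared constant; `DictationLimits` and `makePendingSamples()` are hypothetical names, and the 300s + 60s split of the 360s figure is assumed.

```swift
import Foundation

/// Hypothetical shared constants; the session timeout would read `sessionCapSeconds` too.
enum DictationLimits {
    static let sessionCapSeconds: Double = 300      // 5-minute session cap
    static let headroomSeconds: Double = 60         // assumed headroom split
    static let sampleRate: Double = 48_000

    static var reservedSampleCount: Int {
        Int((sessionCapSeconds + headroomSeconds) * sampleRate)
    }
    static var reservedBytes: Int {
        reservedSampleCount * MemoryLayout<Float>.stride   // mono Float32 samples
    }
}

/// Reserve once, up front, so appends from the tap thread don't trigger growth
/// reallocations for a session that stays within the cap.
func makePendingSamples() -> [Float] {
    var samples: [Float] = []
    samples.reserveCapacity(DictationLimits.reservedSampleCount)
    return samples
}
```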


…etings
Background meeting transcription round-tripped the main actor per diarized segment and re-ran initialize(model:) each time, with refreshModelDownloadState() reassigning the @Published state even when unchanged. Every reassignment fired objectWillChange into the menubar, warmup-status, and settings subscribers, thousands of times across a long meeting. Skip initialize when the model is already loaded (the pipeline initializes once up front via ensureModelsReadyForPipeline) and suppress redundant state publishes with a manual case+payload comparison that fails safe by re-publishing. TranscriptionPipeline also held both whole-meeting 16kHz channel buffers (~460MB each for 2h) for the entire diarize/transcribe/merge run; each is now released right after its verified last use, so the two channels are never both alive past the system phase.
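A sketch of a fail-safe "did it actually change?" check. The enum cases are hypothetical (the real state type isn't shown), and `didSet` stands in for `@Published` so the example runs without Combine. Because `Error` isn't `Equatable`, two `.failed` states are never treated as equal, so failures always re-publish.

```swift
import Foundation

/// Hypothetical state enum; the app's real cases aren't shown in the PR.
enum ModelDownloadState {
    case notDownloaded
    case downloading(progress: Double)
    case ready
    case failed(Error)      // Error isn't Equatable, so no synthesized ==
}

/// True only when both states are provably identical; anything uncertain
/// counts as "changed" so the caller re-publishes.
func isSameState(_ a: ModelDownloadState, _ b: ModelDownloadState) -> Bool {
    switch (a, b) {
    case (.notDownloaded, .notDownloaded), (.ready, .ready):
        return true
    case let (.downloading(p1), .downloading(p2)):
        return p1 == p2
    default:
        return false
    }
}

final class ModelStatusStore {
    private(set) var publishCount = 0
    // didSet models one objectWillChange per set of a @Published property.
    private(set) var state: ModelDownloadState = .notDownloaded { didSet { publishCount += 1 } }

    func refresh(to newState: ModelDownloadState) {
        guard !isSameState(state, newState) else { return }   // skip the redundant publish
        state = newState
    }
}
```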


Each in-place mutation of a @Published array goes through the property's setter separately, so removeFirst + append emitted two objectWillChange fires per gated buffer: double the intended publish rate, and a mismatch with AudioLevelPublishGateTests' one-emission contract. Build the shifted history locally and assign it once.
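A minimal illustration of the double emission and the fix. `LevelHistory` is a hypothetical stand-in that uses `didSet` to count sets (each set of a `@Published` property likewise emits one objectWillChange), so it runs without Combine.

```swift
import Foundation

final class LevelHistory {
    private(set) var changeCount = 0
    private(set) var levels: [Float] { didSet { changeCount += 1 } }

    init(capacity: Int) {
        levels = Array(repeating: 0, count: capacity)   // observers don't fire during init
    }

    /// Two in-place mutations: two sets, two change notifications.
    func pushInPlace(_ level: Float) {
        levels.removeFirst()
        levels.append(level)
    }

    /// Build the shifted history locally, then assign once: one notification.
    func pushSingleAssignment(_ level: Float) {
        var next = levels
        next.removeFirst()
        next.append(level)
        levels = next
    }
}
```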

The bufferListNoCopy destination (converting directly into the returned Array's storage) aborted inside AVAudioConverter with signal 6 on CI's macOS runner, taking down the whole TranscriptedCore test process at AudioResamplerTests.testLoadAndResampleDownmixesStereoAndResamplesToTargetRate. Restore the proven allocate-then-copy path; the transient extra buffer lives only for the duration of the call. The new round-trip tests stay: they pin behavior, not the internal buffer strategy. The no-copy optimization can be revisited with hardware to debug against.

r3dbars · 2026-07-02T22:01:05Z

Maestro hold review: HOLD at head 12bd72b. This is still draft and changes multiple hot paths: dictation start/stop timing, meeting pipeline memory, live sidecar I/O, analytics buffering, audio-level publishing, and day-file append semantics. CI is green, but hardware-smokes are skipped and the PR body itself calls out behavior changes that need local proof. Smallest clear path: run the mapped Mac checks plus a focused manual/perf pass for cold/warm dictation, stop-to-paste, long meeting capture, live drawer, route change, and analytics shutdown persistence; then undraft only after those pass.

claude added 15 commits July 2, 2026 02:49

Register AudioLevelPublishGateTests in the Core coverage list

f393ad2


r3dbars marked this pull request as ready for review July 3, 2026 01:34

r3dbars wants to merge 15 commits into main from claude/app-performance-review-y6htyd


2 participants
