
Performance: fix hot-path latency, UI churn, and memory across dictation and meetings #1350

Open
r3dbars wants to merge 15 commits into main from claude/app-performance-review-y6htyd


r3dbars (Owner) commented Jul 2, 2026

Why

A full performance review of the app (dictation hot path, meeting pipeline, UI/startup, observability) surfaced concrete latency, churn, and memory problems: dictation blocked on model load before the mic opened, stop-to-paste work sat behind 200ms polling, meeting transcription round-tripped the main actor per segment with redundant @Published fires, audio-level publishing invalidated the 5k-line settings view at buffer rate, the live sidecar did O(n²) file I/O, analytics did synchronous disk JSON on the main thread per event, and long meetings held ~1GB+ of samples in RAM. This PR fixes that set, one commit per fix (plus a few WIP snapshots from incremental pushes).

Product Impact

  • Affects: dictation / meetings
  • Lane: dictation reliability / meeting reliability
  • Why this matters: faster hotkey-to-recording and stop-to-paste, less CPU/memory pressure during and after long meetings, less UI jank while recording, and lower system-wide typing latency from the event tap.

What changed

  • Dictation records while the voice model loads: when model files are already local, the mic opens immediately and the CoreML load runs concurrently; the stop path (which already waited for the model) now fails fast on load failure, kicks the deduped init when nothing is loading, and shows post-stop copy. Model waits join the init task instead of polling at 200ms (poll survives only for download-progress display), bounded by the same 120s budget.
  • Dictation buffer sizing: audio buffers reserve for the 5-min session cap + headroom (360s ≈ 69MB at 48kHz) instead of 1800s (≈345MB retained for process lifetime); pendingSamples reserves up front to avoid tap-thread growth reallocations.
  • Device-scan caching: analytics route context is served from a cached selection refreshed on start/prewarm/route-change instead of a blocking CoreAudio enumeration on the main actor ~6× per dictation stop. Recording paths still do live lookups.
  • Meeting segment path: per-segment initialize() skipped when the model is already loaded (pipeline initializes once up front); refreshModelDownloadState() no longer reassigns the @Published state when unchanged, killing thousands of per-meeting objectWillChange cascades into menubar/warmup/settings subscribers.
  • Pipeline memory: whole-meeting channel buffers (~460MB each for 2h) are released right after their verified last use instead of living for the whole diarize→transcribe→merge run.
  • Audio-level publishing: levels publish through a 150ms monotonic time-gate (~6.7Hz) instead of per mic/system buffer (~25/s combined), with the history arrays updated via a single assignment so each gated buffer emits exactly one objectWillChange; MeetingSessionController republishes recordingDuration on whole-second boundaries; the meeting overlay skips full view pushes while the displayed mm:ss is unchanged; the live drawer throttles transcript rebuilds to ~5Hz with latest: true.
  • Live transcriber: buffer deep-copy deferred off the tap thread onto the channel's input queue; drawer feed ingest goes through one long-lived main-actor consumer instead of a task per partial hypothesis.
  • Live sidecar: workspace text mirrored in memory (was: full re-read of the growing transcript + preview per accepted entry — O(n²) over meeting length); preview rewrites amortized to 2s; on-disk formats byte-identical.
  • Analytics: enqueue is async with an in-memory buffer and debounced persistence (was: 3+ synchronous JSON file round-trips on the main thread per event); synchronous final persist on app termination; opt-out cancels pending writes; byte-cap no longer re-encodes the whole buffer per removal.
  • Event tap: hotkey bindings cached and rebuilt on .hotkeysDidChange instead of ~8 UserDefaults reads per system-wide keystroke on the main run loop.
  • Stats: refreshStats() runs its ~8 SQLite queries off the main actor and publishes back; streak scan uses a Set + cached formatter; 60s diagnostics heartbeat got 10s timer tolerance.
  • Speaker rename/merge: bounded frontmatter scan (64KB chunks, 1MB cap) pre-filters the library instead of reading every transcript fully; ambiguity falls back to the full read.
  • Custom dictionary: compiled regexes cached keyed on the entries value (was: re-sort + recompile per entry per dictation/meeting segment).
  • Dictation day files: O(1) FileHandle append instead of read-whole-file + atomic rewrite per dictation.
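The analytics bullet above combines three ideas: an in-memory enqueue, debounced persistence, and a byte cap that trims using cached sizes instead of re-encoding. The following is a minimal sketch under stated assumptions. `AnalyticsBuffer`, `tick(now:)`, and the `persist` callback are hypothetical names, not the app's API. Events are assumed to arrive pre-encoded as `Data`, and time is passed in explicitly so the 1s window can be tested.

```swift
import Foundation

/// Hypothetical analytics buffer: enqueue is an in-memory append, and disk
/// writes are debounced instead of happening on every event.
final class AnalyticsBuffer {
    private var events: [Data] = []      // already-encoded events, oldest first
    private var totalBytes = 0           // running size, so trimming never re-encodes
    private var persistDeadline: TimeInterval?
    private var optedOut = false
    private let byteCap: Int
    private let debounce: TimeInterval
    private let persist: ([Data]) -> Void

    init(byteCap: Int, debounce: TimeInterval = 1.0, persist: @escaping ([Data]) -> Void) {
        self.byteCap = byteCap
        self.debounce = debounce
        self.persist = persist
    }

    func enqueue(_ event: Data, now: TimeInterval) {
        guard !optedOut else { return }
        events.append(event)
        totalBytes += event.count
        // Count oldest events to drop using their cached sizes, then remove them in one call.
        var dropCount = 0
        while totalBytes > byteCap, dropCount < events.count {
            totalBytes -= events[dropCount].count
            dropCount += 1
        }
        events.removeFirst(dropCount)
        // Trailing debounce: each new event pushes the pending write out again.
        persistDeadline = now + debounce
    }

    /// Driven by a timer in a real app; persists once the window has elapsed.
    func tick(now: TimeInterval) {
        if let deadline = persistDeadline, now >= deadline { flush() }
    }

    /// Synchronous write, e.g. from an app-termination handler.
    func flush() {
        persistDeadline = nil
        guard !optedOut else { return }
        persist(events)
    }

    /// Opting out cancels the pending write and discards buffered events.
    func optOut() {
        optedOut = true
        persistDeadline = nil
        events.removeAll()
        totalBytes = 0
    }
}
```

Keeping a running `totalBytes` is what makes trimming cheap: each removal subtracts a known size rather than re-encoding the remaining buffer to measure it.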

How I checked it

  • scripts/dev/agent-preflight.sh
  • python3 scripts/dev/check-build-source-lists.py (passed)
  • CI build-and-test (macOS) green on head 12bd72b. Two earlier failures were diagnosed from CI logs and fixed: double @Published emission per in-place history mutation (ed25fc6), and a signal-6 abort in AVAudioConverter from a bufferListNoCopy destination buffer, reverted to the proven allocate-then-copy path (12bd72b).
  • Hardware smokes + manual pass run by the maintainer before undrafting (per Maestro hold exit criteria)

New/updated tests: level publish gate, resampler round-trip, dictionary cache invalidation, analytics in-memory buffering, event-tap snapshot contract, speaker-rename frontmatter regression, constants pins.

Risk Review

  • Privacy / local-first behavior reviewed — no payload shape changes; analytics change is buffering-only
  • Storage path or migration impact reviewed — on-disk formats preserved byte-for-byte
  • Public-facing copy stays concrete — only new overlay copy is the post-stop "recording is captured…" loading text
  • Release/update impact reviewed — no version/appcast changes
  • Agent PR stayed draft until CI was green and the Maestro hold's exit criteria were met
  • No private transcripts, audio, tokens, personal paths, or customer data are included

Notes

Known behavior changes reviewers should weigh:

  • Day-file append is no longer atomic — a crash mid-append can truncate the trailing section (in-process writes still serialized).
  • With model files on disk, the dictation overlay goes straight to listening; model-loading UI appears only for true first-run downloads, or briefly after stop if the user beats the load.
  • Force-quit within the analytics 1s debounce window can drop unsent events (normal quit persists synchronously).
  • Meeting waveform meters update at ~6.7Hz instead of 12–24Hz; recordingDuration mirror can be up to ~0.9s stale for diagnostics readers (bucket-scale consumers only).
  • Speaker merge no longer rewrites db_id strings that appear only in body text (pinned by test).

Deliberately deferred (needs hardware/runtime validation or upstream work): FluidAudio 0.15.x / streaming ASR upgrade (docs/voices-model-upgrade-plan.md), promoting chunked decode into the live dictation path, Int16 capture WAV + direct-to-M4A playback encode, moving the event tap off the main run loop, nonisolated ASR inference entry (inference already suspends off-main via FluidAudio), and the resampler no-copy destination buffer (aborted inside AVAudioConverter on CI — needs a Mac to debug against; the allocate-then-copy path is back in place).

Agent handoff

COORD_DONE: GREEN | https://github.com/r3dbars/transcripted/pull/1350 | 13 performance fixes across dictation/meetings/UI/observability + 2 CI-diagnosed fixes | none | decide FluidAudio 0.15.x upgrade scheduling | CI build-and-test green; hardware pass by maintainer | merge when review approves

🤖 Generated with Claude Code

https://claude.ai/code/session_01NvcdYAy2h7DXPfnMbwjcH1

claude added 15 commits July 2, 2026 02:49
appendSection read the whole day file and rewrote it atomically per
dictation, so save cost grew with the day's transcript. Append via
FileHandle instead, reading only the trailing two bytes to pick the
blank-line separator.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NvcdYAy2h7DXPfnMbwjcH1
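A minimal sketch of the append path this commit describes. It assumes the day file is UTF-8 text and that sections should be separated by exactly one blank line. That separator rule is inferred from "reading only the trailing two bytes" and is an assumption, as is the free-function shape of `appendSection(_:to:)`.

```swift
import Foundation

/// Hypothetical sketch: appends `section` to a day file without rereading
/// or rewriting the existing contents.
func appendSection(_ section: String, to url: URL) throws {
    guard FileManager.default.fileExists(atPath: url.path) else {
        // First section of the day: create the file.
        try Data(section.utf8).write(to: url)
        return
    }
    let handle = try FileHandle(forUpdating: url)
    defer { try? handle.close() }

    // Inspect only the trailing (up to) two bytes to pick the separator.
    let size = try handle.seekToEnd()
    var tail = Data()
    if size > 0 {
        let start = size >= 2 ? size - 2 : 0
        try handle.seek(toOffset: start)
        tail = try handle.read(upToCount: Int(size - start)) ?? Data()
    }
    let separator: String
    if size == 0 || tail.elementsEqual("\n\n".utf8) {
        separator = ""        // empty file, or already ends in a blank line
    } else if tail.last == UInt8(ascii: "\n") {
        separator = "\n"      // ends in a single newline: add one more
    } else {
        separator = "\n\n"    // no trailing newline at all
    }
    _ = try handle.seekToEnd()
    try handle.write(contentsOf: Data((separator + section).utf8))
}
```

Cost per save is now proportional to the new section rather than the whole day, at the price of the non-atomic write noted in the PR description.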
The CGEventTap callback ran bindingProvider on every system-wide
keyDown/keyUp/flagsChanged, doing 4 binding lookups (~8+ UserDefaults
reads plus migration fallbacks) per event on the main run loop. Replace
the closure with a snapshot rebuilt on .hotkeysDidChange, which already
routes through reRegisterHotkeys() -> configurePhysicalShortcutDetector().
All in-app binding writes post that notification.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NvcdYAy2h7DXPfnMbwjcH1
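A sketch of the snapshot idea, assuming a binding can be reduced to a key code plus a modifier mask. `HotkeyBinding`, `HotkeySnapshotCache`, and the notification's raw string are hypothetical. The `loadBindings` closure stands in for the UserDefaults lookups: it runs once at init and again on each `.hotkeysDidChange`, so the per-keystroke path only reads a cached array.

```swift
import Foundation

extension Notification.Name {
    // The PR names `.hotkeysDidChange`; this raw string is an assumption.
    static let hotkeysDidChange = Notification.Name("hotkeysDidChange")
}

/// Hypothetical binding shape: a key code plus a modifier mask.
struct HotkeyBinding: Equatable {
    var keyCode: UInt16
    var modifiers: UInt64
}

/// Caches bindings so the event-tap callback never touches UserDefaults.
final class HotkeySnapshotCache {
    private let center: NotificationCenter
    private let loadBindings: () -> [HotkeyBinding]   // stands in for UserDefaults reads
    private var observer: NSObjectProtocol?
    private(set) var snapshot: [HotkeyBinding]

    init(center: NotificationCenter = .default,
         loadBindings: @escaping () -> [HotkeyBinding]) {
        self.center = center
        self.loadBindings = loadBindings
        self.snapshot = loadBindings()
        // queue: nil runs the block synchronously on the posting thread.
        observer = center.addObserver(forName: .hotkeysDidChange, object: nil,
                                      queue: nil) { [weak self] _ in
            guard let self = self else { return }
            self.snapshot = self.loadBindings()
        }
    }

    deinit {
        if let observer = observer { center.removeObserver(observer) }
    }

    /// Called from the (hypothetical) tap callback for each key event.
    func matches(keyCode: UInt16, modifiers: UInt64) -> Bool {
        snapshot.contains { $0.keyCode == keyCode && $0.modifiers == modifiers }
    }
}
```

The correctness of this pattern rests on the property the commit calls out: every in-app binding write posts the notification, so the snapshot cannot silently go stale.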
CustomDictionaryTextProcessor re-sorted the entries and compiled one
NSRegularExpression per entry on every dictation and meeting segment.
Cache the sorted+compiled form keyed on the parsed entries value (there
is no change notification for the preference; entries change exactly
when the raw text does), guarded by a lock for concurrent callers.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NvcdYAy2h7DXPfnMbwjcH1
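A sketch of caching the sorted and compiled patterns keyed on the entries value. `DictionaryEntry`, whole-word case-insensitive matching, and longest-first ordering are assumptions, not the app's actual matching rules. `NSLock` guards the cache because dictation and meeting segments may call in concurrently.

```swift
import Foundation

/// Hypothetical dictionary entry: replace `from` with `to` as a whole word.
struct DictionaryEntry: Hashable {
    var from: String
    var to: String
}

/// Recompiles only when the entries value changes; otherwise reuses the cache.
final class CompiledDictionaryCache {
    private let lock = NSLock()
    private var cachedKey: [DictionaryEntry]?
    private var cachedRules: [(NSRegularExpression, String)] = []
    private(set) var compileCount = 0

    func rules(for entries: [DictionaryEntry]) -> [(NSRegularExpression, String)] {
        lock.lock()
        defer { lock.unlock() }
        if let key = cachedKey, key == entries { return cachedRules }
        // Assumed ordering: longest phrase first so it wins over its prefixes.
        let sorted = entries.sorted { $0.from.count > $1.from.count }
        cachedRules = sorted.compactMap { entry in
            let pattern = "\\b" + NSRegularExpression.escapedPattern(for: entry.from) + "\\b"
            guard let regex = try? NSRegularExpression(pattern: pattern,
                                                       options: [.caseInsensitive]) else { return nil }
            return (regex, NSRegularExpression.escapedTemplate(for: entry.to))
        }
        cachedKey = entries
        compileCount += 1
        return cachedRules
    }

    func apply(_ entries: [DictionaryEntry], to text: String) -> String {
        var result = text
        for (regex, template) in rules(for: entries) {
            let range = NSRange(result.startIndex..., in: result)
            result = regex.stringByReplacingMatches(in: result, options: [],
                                                    range: range, withTemplate: template)
        }
        return result
    }
}
```

Keying on the parsed value rather than on a change notification matches the commit's reasoning: the entries change exactly when the raw preference text does, so equality is a sufficient invalidation signal.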
…entry

Every accepted transcript entry re-read the growing live_transcript.md
and preview.html from disk just to compare before rewriting, so sidecar
I/O grew quadratically with meeting length. Mirror last-written text per
file in memory (one disk read per file per session as fallback), render
the preview from the in-memory transcript, and amortize per-entry preview
rewrites to every 2s. Lifecycle transitions still force a fresh snapshot,
and the preview page polls live_transcript.md directly, which is still
written synchronously per entry.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NvcdYAy2h7DXPfnMbwjcH1
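A sketch of the in-memory mirror plus the 2s preview throttle. `SidecarMirror`, the injected `write` and `now` closures, and the minimal `<pre>` preview rendering are hypothetical. The real sidecar formats are not shown in the PR. The transcript is still written on every entry, while the preview is rewritten at most once per 2s unless `force` is set, which here stands in for lifecycle transitions.

```swift
import Foundation

/// Hypothetical sidecar writer. The last text written to each file is
/// mirrored in memory, so accepting an entry never re-reads the growing file.
final class SidecarMirror {
    private var lastWritten: [URL: String] = [:]
    private var lastPreviewWrite: TimeInterval?
    private let previewInterval: TimeInterval = 2.0
    private let write: (URL, String) throws -> Void
    private let now: () -> TimeInterval
    private(set) var transcript = ""

    init(write: @escaping (URL, String) throws -> Void,
         now: @escaping () -> TimeInterval) {
        self.write = write
        self.now = now
    }

    func accept(entry: String, transcriptURL: URL, previewURL: URL,
                force: Bool = false) throws {
        transcript += entry + "\n"
        try writeIfChanged(transcript, to: transcriptURL)   // still written per entry
        let t = now()
        let previewDue = lastPreviewWrite.map { t - $0 >= previewInterval } ?? true
        if force || previewDue {
            try writeIfChanged(renderPreview(transcript), to: previewURL)
            lastPreviewWrite = t
        }
    }

    private func writeIfChanged(_ text: String, to url: URL) throws {
        if lastWritten[url] == nil {
            // One disk read per file per session as a fallback, then memory only.
            lastWritten[url] = (try? String(contentsOf: url, encoding: .utf8)) ?? ""
        }
        guard lastWritten[url] != text else { return }
        try write(url, text)
        lastWritten[url] = text
    }

    private func renderPreview(_ text: String) -> String {
        let escaped = text.replacingOccurrences(of: "&", with: "&amp;")
            .replacingOccurrences(of: "<", with: "&lt;")
            .replacingOccurrences(of: ">", with: "&gt;")
        return "<pre>\(escaped)</pre>"
    }
}
```

The comparison that used to require a disk read now happens against the mirror, which removes the read half of the quadratic cost.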
StatsService.refreshStats() ran ~8 synchronous SQLite queries plus the
streak scan on the main actor at settings-window open. Build the whole
snapshot in a detached utility task (StatsDatabase already serializes on
its own queue), publish back on the main actor, and serialize
overlapping refreshes. Streak scan now uses a Set and a cached
DateFormatter. Also give the 60s runtime-diagnostics heartbeat timer a
10s tolerance so the system can coalesce the wakeup; dirty-shutdown
detection buckets heartbeat age far more coarsely.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NvcdYAy2h7DXPfnMbwjcH1
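The streak part of this commit is the easiest to show in isolation. This sketch assumes activity days are stored as `yyyy-MM-dd` strings in UTC, which is an assumption because the app's storage format isn't shown. Membership checks go through a `Set`, and a single `DateFormatter` is created once and reused, since building formatters repeatedly is comparatively expensive. `StreakCalculator` is a hypothetical name.

```swift
import Foundation

/// Hypothetical streak helper: counts consecutive active days ending at `today`.
enum StreakCalculator {
    // Built once and reused across calls.
    private static let dayFormatter: DateFormatter = {
        let formatter = DateFormatter()
        formatter.calendar = Calendar(identifier: .gregorian)
        formatter.locale = Locale(identifier: "en_US_POSIX")
        formatter.timeZone = TimeZone(identifier: "UTC")
        formatter.dateFormat = "yyyy-MM-dd"
        return formatter
    }()

    static func currentStreak(activeDays: [String], today: Date) -> Int {
        let days = Set(activeDays)   // O(1) membership instead of an array scan per day
        var calendar = Calendar(identifier: .gregorian)
        calendar.timeZone = TimeZone(identifier: "UTC")!
        var cursor = today
        var streak = 0
        while days.contains(dayFormatter.string(from: cursor)) {
            streak += 1
            guard let previous = calendar.date(byAdding: .day, value: -1, to: cursor) else { break }
            cursor = previous
        }
        return streak
    }
}
```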
Retroactive rename/merge read every transcript in the library fully just
to check for a db_id that only ever lives in YAML frontmatter (written by
TranscriptFormatter and writeFrontmatterSpeakerMetadata). Scan only up to
the closing frontmatter delimiter in 64KB chunks (1MB cap) with byte-level
search; any ambiguity (unopenable file, no leading ---, no close within
cap) falls back to the historical full read so nothing is silently
skipped. Merge no longer rewrites db_id strings that appear only in body
text — pinned by a new test alongside a large-frontmatter regression test.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NvcdYAy2h7DXPfnMbwjcH1
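A sketch of the bounded frontmatter scan with the full-read fallback. The 64KB chunk size and 1MB cap come from the commit. `scanFrontmatter`, `mentionsSpeaker`, and the simplified delimiter rule are hypothetical: any `\n---` counts as the close here, and a real implementation may be stricter about what follows it.

```swift
import Foundation

enum FrontmatterScan: Equatable {
    case frontmatter(Data)   // bytes between the opening and closing delimiters
    case ambiguous           // caller should fall back to a full read
}

/// Hypothetical bounded scan: reads chunks until the closing delimiter is found,
/// returning `.ambiguous` on open failure, a missing opener, EOF, or the cap.
func scanFrontmatter(at url: URL, chunkSize: Int = 64 * 1024,
                     cap: Int = 1024 * 1024) -> FrontmatterScan {
    guard let handle = try? FileHandle(forReadingFrom: url) else { return .ambiguous }
    defer { try? handle.close() }
    let opener = Data("---\n".utf8)
    let closer = Data("\n---".utf8)
    var buffer = Data()
    while buffer.count < cap {
        guard let chunk = try? handle.read(upToCount: chunkSize), !chunk.isEmpty else { break }
        buffer.append(chunk)
        if buffer.count < opener.count { continue }
        guard buffer.starts(with: opener) else { return .ambiguous }
        // Start at the opener's newline so an empty block ("---\n---") still matches.
        let searchStart = buffer.startIndex + opener.count - 1
        if let close = buffer.range(of: closer, in: searchStart..<buffer.endIndex) {
            let bodyStart = buffer.startIndex + opener.count
            return .frontmatter(buffer.subdata(in: bodyStart..<max(bodyStart, close.lowerBound)))
        }
    }
    return .ambiguous
}

/// Searches frontmatter bytes only; ambiguous files get the historical full read.
func mentionsSpeaker(_ dbID: String, in url: URL) -> Bool {
    let needle = Data(dbID.utf8)
    switch scanFrontmatter(at: url) {
    case .frontmatter(let bytes):
        return bytes.range(of: needle) != nil
    case .ambiguous:
        guard let all = try? Data(contentsOf: url) else { return false }
        return all.range(of: needle) != nil
    }
}
```

Because the search restarts over the whole buffer after each chunk, a delimiter that straddles a chunk boundary is still found. The cap keeps that rescanning bounded.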
…transcriber, resampler)

Intermediate snapshot of fixes still being authored; final reviewed
versions land in follow-up commits. Not yet verified on macOS.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NvcdYAy2h7DXPfnMbwjcH1
Covers the changes snapshotted in the previous commit: audio levels now
publish through a 150ms monotonic time-gate instead of per mic/system
buffer, the meeting overlay skips full view pushes while the displayed
mm:ss is unchanged, the live drawer throttles transcript rebuilds to
~5Hz (latest: true), the live transcriber defers its buffer deep-copy
off the tap thread and feeds the drawer through a single long-lived
main-actor consumer, analytics buffering is in-memory with debounced
persistence, and AudioResampler converts directly into the returned
array's storage (no transient 2x peak).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NvcdYAy2h7DXPfnMbwjcH1
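A sketch of a 150ms publish gate on a monotonic clock, which is the first change listed in the commit above. `LevelPublishGate` is a hypothetical name. Using `DispatchTime` uptime nanoseconds as the monotonic source is an assumption about how "monotonic" might be realized. The time parameter is injectable so the rate can be checked deterministically.

```swift
import Dispatch

/// Hypothetical gate: lets at most one level publish through per interval,
/// measured on a monotonic clock so wall-clock changes cannot stall or burst it.
struct LevelPublishGate {
    let intervalNanos: UInt64
    private var lastPublishNanos: UInt64?

    init(intervalMilliseconds: UInt64 = 150) {
        intervalNanos = intervalMilliseconds * 1_000_000
    }

    /// Returns true at most once per interval; buffers arriving inside the
    /// window are not published.
    mutating func shouldPublish(atNanos now: UInt64 = DispatchTime.now().uptimeNanoseconds) -> Bool {
        if let last = lastPublishNanos, now >= last, now - last < intervalNanos {
            return false
        }
        lastPublishNanos = now
        return true
    }
}
```

With combined mic and system buffers arriving about every 40ms, the gate admits the buffers at 0, 160, 320, ..., 960ms, which is 7 publishes in the first second. That is in line with the roughly 6.7Hz figure in the PR description.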
Intermediate snapshot; final reviewed version lands in follow-up
commits. Not yet verified on macOS.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NvcdYAy2h7DXPfnMbwjcH1
…ontroller)

Intermediate snapshot; final reviewed version lands in follow-up
commits. Not yet verified on macOS.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NvcdYAy2h7DXPfnMbwjcH1
…olling

Completes the start-path rework begun in the prior WIP snapshots:

- When the selected model's files are already local (bundled, prefetched,
  or cached), open the microphone immediately and initialize the model
  concurrently. The stop path already waits for the model before
  transcribing; it now fails fast on .failed, kicks the deduped
  initialization when nothing is loading, and shows post-stop copy
  ("recording is captured...") instead of the pre-recording text.
  True needs-download and Whisper starts keep the blocking path.
- Model waits join the engine's initialization task via
  joinModelInitialization() and resume the moment the load settles;
  the 200ms poll survives only for download-progress display. Both wait
  loops are bounded by the same 120s budget as before
  (modelLoadWaitBudget, test-pinned).
- Dictation audio buffers reserve for the 5-minute session cap plus
  headroom (360s, ~69MB at 48kHz) instead of 1800s (~345MB retained for
  the process lifetime); pendingSamples now reserves up front so the tap
  thread avoids growth reallocations. Session timeout derives from the
  same shared constant.
- Analytics route context is served from a cached device selection
  refreshed on start/prewarm/route-change instead of a blocking CoreAudio
  enumeration on the main actor ~6x per stop; route-change analytics
  loads the new selection detached. Recording paths still do live
  lookups.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NvcdYAy2h7DXPfnMbwjcH1
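The buffer-sizing figures in this commit can be checked directly. This sketch assumes mono `Float` (32-bit) samples at 48kHz and splits the 360s reservation into the 300s cap plus 60s of headroom. `DictationBufferSizing` and `makeSampleBuffer()` are hypothetical names. It also shows the session timeout and the reservation deriving from one shared constant.

```swift
/// Hypothetical shared constants: the session timeout and the buffer
/// reservation both derive from the same cap.
enum DictationBufferSizing {
    static let sessionCapSeconds: Double = 300   // 5-minute session cap
    static let headroomSeconds: Double = 60      // assumed headroom (360 - 300)
    static let sampleRate: Double = 48_000       // assumed mono capture rate

    static var sessionTimeout: Double { sessionCapSeconds }
    static var reservedSeconds: Double { sessionCapSeconds + headroomSeconds }
    static var reservedSampleCount: Int { Int(reservedSeconds * sampleRate) }
    static var reservedBytes: Int { reservedSampleCount * MemoryLayout<Float>.stride }

    /// Reserve once up front so appends do not reallocate while under the cap.
    static func makeSampleBuffer() -> [Float] {
        var samples: [Float] = []
        samples.reserveCapacity(reservedSampleCount)
        return samples
    }
}

// Old sizing, for comparison: 1800s of the same samples.
let oldBytes = Int(1800 * DictationBufferSizing.sampleRate) * MemoryLayout<Float>.stride
print(DictationBufferSizing.reservedBytes, oldBytes)   // 69120000 345600000
```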
…etings

Background meeting transcription round-tripped the main actor per
diarized segment and re-ran initialize(model:) each time, with
refreshModelDownloadState() reassigning the @Published state even when
unchanged — every reassignment fired objectWillChange into the menubar,
warmup-status, and settings subscribers, thousands of times across a
long meeting. Skip initialize when the model is already loaded (the
pipeline initializes once up front via ensureModelsReadyForPipeline) and
suppress redundant state publishes with a manual case+payload
comparison that fails safe by re-publishing.

TranscriptionPipeline also held both whole-meeting 16kHz channel
buffers (~460MB each for 2h) for the entire diarize/transcribe/merge
run; each is now released right after its verified last use, so the
channels are never both alive past the system phase.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NvcdYAy2h7DXPfnMbwjcH1
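A sketch of the "manual case+payload comparison that fails safe by re-publishing" from the commit above. `ModelDownloadState` and its cases are hypothetical, because the real enum is not shown. A `didSet` counter stands in for `@Published`'s `objectWillChange`. Any pair not explicitly recognized as identical falls to `default` and counts as changed, so a missed case causes an extra publish rather than a dropped one.

```swift
/// Hypothetical download state; the real enum's cases are not shown in the PR.
enum ModelDownloadState {
    case notDownloaded
    case downloading(progress: Double)
    case ready
    case failed(message: String)
}

/// Returns true only for pairs known to be identical; everything else,
/// including any case added later, is treated as a change.
func isSameState(_ a: ModelDownloadState, _ b: ModelDownloadState) -> Bool {
    switch (a, b) {
    case (.notDownloaded, .notDownloaded), (.ready, .ready):
        return true
    case let (.downloading(p1), .downloading(p2)):
        return p1 == p2
    case let (.failed(m1), .failed(m2)):
        return m1 == m2
    default:
        return false
    }
}

final class ModelStatus {
    private(set) var publishCount = 0
    private(set) var state: ModelDownloadState = .notDownloaded {
        didSet { publishCount += 1 }   // stands in for @Published's objectWillChange
    }

    /// Called per segment; only reassigns (and so only publishes) on a real change.
    func refresh(to newState: ModelDownloadState) {
        guard !isSameState(state, newState) else { return }
        state = newState
    }
}
```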
Each in-place mutation of a @Published array is its own set, so
removeFirst + append emitted two objectWillChange fires per gated
buffer — double the intended publish rate, and a mismatch with
AudioLevelPublishGateTests' one-emission contract. Build the shifted
history locally and assign once.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NvcdYAy2h7DXPfnMbwjcH1
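The emission arithmetic behind this fix can be reproduced without Combine. In this sketch a `didSet` observer stands in for `@Published`. Both fire once per set of the property, and each in-place mutation of an array property counts as a set. `LevelHistory` and its methods are hypothetical.

```swift
/// Hypothetical stand-in for an ObservableObject with a level-history array.
final class LevelHistory {
    private(set) var emissions = 0
    var levels: [Float] = Array(repeating: 0, count: 4) {
        didSet { emissions += 1 }   // one "objectWillChange" per set
    }

    /// Two in-place mutations: two emissions per buffer.
    func pushInPlace(_ level: Float) {
        levels.removeFirst()
        levels.append(level)
    }

    /// Shift a local copy, then assign once: one emission per buffer.
    func pushOnce(_ level: Float) {
        var next = levels
        next.removeFirst()
        next.append(level)
        levels = next
    }
}
```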
The bufferListNoCopy destination (converting directly into the returned
Array's storage) aborted inside AVAudioConverter with signal 6 on CI's
macOS runner, taking down the whole TranscriptedCore test process at
AudioResamplerTests.testLoadAndResampleDownmixesStereoAndResamplesToTargetRate.
Restore the proven allocate-then-copy path; the transient extra buffer
lives only for the duration of the call. The new round-trip tests stay —
they pin behavior, not the internal buffer strategy. The no-copy
optimization can be revisited with hardware to debug against.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NvcdYAy2h7DXPfnMbwjcH1
r3dbars (Owner, Author) commented Jul 2, 2026

Maestro hold review: HOLD at head 12bd72b. This is still draft and changes multiple hot paths: dictation start/stop timing, meeting pipeline memory, live sidecar I/O, analytics buffering, audio-level publishing, and day-file append semantics. CI is green, but hardware-smokes are skipped and the PR body itself calls out behavior changes that need local proof. Smallest clear path: run the mapped Mac checks plus a focused manual/perf pass for cold/warm dictation, stop-to-paste, long meeting capture, live drawer, route change, and analytics shutdown persistence; then undraft only after those pass.

@r3dbars r3dbars marked this pull request as ready for review July 3, 2026 01:34