Skip to content

App-improvement follow-through: action-item done/due metadata, semantic search, silent-failure instrumentation, disk guards#1352

Open
r3dbars wants to merge 5 commits into
mainfrom
claude/app-improvement-ideas-qvelv7
Open

App-improvement follow-through: action-item done/due metadata, semantic search, silent-failure instrumentation, disk guards#1352
r3dbars wants to merge 5 commits into
mainfrom
claude/app-improvement-ideas-qvelv7

Conversation

@r3dbars

@r3dbars r3dbars commented Jul 2, 2026

Copy link
Copy Markdown
Owner

Why

Follow-through on the "what else would you improve" review: the highest-leverage gaps that were still real after re-verifying the tree (most of FIX_ROADMAP already landed). The biggest two: list_action_items has advertised a status filter since the cross-meeting tools shipped but nothing ever wrote a status (the filter could never match), and search was lexical-only so paraphrase queries against the meeting library came up empty.

Product Impact

  • Affects: meetings / agent artifacts / dictation
  • Lane: agent workflow / meeting reliability
  • Why this matters: closes the remaining loops on the "ask my history" moat — open/done finally means something on action items, semantic_search finds paraphrases FTS5 can't ("pricing pushback" finds "they balked at the cost") with a fully on-device embedding, and the agent orientation tools (recap/recent_context) now answer "what was this meeting" with the summary instead of greetings. Also turns three classes of silent failure (skipped audio maintenance, dictation buffer truncation, doomed-from-the-start low-disk recordings) into visible, upfront signals.

What changed

  • Action items carry real done/due metadata via trailing bullet markers — (due: Friday), (status: done), bare (done):
    • MeetingQuickSummaryExtractor appends (due: ...) when a commitment sentence has a recognizable deadline cue. Extraction never marks anything done — closing an item is a write-back: edit the marker in the saved Markdown, the MCP file watcher reindexes.
    • CaptureSummaryParser parses markers into ActionItem.status/.due (bare-token set is curated so real parentheticals like "call Bob (the vendor)" are never shredded).
    • MCP TranscriptIndex adds status/due columns, list_action_items filters on real values instead of AND 1=0, and digest open-counts become accurate. Unmarked items — including all pre-existing summaries — stay open.
  • Local semantic search (semantic_search MCP tool): Apple's on-device NLEmbedding sentence model (ships with macOS — no bundled weights, nothing leaves the machine) embeds ~400-char chunks of consecutive meeting utterances plus dictation entries at index time into a new semantic_chunks table; queries brute-force cosine over normalized Float32 blobs. Model-unavailable systems get an explicit message and lexical search is unaffected. Tests use a deterministic bag-of-words embedding so CI never depends on the model asset. Index schema bumped to v4 (existing indexes rebuild from disk via the existing version gate).
  • recap / recent_context previews use the real summary (NEXT_WORK refactor(ui): Split FloatingPanel.swift into 10 organized files #2): recap prefers summary prose + compact Decisions/Action-items/Open-questions lines parsed from the Markdown; recent_context builds a one-line preview from the meeting's indexed summary facts. Both fall back to the old dialogue/first-utterance behavior only when a meeting has no summary.
  • Silent failures now report: retained-audio maintenance (catch { continue } in prune / WAV→M4A / playback-mix / stale-temp / failed-audio paths) forwards each skip through a hook to EventReporter as audio_maintenance_failure (operation + NSError domain/code only, no paths). ParakeetEngine's 30-min buffer cap emits a once-per-recording audio_buffer_truncated warning with the dropped duration.
  • Disk safety: RecordingValidator.minimumDiskSpace raised 100MB → 1GB (dual-WAV capture burns ~1.4GB/hour and the runtime watchdog only stops at 50MB, so 100MB start floors let doomed recordings begin). New CaptureLibrarySize measures the capture library after launch maintenance and emits a bucketed capture_library_size event with the retention setting — first visibility into unbounded growth under the default never retention.
  • docs/NEXT_WORK.md refreshed against the tree — most of the original list had shipped; the doc now says what's actually left (speaker-identity trust work, plus the items blocked on a real Mac).

How I checked it

  • scripts/dev/check-build-source-lists.py (source-list contract passes with the new files)
  • CI green on this branch as of the semantic-search commit (build + fast tests + swift test + integration/e2e smokes + all four Tools packages); latest commit re-running
  • bash build.sh --no-open locally — not run: authored in a Linux container, no macOS toolchain; relying on CI
  • New coverage: CaptureLibrarySizeTests, extractor due-marker suite, maintenance-failure reporting suite (fast tests); CaptureSummaryParserTests marker grammar; SummaryRollupTests status filter + digest counts + recap/recent_context preview behavior; SemanticSearchTests chunking/ranking/filters/reindex/fallback
  • Manual check: mark a bullet (done) in a saved meeting, confirm list_action_items status:"open" stops returning it after reindex; run semantic_search with a paraphrase query against a real library

Risk Review

  • Privacy / local-first behavior reviewed — new events carry operation names, error domain/code, and coarse size buckets only; embeddings are computed and stored locally, never sent anywhere
  • Storage path or migration impact reviewed — MCP index schema bump triggers the existing rebuild-from-disk gate (first rebuild pays a one-time embedding cost proportional to library size); no on-disk transcript format change (markers are plain bullet text)
  • Public-facing copy stays concrete (disk-space error message now states the real requirement)
  • Release/update impact reviewed — none
  • Agent PR stays draft until human review
  • UI changes — none
  • No private transcripts, audio, tokens, personal paths, or customer data are included

Notes

Still deferred, with reasons (also captured in the refreshed docs/NEXT_WORK.md):

  • Onboarding meeting dry-run — the auto-detect copy fix already landed in the calendar step; what remains is a real capture flow inside first-run onboarding, which shouldn't be built blind.
  • Hardware-smoke CI lane — needs a self-hosted macOS runner (infrastructure, not code).
  • Settings/Home god-object split — large refactor, wrong to attempt without a compiler.
  • Transcribe-instead-of-discard pre-sleep dictation audio — touches real-time recovery semantics; needs hardware testing.
  • Speaker-identity work (negative exemplars, multi-exemplar voiceprints) — rides the eval harness; should not be changed blind.

Agent handoff

COORD_DONE: BRIEF | this PR | action-item done/due metadata + semantic_search + summary previews + silent-failure instrumentation + disk guards | none | verify on macOS CI + manual semantic_search sanity run | check-build-source-lists.py only (no local toolchain) | run full test matrix on macOS

🤖 Generated with Claude Code

https://claude.ai/code/session_012muZDEbHFBfzsuhR2XHmJC

claude added 4 commits July 2, 2026 12:03
list_action_items has advertised a status filter since the rollup tools
shipped, but nothing ever populated status: the extractor wrote bare
'Owner: text' bullets, the parser never looked for metadata, and the
index hardcoded 'AND 1=0' for status:done. The filter could never match.

Give action bullets a small trailing-marker grammar that closes the loop:

- '(due: Friday)' — free-text due hint
- '(status: done)' or bare '(done)' — closed status (curated token set,
  so real parentheticals like 'call Bob (the vendor)' are never shredded)

Wired through all three layers:

- MeetingQuickSummaryExtractor appends '(due: ...)' when a commitment
  sentence carries a recognizable deadline cue. Extraction never marks
  anything done — closing an item is a write-back: edit the marker in
  the saved Markdown and the MCP file watcher reindexes it.
- CaptureSummaryParser parses markers into ActionItem.status/.due
  (defaulted init params keep existing call sites source-compatible).
- TranscriptIndex schema v3 adds status/due columns (version gate
  rebuilds existing indexes from disk), inserts populate them, and
  list_action_items/digest filter and count on real values.

Items with no marker — including everything saved before this change —
still count as open.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012muZDEbHFBfzsuhR2XHmJC
…cation failures

Two classes of silent failure now produce events:

- Retained-audio maintenance (retention prune, WAV->M4A compression,
  playback mix, stale-temp cleanup, failed-audio compression) kept its
  deliberate skip-and-continue behavior but every skipped file vanished
  into 'catch { continue }' — orphaned WAVs and unpruned audio built up
  with no signal anywhere. Each swallowed error now reports through a
  hook on MeetingAudioStorageManager (operation + NSError domain/code
  only, no paths), which TranscriptedAppState wires to EventReporter as
  a warning-level 'audio_maintenance_failure' event. A hook rather than
  a direct EventReporter call keeps the manager compilable in the fast-
  test runner, which doesn't include EventReporter's sources.

- ParakeetEngine's 30-minute streaming buffer cap dropped the oldest
  audio via removeFirst with no trace. The dictation session cap should
  end a recording long before that fires, so any hit means real audio
  silently left the transcription buffer; it now emits a once-per-
  recording 'audio_buffer_truncated' warning with the dropped duration.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012muZDEbHFBfzsuhR2XHmJC
…ry growth visibility

Two disk-safety gaps around retained audio:

- The pre-recording disk check passed with 100MB free, but meeting
  capture writes dual WAV at ~1.4GB/hour and the in-recording watchdog
  only hard-stops at 50MB — so a recording could start with minutes of
  runway and die mid-meeting. Raise RecordingValidator.minimumDiskSpace
  to 1GB so a doomed recording fails upfront with a clear message
  instead of losing audio later.

- Audio retention defaults to 'never' and nothing ever reported how big
  the capture library had grown; unbounded growth was invisible until
  the disk filled. After the launch audio-maintenance pass, measure the
  library (symlink-safe, skips linked folders) and emit a
  'capture_library_size' event with a coarse size bucket plus the
  retention setting — the signal a future retention nudge needs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012muZDEbHFBfzsuhR2XHmJC
The one major NEXT_WORK bet still open: paraphrase queries against the
meeting library ('pricing pushback' should find 'they balked at the
cost'), previously impossible because search was lexical FTS5 only. The
blocker was bundling an embedding model — Apple's NLEmbedding sentence
model ships with macOS, so nothing needs downloading and nothing leaves
the machine.

- SemanticEmbedding.swift: embedding seam (protocol + NLEmbedding-backed
  provider), Float32 blob codec over normalized vectors, and a chunker
  that groups consecutive utterances into ~400-char windows (one vector
  per utterance is wasteful, one per meeting is useless).
- TranscriptIndex schema v4 -> semantic_chunks table (version gate
  rebuilds old indexes); meeting chunks and dictation entries embed at
  index time inside the existing transactions; brute-force cosine scan
  at query time — fine at personal-library scale.
- New read-only semantic_search MCP tool (query/kind/count/date window)
  returning scored hits with hydrated meeting titles. When the model
  asset is missing the tool says so explicitly and indexing skips
  semantic rows; lexical search is unaffected.
- Tests use a deterministic FNV-1a bag-of-words embedding so CI never
  depends on the model asset: chunking, ranking, filters, reindex
  cleanup, unavailable-model fallback, codec round-trip.
- e2e smoke source list gains SemanticEmbedding.swift (it compiles
  TranscriptIndex.swift with raw swiftc).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012muZDEbHFBfzsuhR2XHmJC
@r3dbars r3dbars changed the title App-improvement follow-through: action-item done/due metadata, silent-failure instrumentation, disk guards App-improvement follow-through: action-item done/due metadata, semantic search, silent-failure instrumentation, disk guards Jul 2, 2026
NEXT_WORK item 2, the last cheap piece of the 'ask my history' loop:
the agent's orientation tools answered 'what was this meeting' with the
least informative content in the file — recap previewed the first 15
dialogue lines and recent_context previewed the literal first utterance,
i.e. greetings and audio checks — while the structured summary sat in
the same Markdown and in the index.

- recap: prefer a structured preview (summary prose bullets from
  local_summary/auto_summary frontmatter plus compact Decisions /
  Action items / Open questions lines, parsed via CaptureSummaryParser);
  fall back to dialogue lines only when a meeting has no summary.
- recent_context: prefer a one-line preview built from the meeting's
  indexed summary facts (decisions first, then owner-prefixed action
  items); fall back to the first utterance as before. Index-only, no
  new file reads.
- Also refresh docs/NEXT_WORK.md against the tree: most of the original
  list has shipped; what remains is speaker-identity trust work and the
  items blocked on a real Mac.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012muZDEbHFBfzsuhR2XHmJC
@r3dbars

r3dbars commented Jul 2, 2026

Copy link
Copy Markdown
Owner Author

Maestro hold review: HOLD at head f9fc688. Still draft and broad: action-item writeback semantics, MCP schema rebuild to v4, local semantic embeddings/chunks, summary previews, audio-maintenance failure events, buffer truncation warnings, disk-space start gate, and capture-library telemetry. This needs Mac/tool proof and real-library behavior proof, not just green CI. Smallest clear path: run MCP/Tools tests, semantic_search on a real local library, action-item (done)/(due) reindex check, index rebuild/perf sanity on a nontrivial library, and confirm telemetry carries only operation/error buckets.

@r3dbars r3dbars marked this pull request as ready for review July 3, 2026 01:34

r3dbars commented Jul 3, 2026

Copy link
Copy Markdown
Owner Author

Fair hold — CI green isn't real-library proof, and this branch was authored from a Linux box, so the Mac-side proofs need someone at a Mac. Making that as cheap as possible:

1. MCP/Tools tests (same suites CI ran, locally):

swift test --package-path Tools/TranscriptedCaptureKit
swift test --package-path Tools/TranscriptedMCP

2 + 4. Real-library semantic_search + rebuild/perf sanity, without touching the production index — point only the index at scratch, keep the real capture library as data:

cd Tools/TranscriptedMCP && swift build -c release
TRANSCRIPTED_INDEX_DIR=$(mktemp -d) time ./.build/release/transcripted-mcp --self-test

--self-test runs the full reconcile, so time here IS the v4 rebuild+embed cost on your real library (expect it to be the slowest first run; incremental after that). Then wire the same binary + env into Claude Desktop/Code as a scratch server and run semantic_search with a paraphrase query you know only appears reworded ("pricing pushback" style), plus recap to see summary previews instead of greetings.

3. (done)/(due) reindex check: edit any saved meeting's Action Items bullet to end with (done), bump the file (save), rerun the self-test (or let the file watcher catch it in a live server), then list_action_items with status:"open" → the item should be gone; status:"done" → present with status populated.

5. Telemetry payloads — provable from source without a Mac: both new events are fixed-key literal dictionaries, no interpolated paths/titles:

  • audio_maintenance_failure: operation / error_domain / error_code (TranscriptedAppState.swift, handler wiring)
  • capture_library_size: size_bucket / audio_retention (bucketed in CaptureLibrarySize.bucketLabel)
  • audio_buffer_truncated: dropped_seconds / capacity_seconds (ParakeetEngine.swift)
    LocalObservabilityPayloadSanitizer still applies on top before disk writes.

If the breadth is the concern, happy to split this into 3 PRs (metadata+previews / semantic search / instrumentation+disk) — say the word and I'll cut them.


Generated by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants