Skip to content

Capture format: spec doc + format_version and transcript_style markers#1361

Merged
r3dbars merged 1 commit into
mainfrom
claude/agent-fix-6-capture-format-spec-pbwyng
Jul 3, 2026
Merged

Capture format: spec doc + format_version and transcript_style markers#1361
r3dbars merged 1 commit into
mainfrom
claude/agent-fix-6-capture-format-spec-pbwyng

Conversation

@r3dbars

@r3dbars r3dbars commented Jul 2, 2026

Copy link
Copy Markdown
Owner

Why

The product promise is "agent-readable Markdown," but the format had no spec and no version marker — and a saved meeting actually exists in two body grammars (the save-time form and the async restyled form, since MeetingTranscriptStyler fails closed and can skip files). Agents had no way to detect which form they were reading, and contributors had no contract to keep the app writers and the TranscriptedCaptureKit mirror parser in lockstep. Found in the agent-surface audit.

Product Impact

  • Affects: agent artifacts / meetings / dictation
  • Lane: agent workflow
  • Why this matters: the file format is the API; an unversioned, unspecified format quietly breaks every external parser each time a writer changes.

What changed

  • New docs/capture-format.md (315 lines): full meeting frontmatter key reference (core, health, nested speakers/gap_events, auto_summary_*/local_summary_* namespaces, Obsidian block), both meeting body grammars with examples, the save → quick-summary injection → restyle+rename lifecycle (explicitly documenting that both forms exist in the wild), the dictation day-file format, the versioning convention, and stability rules (flat key: value frontmatter only — the app's own scanner skips indented lines; unknown keys must be ignored; additive-only within a version).
  • TranscriptFormatter.swift: meeting saves now emit format_version: 1 and transcript_style: raw.
  • MeetingTranscriptStyler.swift: renderFrontmatter filters any existing transcript_style: line and appends transcript_style: styled — updated in place, idempotent on repeat passes; format_version passes through untouched.
  • DictationTranscriptWriter.swift: day headers emit format_version: 1.
  • TranscriptedCaptureKit: parsed meeting/dictation models gain optional formatVersion/transcriptStyle (additive; no public inits, and zero external references to those fields in CLI/MCP/QA, so no call sites break — those packages untouched).
  • Tests: new Tests/TranscriptedCoreTests/TranscriptFormatVersionTests.swift (round-trips the new keys through the real TranscriptFrontmatter parser); styler tests for marker rewrite + no invented format_version on legacy files + single-marker idempotency; dictation header assertions incl. single-header-on-append; quick-summary regression that injection preserves the new keys; CaptureKit fixtures for versioned and pre-versioning files.
  • CLAUDE.md pointers in Sources/TranscriptedCore/ and Sources/Dictation/ to the new spec.

Version convention: format_version: 1, absent = pre-versioning, parse as 1. The grammar didn't change, so claiming a new version would imply a break that never happened; body-form variation is carried by transcript_style (raw/styled), with heading-sniffing documented for unmarked legacy files.

How I checked it

  • scripts/dev/agent-preflight.sh
  • Selected checks from .agents/test-matrix.yml — Core+CaptureKit union
  • bash build-deps.sh + bash build.sh --no-open + bash run-tests.sh + bash run-integration-smoke.sh + swift test + bash run-e2e-smoke.sh + all four Tools package tests — all run by Swift CI on macOS at this exact head (00eab44), green: https://github.com/r3dbars/transcripted/actions/runs/28621531993 (the workflow executes exactly this matrix as the required merge gate)
  • Manual check: new keys verified flat + key-matched (not allowlisted) in every consumer checked (RecentCaptureScanners, QA validators, MeetingQuickSummaryWriter, LocalMeetingSummarizer); styler filter mirrors the existing title: filter.

Authoring context: written in a Linux session with no local Swift toolchain; verification is the green macOS Swift CI run above.

Risk Review

  • Privacy / local-first behavior reviewed — metadata keys only, local files only
  • Storage path or migration impact reviewed — no paths change; old files parse exactly as before (absent keys = version 1)
  • Public-facing copy stays concrete and matches current product scope
  • Release/update impact reviewed — none
  • Agent PRs link the issue/workpad and stay draft until human review
  • UI changes include sanitized .agent-review/visuals/ evidence — n/a
  • No private transcripts, audio, tokens, personal paths, or customer data are included

Notes

Part of the agent-surface audit series (#1356#1360) and step 1 of the consolidation plan in #1358. Restyled legacy files gain transcript_style: styled without a format_version — intentional, the keys are independent and the spec says so.

Agent handoff

COORD_DONE: GREEN | (this PR) | format spec + version/style markers in all writers | none | none | Swift CI green at head 00eab44 (full matrix) | human review + mark ready + merge

🤖 Generated with Claude Code

https://claude.ai/code/session_01GuGZaFRmNrqfGPpqf7WH4n

The saved meeting/dictation Markdown had no spec and no version marker, and
a meeting file silently changes body grammar when the async restyle rewrites
and renames it — with no way for parsers to tell which form they are holding.

- docs/capture-format.md: authoritative spec for the capture Markdown —
  meeting frontmatter key reference, both meeting body grammars (raw save
  form and restyled form), the save -> quick-summary injection -> restyle
  lifecycle, the dictation day-file format, and stability rules (flat keys
  only, ignore unknown keys, additive-only within a version).
- TranscriptFormatter now stamps flat `format_version: 1` and
  `transcript_style: raw` on every meeting save. Absent keys mean a
  pre-versioning file that parses identically to version 1.
- MeetingTranscriptStyler preserves `format_version` and rewrites the style
  marker to `transcript_style: styled` (filter-then-append, never
  duplicated) when it persists a restyle.
- DictationTranscriptWriter day headers carry `format_version: 1`.
- MeetingQuickSummaryWriter already preserves non-managed frontmatter keys;
  covered with a regression assertion.
- TranscriptedCaptureKit exposes the new keys as optional
  formatVersion/transcriptStyle fields on the parsed models (additive; CLI
  and MCP call sites untouched).
- Tests: new TranscriptFormatVersionTests (SPM), styler marker-rewrite and
  legacy-passthrough fast tests, dictation header assertions, CaptureKit
  parser coverage for versioned and pre-versioning fixtures.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GuGZaFRmNrqfGPpqf7WH4n
@r3dbars

r3dbars commented Jul 2, 2026

Copy link
Copy Markdown
Owner Author

Merge-room hold: draft plus full format/CaptureKit matrix proof is missing. Live state checked 2026-07-02: head 00eab44f83777bff2f156d0edcede7212b350983, base main, mergeStateStatus=CLEAN, repo-hygiene/build-and-test green, hardware-smokes skipped. This changes core saved Markdown format markers and CaptureKit parser fields; PR body says Mac build/tests, integration smoke, and package-test union were not run. Smallest next action: run the listed Mac matrix and update proof before marking ready.

@r3dbars r3dbars marked this pull request as ready for review July 3, 2026 01:34
@r3dbars r3dbars merged commit 10c8dde into main Jul 3, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants