Draft
67 commits
3da737c
docs(plans): provenance-epic revisions to plans 3-8
gordonwoodhull May 20, 2026
3edf500
docs(plans): tighten Plan 3 scope + propagate to plans 6/7/7a
gordonwoodhull May 20, 2026
6afc210
docs(plan-3): incorporate review decisions on hashing, drive modes, f…
gordonwoodhull May 21, 2026
4d8683d
docs(plan-3): reuse existing test helpers; pin decisions; long-lived …
gordonwoodhull May 21, 2026
51e2f40
plan-3 phase 1: meta-hash + divergence localization
gordonwoodhull May 21, 2026
a822169
plan-3 phase 2: idempotence test crate scaffolding
gordonwoodhull May 21, 2026
a44d562
plan-3 phase 3: 11 carry-forward fixtures
gordonwoodhull May 21, 2026
bc33a28
chore: refresh lockfiles after npm install + wasm build
gordonwoodhull May 21, 2026
47e7fd7
plan-3 phase 4a: 9 gap-closure doc fixtures + 1 in queue
gordonwoodhull May 21, 2026
f8e9c51
plan-3 phase 4b: include-in-header + resource-image fixtures
gordonwoodhull May 21, 2026
95b8644
plan-3 phase 4c: website-chrome, website-links, website-listing
gordonwoodhull May 21, 2026
70f2ff7
plan-3 phase 4d: attribution fixture
gordonwoodhull May 21, 2026
ef76747
plan-3 phase 6: idempotence-contract.md + cross-links
gordonwoodhull May 21, 2026
98c585f
plan-3 phase 7: final verification + queue state recorded
gordonwoodhull May 21, 2026
6dfb3e5
plan-3: check off the per-fixture coverage-gaps inventory
gordonwoodhull May 21, 2026
1abbafc
bd-rz2we: split vfs_root into write-root + url-root in ResourceResolv…
gordonwoodhull May 21, 2026
2d050d9
docs(plans): Plan 4 implementation-ready + cross-plan `from` rename
gordonwoodhull May 21, 2026
1fac357
docs(plans-4-and-5): annotate bd-3odjm as Plan-5-owned baseline failure
gordonwoodhull May 21, 2026
7dedebc
docs(plans): propagate SmallVec macro to plans 5 & 6 code samples
gordonwoodhull May 21, 2026
431be7e
docs(plans): bump SmallVec capacity to 2 and fold in research
gordonwoodhull May 21, 2026
afe67b0
docs(plans): correct SmallVec cap=2 memory delta (~40 bytes, not 16)
gordonwoodhull May 22, 2026
5542912
docs(plan-5): review pass — checklist, TS shape, scope cleanups
gordonwoodhull May 22, 2026
61d4af2
docs(plan-4): consolidate file-id walkers + close open questions
gordonwoodhull May 22, 2026
5465cb0
Merge feature/provenance-plan-5: review pass on Plan 5 wire format
gordonwoodhull May 22, 2026
9fecf91
plan-4: implement SourceInfo Generated + Anchor types
gordonwoodhull May 22, 2026
d51db4e
docs(plan-4): record implementation surprises
gordonwoodhull May 22, 2026
7fa41b5
docs(plan-5): review-2 pass — phase reorder, strict readers, TS rename
gordonwoodhull May 22, 2026
765adb0
docs(plan-5): sharpen Phase 0 + carry Plan-4 implementation learnings
gordonwoodhull May 22, 2026
0d7d1e9
docs(plan-5): resolve open questions from pre-impl audit
gordonwoodhull May 22, 2026
d044b97
fix(pampa): close bd-3odjm with code-3 dual-shape + code-4 readers (P…
gordonwoodhull May 22, 2026
eef41fa
docs(plan-6): review pass — close open questions, expand scope, fix P…
gordonwoodhull May 22, 2026
4d2ff4b
feat(pampa): emit Generated as JSON wire code 4 (Plan 5 Phases 3+4, a…
gordonwoodhull May 22, 2026
0b4f84a
docs(plans 6-8): post-review followups — cleanup open questions, cros…
gordonwoodhull May 22, 2026
254290e
feat(preview-renderer): consume code-4 Generated wire format (Plan 5 …
gordonwoodhull May 22, 2026
638a8d8
Merge review/provenance-plan-6: Plan-6 review pass
gordonwoodhull May 22, 2026
fe8c22d
plan-6 phase 0: add Inline/Block::source_info_mut accessors
gordonwoodhull May 22, 2026
d4ec690
plan-6 audit: enumerate sites + document AttrSourceInfo invariant
gordonwoodhull May 22, 2026
951222a
plan-6: shortcode stamper + dispatch funnel + error/literal call sites
gordonwoodhull May 22, 2026
163c01c
plan-6: synthesizer transforms emit Generated provenance
gordonwoodhull May 22, 2026
e8df75c
docs(plans 9, 10): research plans for ValueSource threading + Dispatc…
gordonwoodhull May 22, 2026
e8119f8
plan-6 tests: per-transform shape + shortcode + Lua enrichment
gordonwoodhull May 22, 2026
02b401a
docs(plan-7): rewrite — decompose API, settle review findings, add im…
gordonwoodhull May 24, 2026
91e3edd
plan-6: verify pass + WASM Cargo.lock update + plan checklist closed
gordonwoodhull May 22, 2026
8aeda2a
Merge review/provenance-plan-7: Plan-7 review pass + Plans 9/10 resea…
gordonwoodhull May 24, 2026
3262d6c
docs(plans 7, 7a, 10): post-merge follow-ups from Plan-7 wrap-up review
gordonwoodhull May 24, 2026
a22ce41
plan-7 phase 1: foundation primitives (preimage_in, atomicity registr…
gordonwoodhull May 25, 2026
9a473fe
plan-7 phase 2+3a: writer internals — soft-drop, Transparent/Omit, mu…
gordonwoodhull May 25, 2026
66181cf
docs(plan-7): fold codebase facts into Phase 2 + Phase 4 sections
gordonwoodhull May 25, 2026
4f815f0
ci(e2e): drop path filter so hub-client e2e runs on every PR (bd-izh3)
gordonwoodhull May 25, 2026
a0a4c7c
plan-7 phases 4-6: WASM bridge takes baseline AST; lift read-only guard
gordonwoodhull May 25, 2026
fceb862
docs(hub-client/changelog): plan-7 phases 4-6 entry
gordonwoodhull May 25, 2026
20f4b0f
plan-7 phase 7: SPA setAst wired; FNV-1a echo-prevention; DiagnosticS…
gordonwoodhull May 25, 2026
c72640a
plan-7 phase 8 (subset) + 9: WASM wrapper test, smoke, follow-up beads
gordonwoodhull May 25, 2026
4ee51e4
docs(changelog): update plan-7 phases 4-6 commit hash after rebase
gordonwoodhull May 25, 2026
6afa861
docs(plans 7, 9): note Plan 7 shipped; Phase-5 tests now unblocked
gordonwoodhull May 25, 2026
b879896
docs(plan-7b): write Plan 7 test-o-rama consolidation plan
gordonwoodhull May 25, 2026
e0aacdc
docs(provenance): contract doc for adding new Generated kinds
gordonwoodhull May 25, 2026
da44e04
docs(plan-7c): closure gaps from Plan-7 implementation session
gordonwoodhull May 25, 2026
bdcfdc5
fix(pampa/writers/incremental): recurse into non-atomic Generated wra…
gordonwoodhull May 25, 2026
47c4c57
docs(changelog): note q2-preview sectionize-wrapper edit fix (bdcfdc53)
gordonwoodhull May 25, 2026
b9f64b5
fix(pampa/writers/incremental): descend wrappers when deriving target…
gordonwoodhull May 25, 2026
2bf9266
fix(pampa/writers/incremental): preserve YAML frontmatter when blocks…
gordonwoodhull May 25, 2026
8f1d33d
refactor(pampa/writers/incremental): name the transparent-wrapper pat…
gordonwoodhull May 25, 2026
5f2bbab
fix(hub-client/ReactPreview): surface soft-drop warnings immediately …
gordonwoodhull May 25, 2026
3f96b39
docs(changelog): note soft-drop diagnostic surfacing fix (5f2bbab0)
gordonwoodhull May 25, 2026
e584428
refactor(pampa/writers/incremental): make CoarsenedEntry self-contain…
gordonwoodhull May 26, 2026
de2b2f6
fix(quarto-error-reporting): gracefully degrade ariadne source contex…
gordonwoodhull May 26, 2026
19 changes: 10 additions & 9 deletions .github/workflows/hub-client-e2e.yml
@@ -1,21 +1,22 @@
name: Hub-Client E2E Tests

on:
push:
branches: [main]
paths:
- 'hub-client/**'
- '.github/workflows/hub-client-e2e.yml'
pull_request:
paths:
- 'hub-client/**'
- '.github/workflows/hub-client-e2e.yml'
workflow_dispatch:
inputs:
recreate-all-snapshots:
description: 'Delete and recreate ALL visual regression baselines'
type: boolean
default: false
push:
branches:
- main
pull_request:
branches:
- main

concurrency:
group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.event.pull_request.number || github.ref }}
cancel-in-progress: true

jobs:
e2e-tests:
4 changes: 4 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -44,6 +44,7 @@ proc-macro2 = { version = "1.0.106", features = ["span-locations"] }
schemars = "1.2.1"
serde = { version = "1.0.228", features = ["derive"] }
serde_json = "1.0.149"
smallvec = { version = "1.13", features = ["serde"] }
serde_yaml = "0.9"
thiserror = "2.0"
toml = "0.9.11"
245 changes: 245 additions & 0 deletions claude-notes/designs/incremental-writer-internals.md
@@ -0,0 +1,245 @@
# Incremental writer internals — `CoarsenedEntry` and the self-contained contract

**Status:** Active (contract pinned 2026-05-25 by the
`CoarsenedEntry::Rewrite` refactor).
**Types:** `pampa::pandoc::Block`, `quarto_source_map::SourceInfo`,
`quarto_ast_reconcile::ReconciliationPlan`.
**Reference impl:**
[`crates/pampa/src/writers/incremental.rs`](../../crates/pampa/src/writers/incremental.rs)
(`CoarsenedEntry`, `coarsen`, `coarsen_blocks`, `coarsen_keep_before_block`,
`assemble`, `emit_entries`).
**Plans:**
[Plan 7](../plans/2026-05-04-q2-preview-plan-7-incremental-writer.md)
(writer design) ·
[Plan 7c](../plans/2026-05-25-q2-preview-plan-7c-closure-gaps.md)
(Phase 8 — Transparent recursion in `RecurseIntoContainer`) ·
[CoarsenedEntry self-contained refactor](../plans/2026-05-25-coarsened-entry-self-contained.md).
**Sibling docs:**
[Transparent wrappers](./transparent-wrappers.md) (the *traversal*
primitive — what the writer skips through) ·
[Provenance contract](./provenance-contract.md) §7 (how atomic-kind
decisions flow into the writer's branches).

## Purpose

The incremental writer answers a single question: *given an
`(original_qmd, original_ast, new_ast, plan)` tuple, what qmd text
should we hand back to the user?* Its output round-trips through the
read pipeline to produce an AST that the next reconciliation matches
against — so the writer's bytes are the canonical persistence form
of an edit.

It does this in two phases. **Coarsen** walks the reconciler's
hierarchical alignment plan and reduces it to a flat list of
`CoarsenedEntry` values — one per emitted block sequence. **Assemble**
walks that list, concatenates the bytes, and inserts separators. The
split lets the coarsen step be tested in isolation and lets a future
"minimal Monaco edit" consumer reuse the entry list without
re-running the diff.

This document pins the contract that holds the two phases together.

## The `CoarsenedEntry` contract

> Every variant of `CoarsenedEntry` must carry enough information
> to produce its emit bytes **without further context**. No
> index-into-an-ambient-slice deferral. No "look this up at emit
> time" handoffs. Each entry is self-describing.

The five variants today:

| Variant | Bytes come from | Self-contained because |
|---|---|---|
| `Verbatim` | `original_qmd[byte_range]` | `byte_range` is absolute. |
| `InlineSplice` | `block_text` field | Pre-computed at coarsen time. |
| `Rewrite` | `block_text` field | Pre-computed at coarsen time. |
| `Transparent` | concatenation of children | Children are themselves self-contained. |
| `Omit` | (nothing) | Emits nothing. |

The two indices that *do* appear on entries — `Verbatim::orig_idx`,
`InlineSplice::orig_idx` — are not used for *byte content*. They're
hints to `compute_separator` for its "consecutive-in-original gap"
optimization, and they're always `Option`: `None` for children
inside a `Transparent` wrapper, where any index would be ambiguous
(top-level? child-level?). The bytes themselves never look up against
an ambient slice.

`emit_entries` walks entries in order and concatenates. Its
`new_ast: &Pandoc` parameter is currently unused for byte production
in any variant — that's the post-condition of the contract. (We
leave the parameter in the signature for now; removing it is a
tidying follow-up flagged in the refactor plan.)
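
To make the contract concrete, here is a minimal sketch of what a
self-contained entry type and its emit loop could look like. The
variant and field names follow the table above, but the exact shapes
in `incremental.rs` are assumed, `children` is a hypothetical field
name, and separator handling (`compute_separator`) is left out
entirely.

```rust
use std::ops::Range;

/// Simplified stand-in for the writer's entry type (shapes assumed).
#[derive(Debug, Clone, PartialEq)]
enum CoarsenedEntry {
    /// Bytes are copied from `original_qmd[byte_range]` (absolute range).
    Verbatim { byte_range: Range<usize>, orig_idx: Option<usize> },
    /// Text was pre-computed at coarsen time.
    InlineSplice { block_text: String, orig_idx: Option<usize> },
    /// Text was pre-computed at coarsen time.
    Rewrite { block_text: String },
    /// Wrapper with no bytes of its own; the children carry the content.
    Transparent { children: Vec<CoarsenedEntry> },
    /// Emits nothing.
    Omit,
}

/// Concatenate the bytes of each entry, recursing through wrappers.
/// Note that no AST is needed: every entry already owns or addresses
/// its bytes.
fn emit_entries(original_qmd: &str, entries: &[CoarsenedEntry], out: &mut String) {
    for entry in entries {
        match entry {
            CoarsenedEntry::Verbatim { byte_range, .. } => {
                out.push_str(&original_qmd[byte_range.clone()]);
            }
            CoarsenedEntry::InlineSplice { block_text, .. }
            | CoarsenedEntry::Rewrite { block_text } => out.push_str(block_text),
            CoarsenedEntry::Transparent { children } => {
                emit_entries(original_qmd, children, out);
            }
            CoarsenedEntry::Omit => {}
        }
    }
}
```

In this sketch `orig_idx` is never read while producing bytes, which
mirrors the point made above: the indices are separator hints only.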

## Why this matters

The contract isn't decorative. Three reasons it exists:

### 1. `Transparent` recursion composes only if children are self-contained

A `Transparent` entry represents a synthesized wrapper whose own
bytes are empty (sectionize Div, footnotes container, appendix
container) but whose children carry real source preimage. The
writer "looks through" the wrapper by inlining the children into the
emit stream.

This composition requires that each child knows how to produce its
own bytes *without* depending on its position in some ambient slice.
A child carries `orig_idx: None` to opt out of the original-gap
optimization (its index is child-relative, not top-level). If the
same child also tried to defer its *bytes* to a "look up index N in
new_ast.blocks" handoff, the lookup would silently target the wrong
slice — `new_ast.blocks` is the top-level array, and child indices
don't index into it.

That is exactly the bug that motivated this contract. Before
2026-05-25 the `Rewrite` variant carried `new_idx: usize`, which
worked at the top level (every entry corresponded one-to-one with
a top-level block; indices were unambiguous) but broke the moment
`Rewrite` could be produced inside a `Transparent` recursion. The
panic shape: *"index out of bounds: the len is 1 but the index is N"*
— top-level slice has one entry (the wrapper), child index N is
out of bounds.
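
The failure mode can be reproduced in miniature. The sketch below is
an illustration under simplifying assumptions, not the original code:
blocks are plain strings, `DeferredRewrite` and `resolve_deferred` are
hypothetical names, and `.get` is used so the bad lookup surfaces as
`None` instead of the panic quoted above.

```rust
/// Hypothetical pre-refactor shape: the entry defers its bytes to an index.
struct DeferredRewrite {
    new_idx: usize,
}

/// Stand-in for `write_block_to_string(&new_ast.blocks[new_idx])`.
/// Uses `.get` so an out-of-range index yields `None` rather than panicking.
fn resolve_deferred<'a>(
    top_level_blocks: &'a [String],
    entry: &DeferredRewrite,
) -> Option<&'a str> {
    top_level_blocks.get(entry.new_idx).map(String::as_str)
}

/// Builds the failing scenario: one top-level wrapper, three children, and
/// a deferred entry produced while coarsening the child slice.
fn child_rewrite_lookup() -> Option<String> {
    let top_level_blocks = vec!["<wrapper>".to_string()];
    let children = ["a", "b", "c"];
    // Index 2 is valid for `children` but not for `top_level_blocks`.
    let entry = DeferredRewrite { new_idx: children.len() - 1 };
    resolve_deferred(&top_level_blocks, &entry).map(str::to_owned)
}
```

With direct indexing in place of `.get`, the same lookup panics, since
the top-level slice holds only the wrapper.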

### 2. Minimal-edit diffing wants a self-contained intermediate form

Today `incremental_write` returns a single full-document edit. A
future "produce minimal Monaco edits" consumer (Plan 7's deferred
follow-up) wants to walk the coarsened plan and emit *per-entry*
deltas — `Verbatim` entries are no-ops if the original gap matches;
`InlineSplice` and `Rewrite` are localized text replacements at
known source ranges.

That walker needs every entry to expose its *intended text* (the
bytes that would land in the result) directly. If `Rewrite` deferred
to an emit-time lookup, the walker would have to re-thread `new_ast`
into a context it doesn't otherwise need. The self-contained shape
gives the walker exactly what it asks for — one record per emitted
block, fully self-describing.
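
As a rough illustration of what such a walker gains from the
self-contained shape, the sketch below collects the intended text of
every non-verbatim entry without ever touching `new_ast`. Everything
here is hypothetical: the `Entry` type is a trimmed-down stand-in, and
a real walker would also need the source ranges the text replaces.

```rust
use std::ops::Range;

/// Trimmed-down stand-in for `CoarsenedEntry` (shapes assumed).
enum Entry {
    Verbatim { byte_range: Range<usize> },
    Rewrite { block_text: String },
    Transparent { children: Vec<Entry> },
    Omit,
}

/// Collect the text of every entry that would change the document.
/// The walker reads only the entries themselves; no AST is threaded in.
fn changed_texts<'a>(entries: &'a [Entry], out: &mut Vec<&'a str>) {
    for entry in entries {
        match entry {
            // Unchanged or empty: nothing to report.
            Entry::Verbatim { .. } | Entry::Omit => {}
            Entry::Rewrite { block_text } => out.push(block_text.as_str()),
            Entry::Transparent { children } => changed_texts(children, out),
        }
    }
}
```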

### 3. Behaviour stays the *same*; only the *timing* changes

Pre-refactor, `write_block_to_string(&new_ast.blocks[new_idx])` ran
inside `emit_entries`. Post-refactor, the equivalent call runs at
the corresponding producer site in `coarsen_blocks` /
`coarsen_keep_before_block`. `write_block_to_string` is referentially
transparent — it depends only on its `Block` argument, has no global
state, no I/O, no clock reads. Moving the call earlier produces
byte-identical output and runs exactly once either way (Rewrite is
the catch-all path; we always emit it when produced).

That matters because the refactor is a shape change, not a semantics
change. A reader reviewing the diff need not worry that some
downstream test will break in a subtle way.
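
The equivalence can be sketched with a toy serializer. This is an
illustration only: the `write_block_to_string` below is a stand-in
that takes a `&str` rather than a `Block`, and `emit_deferred`,
`coarsen_eager`, and `emit_eager` are hypothetical names for the two
timings.

```rust
/// Stand-in for the real serializer: a pure function of its argument.
fn write_block_to_string(block: &str) -> String {
    format!("{}\n", block)
}

/// Pre-refactor timing: entries hold indices; text is produced at emit time.
fn emit_deferred(blocks: &[&str], indices: &[usize]) -> String {
    indices.iter().map(|&i| write_block_to_string(blocks[i])).collect()
}

/// Post-refactor timing: text is produced at coarsen time and stored.
fn coarsen_eager(blocks: &[&str], indices: &[usize]) -> Vec<String> {
    indices.iter().map(|&i| write_block_to_string(blocks[i])).collect()
}

/// Emit is then a plain concatenation of the stored text.
fn emit_eager(entries: &[String]) -> String {
    entries.concat()
}
```

Because the serializer is pure, both paths call it once per entry with
the same argument and so produce the same bytes.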

## Anti-patterns

Don't add a `CoarsenedEntry` variant that:

- **Defers to a named slice.** "Index N into `new_ast.blocks`,"
  "child M of original block at index K," etc. The moment a future
  refactor calls the producer in a different *context* (recursion,
  reuse from a sibling crate, a test fixture), the index points at
  the wrong slice and the failure is silent until the panic.
- **Depends on context not encoded in the variant itself.** If you
  need "the prev sibling's bytes," "the wrapper's original
  position," or similar context to make sense of an entry, pre-fold
  the context into the entry's payload or restructure so it doesn't
  need the context.
- **Requires specific timing of side effects.** `write_block_to_string`
  is pure: calling it at coarsen vs emit time is observably
  identical. If your variant only works when its bytes are computed
  at one specific moment, that's a sign the entry shape is wrong.

When in doubt, look at `InlineSplice`. It was the first variant to
carry pre-computed `block_text` (introduced when partial inline
rewrites made deferral impossible — the splice text doesn't
reconstruct from any single block) and is the structural blueprint
the rest of the variants should match.

## History

`CoarsenedEntry` started life with two variants in commit
`eb81cbc5` (the original incremental-writer landing): `Verbatim`
carrying a `byte_range`, and `Rewrite` carrying a `new_idx: usize`.
The writer was top-level only — each entry corresponded one-to-one
with a top-level block, indices were unambiguous, and deferring
`write_block_to_string` to emit time saved a call when the entry
was never emitted (defensive, but the entry was always emitted in
practice).

The asymmetry was introduced silently in `ab10f37b`, which added
`InlineSplice { block_text, orig_idx }` to support partial block
rewrites. Splice text mixes original bytes with newly-serialized
inlines and doesn't reconstruct from any single `Block` — so the
text was necessarily pre-computed at coarsen time. No one
refactored `Rewrite` to match; the two patterns coexisted.

`9a473fe9` (Plan 7 phase 2+3a) added `Transparent` and `Omit`.
`Verbatim::orig_idx` and `InlineSplice::orig_idx` became `Option`
so children inside `Transparent` could opt out of the original-gap
optimization. The commit **explicitly flagged** the latent `Rewrite`
issue with a comment: *"result_idx is unused for child Rewrites
(a child Rewrite would need a different lookup mechanism; not
exercised by today's synthesizers)."* Accurate at the time — no
producer of child entries was emitting `Rewrite`.

`bdcfdc53` (Plan 7c phase 8) added a Transparent-recursion path in
`coarsen_blocks` for the *changed-wrapper* case
(`RecurseIntoContainer` with a `block_container_plans` entry). For
the first time, `coarsen_blocks` ran on child slices, and a
`Rewrite` produced there carried a child-relative index. The "not
exercised" caveat from `9a473fe9` no longer held — the panic the
contract addresses became reachable.

The 2026-05-25 refactor that motivated this doc lifted `Rewrite`
to `{ block_text: String }`, matching `InlineSplice`. All four
producer sites now pre-compute. The implementation cost is a moved
`write_block_to_string` call; the gain is the contract this doc
pins.

The same session also closed a latent soft-drop gap that the panic
had been masking. The `BlockAlignment::UseAfter` arm now detects
*atomic-Generated with preimage*. That case arises when the user
edits inside a shortcode-resolved block: the reconciler splits the
edit into a deleted-original + new-block, but the new block still
carries the token's `Invocation` anchor. The arm now emits
`Verbatim` of the preimage plus a `Q-3-43` warning. The previous
let-user-win `Rewrite` would have written the resolved bytes (the
edit applied to *generated* content) back into the source qmd,
poisoning the user's source. The pattern: when an entry's *new*
block looks like an attempt to edit content the user can't actually
edit, refuse the edit at the writer regardless of what the
reconciler's alignment said.
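
The decision in that arm can be sketched as follows. This is a
simplified model, not the code in `incremental.rs`: `NewBlock`,
`atomic_preimage`, and `coarsen_use_after` are hypothetical names, and
the warning is reduced to its code string.

```rust
use std::ops::Range;

/// Hypothetical, simplified view of the new-side block in a `UseAfter` arm.
struct NewBlock {
    /// What re-serializing the block would produce.
    serialized: String,
    /// `Some(range)` when the block is atomic-Generated and still carries
    /// the source range of the token that produced it (its preimage).
    atomic_preimage: Option<Range<usize>>,
}

#[derive(Debug, PartialEq)]
enum Entry {
    Verbatim { byte_range: Range<usize> },
    Rewrite { block_text: String },
}

/// Returns the entry to emit plus an optional warning code.
fn coarsen_use_after(block: &NewBlock) -> (Entry, Option<&'static str>) {
    match &block.atomic_preimage {
        // Refuse the edit: keep the original token bytes and warn.
        Some(range) => (
            Entry::Verbatim { byte_range: range.clone() },
            Some("Q-3-43"),
        ),
        // Ordinary new block: let the user's content win.
        None => (
            Entry::Rewrite { block_text: block.serialized.clone() },
            None,
        ),
    }
}
```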

## Promotion path

`CoarsenedEntry` is private to
`crates/pampa/src/writers/incremental.rs` today, with two internal
consumers: `assemble`'s `emit_entries` and the
`compute_edits_from_coarsened` helper (which currently calls
`assemble` internally).

Promote the type (and its emission helpers) to a shared module the
moment a second crate wants to consume the coarsened plan. The
expected first non-pampa consumer is the minimal-edit-diffing
walker described above. Until then, the type stays here — premature
generalisation has its own debt, and the contract above is what
matters, not the import path.

## Adding a new variant

If you find yourself wanting a new `CoarsenedEntry` variant:

1. Ask whether one of the existing five already serves. Most "I need
   a new shape" instincts collapse into `Transparent` (for
   wrappers) or `Rewrite` (for "anything else, re-serialize").
2. If you genuinely need a new variant, design it self-contained
   from the start. The variant's payload should be everything
   `emit_entries` needs to produce its bytes; nothing more, nothing
   deferred.
3. Update this doc's table and the variant's doc comment in the
   `CoarsenedEntry` enum to describe the self-containment story.
4. Add at least one test that exercises the variant inside a
   `Transparent` recursion. That's the canary that catches
   composition bugs early.
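
A canary for step 4 might look like the sketch below. It uses a
two-variant stand-in rather than the real enum, and the `emit` and
`check_rewrite_inside_transparent` names are hypothetical; in the
crate this would be a `#[test]` driving the real `coarsen` and
`assemble` functions.

```rust
/// Two-variant stand-in for `CoarsenedEntry` (shapes assumed).
#[derive(Debug)]
enum Entry {
    Rewrite { block_text: String },
    Transparent { children: Vec<Entry> },
}

/// Minimal emit loop: concatenate text, recursing through wrappers.
fn emit(entries: &[Entry], out: &mut String) {
    for entry in entries {
        match entry {
            Entry::Rewrite { block_text } => out.push_str(block_text),
            Entry::Transparent { children } => emit(children, out),
        }
    }
}

/// Nest the variant two wrappers deep. If it depended on a top-level
/// index, there would be no slice here for that index to point into.
fn check_rewrite_inside_transparent() -> String {
    let inner = Entry::Transparent {
        children: vec![Entry::Rewrite { block_text: "child\n".to_string() }],
    };
    let outer = Entry::Transparent { children: vec![inner] };
    let mut out = String::new();
    emit(&[outer], &mut out);
    out
}
```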