fix: balance earlier pages of multi-page 2-col continuous sections (SD-2646) #2930

Merged
harbournick merged 2 commits into tadeu/sd-2452-feature-implement-column-balancing-for-continuous-section from tadeu/sd-2646-balance-earlier-pages-of-multi-page-2col-section on Apr 30, 2026
@tupizz tupizz commented Apr 24, 2026

Summary

Fixes SD-2646: a 2-col continuous section containing a large table rendered with every row stacked into column 0 across multiple pages, making column 1 appear empty and the table's second half appear "missing" to the user.

Built on top of #2869 (SD-2452). This PR should be reviewed and merged after SD-2452 lands.

Root cause (two linked bugs)

1. pm-adapter: section membership for non-paragraph nodes. Per ECMA-376 §17.6.17, a <w:sectPr> inside a paragraph defines the section that ENDS with that paragraph. All preceding body children belong to that section, including tables and top-level drawings. SuperDoc's section analysis only counted paragraphs and emitted section-break blocks only from handleParagraphNode, so a table between two sectPr-marker paragraphs was emitted into the flow stream BEFORE the section break, and the layout engine then laid it out under the prior section's columns. IT-945's 114-row table was placed in the 1-col section instead of the 2-col section.

2. layout-engine: section-final-page balancing of a dominant table. Even with (1) fixed, balanceSectionOnPage treated each fragment as an atomic block. A single table fragment taller than totalHeight / columnCount would get assigned to one column by the binary-search balancer, leaving the other empty. Word's §17.18.77 behavior is to split the table at a row boundary so both columns carry roughly half the rows.

Changes

pm-adapter (packages/layout-engine/pm-adapter/):

  • sections/analysis.ts: findParagraphsWithSectPr now walks every top-level body child with a nodeIndex counter. SectionRange carries startNodeIndex/endNodeIndex alongside the existing paragraph indices.
  • sections/breaks.ts: new maybeEmitNextSectionBreakForNode. Fires exactly once when currentNodeIndex === nextSection.startNodeIndex.
  • internal.ts: main dispatch loop calls the hook before every top-level handler. Any new block kind added in the future gets correct section membership for free.
  • README.md: new "Section model" primer citing §17.6.17 and §17.18.77.
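
The node-index model described above can be illustrated with a minimal sketch. It is not the pm-adapter code: `BodyNode`, `computeNodeRanges`, and `toFlow` are hypothetical names, and the body is reduced to a flat list of top-level children where a paragraph may carry a sectPr marker.

```typescript
// Hypothetical, simplified node shape; not SuperDoc's actual types.
type BodyNode = { type: 'paragraph' | 'table' | 'drawing'; hasSectPr?: boolean };

interface NodeRange {
  sectionIndex: number;
  startNodeIndex: number; // first top-level child that belongs to the section
  endNodeIndex: number; // child that closes the section (end-tagged semantics)
}

// A paragraph carrying a sectPr CLOSES the section that contains every
// top-level child since the previous marker, tables and drawings included.
function computeNodeRanges(body: BodyNode[]): NodeRange[] {
  const ranges: NodeRange[] = [];
  let start = 0;
  body.forEach((node, nodeIndex) => {
    if (node.type === 'paragraph' && node.hasSectPr) {
      ranges.push({ sectionIndex: ranges.length, startNodeIndex: start, endNodeIndex: nodeIndex });
      start = nodeIndex + 1;
    }
  });
  // Trailing children fall into the final section (closed by the body sectPr).
  if (start < body.length) {
    ranges.push({ sectionIndex: ranges.length, startNodeIndex: start, endNodeIndex: body.length - 1 });
  }
  return ranges;
}

// Emit each section break exactly once, right before the first node of its
// section, regardless of that node's type.
function toFlow(body: BodyNode[]): string[] {
  const ranges = computeNodeRanges(body);
  const out: string[] = [];
  let next = 1; // section 0 needs no leading break in this sketch
  body.forEach((node, nodeIndex) => {
    const upcoming = ranges[next];
    if (upcoming && nodeIndex === upcoming.startNodeIndex) {
      out.push(`sectionBreak:${next}`);
      next += 1;
    }
    out.push(node.type);
  });
  return out;
}
```

With a body of marker paragraph, table, marker paragraph, the table's section break is emitted before the table rather than after it, which is the ordering the integration test in this PR asserts.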

layout-engine (packages/layout-engine/layout-engine/src/column-balancing.ts):

  • splitDominantTableAtRowBoundary preprocessor inside balanceSectionOnPage. When a section has a single table fragment taller than the balanced target height, splits it at the row whose cumulative height crosses the target. Existing binary search then naturally assigns the two halves to separate columns.
  • getBalancingHeight: empty sectPr-marker paragraphs (all measured lines have width === 0) contribute 0 to the balance. Prevents col 0 from being offset down by ~40px of empty marker paragraphs while col 1 starts at the region top.
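
As a rough illustration of the row-boundary rule, the sketch below finds how many rows stay in the first half of a split. `findSplitRow` is a hypothetical helper, not the function in column-balancing.ts, and it assumes row heights are already measured.

```typescript
// Returns the number of rows to keep in the first half, or null when a
// split would leave one half empty. `target` is the balanced column height.
function findSplitRow(rowHeights: number[], target: number): number | null {
  let cumulative = 0;
  for (let i = 0; i < rowHeights.length; i++) {
    cumulative += rowHeights[i] ?? 0;
    // Split at the first row whose cumulative height meets or exceeds the target.
    if (cumulative >= target) {
      const rowsInFirstHalf = i + 1;
      return rowsInFirstHalf < rowHeights.length ? rowsInFirstHalf : null;
    }
  }
  return null;
}

// Example: 28 equal rows, two columns, target = total height / 2.
const rows = new Array<number>(28).fill(20);
const total = rows.reduce((sum, h) => sum + h, 0);
const splitAt = findSplitRow(rows, total / 2); // 14 rows per half
```

Returning `null` for a single-row table keeps an unsplittable fragment atomic, so the caller can fall back to the unchanged balancer.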

Evidence (IT-945, numbered-rows variant)

Row ordering across both pages is now identical to Word:

| Page / Col | Word | SuperDoc |
| --- | --- | --- |
| P1 col 0 | R1–R41 | R1–R43 |
| P1 col 1 | R42–R82 | R44–R86 |
| P2 col 0 | R83–R99 | R87–R100 |
| P2 col 1 | R100–R114 | R101–R114 |

The 2-row-per-column count delta between Word (41) and SuperDoc (43) is a separate measuring-dom / font-metrics concern (Word's empty Aptos 12pt row ≈ 19.5px, SuperDoc's ≈ 18.4px). Not in scope for SD-2646; tracked as a follow-up investigation.

Test plan

  • New unit test pm-adapter/src/sections/end-tagged.test.ts: asserts analyzeSectionRanges returns startNodeIndex/endNodeIndex that straddle a table between two sectPr markers.
  • New integration test in pm-adapter/src/index.test.ts: asserts toFlowBlocks emits the 2-col sectionBreak BEFORE the table block.
  • Strengthened the misleading "balances the section-ending page even when the section spans multiple pages" test in layout-engine/src/index.test.ts (the old version had an if (sectionFragments.length > 1) guard that made it pass trivially). It now asserts that every page with ≥ 1 column's worth of content populates both col 0 and col 1.
  • New regression test in layout-engine/src/index.test.ts: a narrow 114-row table in a 2-col continuous section between sectPr markers renders fragments in both columns on at least one page.
  • pnpm --filter @superdoc/pm-adapter test → 1739/1739 pass
  • bun test ./packages/layout-engine/layout-engine/src/ → 616/616 pass
  • Browser verified against IT-945.docx: page 1 has both columns fully populated (43 rows each), page 2 splits 14/14 at y=96 with both columns top-aligned, "This is my third section" below.

Known follow-ups (separate tickets, not in this PR)

  • Table row-height parity with Word (measuring-dom): accounts for the remaining ~5% per-row height difference.
  • §17.18.77 continuous break edge cases: property inheritance from next section, same-page footnote → promote to nextPage, nextColumn break type coverage.
  • Remove the now-redundant section-break check copies in sdt/bibliography.ts, sdt/document-index.ts, sdt/table-of-authorities.ts (dispatch-level hook supersedes them, but they're still defensive and don't double-emit).

tupizz added 2 commits April 23, 2026 21:14
Per ECMA-376 §17.6.17, a <w:sectPr> inside a paragraph defines the section
that ENDS with that paragraph. All body children preceding it (paragraphs,
tables, top-level drawings, SDTs) belong to that section.

Section ranges were indexed purely by paragraph count, and section-break
blocks were emitted only inside handleParagraphNode. A table that sat
between two sectPr-marker paragraphs was emitted into the flow stream
BEFORE the section break that declared its column config, so the layout
engine laid it out under the prior section's settings.

This is the root cause of IT-945 rendering a 114-row 2-col continuous
table in column 0 across three pages with column 1 empty: the table was
placed in the 1-col section, not the 2-col section.

Fix:
- Track nodeIndex over every top-level doc.content child in
  findParagraphsWithSectPr and SectionRange (alongside paragraphIndex,
  which SDT handlers still use for intra-SDT transitions).
- Add maybeEmitNextSectionBreakForNode in sections/breaks.ts and call
  it from internal.ts's main dispatch loop BEFORE every top-level
  handler. Any non-paragraph node crossing a section boundary now
  triggers the break.
- Section-model primer in pm-adapter/README.md with spec citations.

Tests: 1739/1739 pass in pm-adapter (including new end-tagged.test.ts
and integration test in index.test.ts asserting flow-block order).
…ng section-final page (SD-2646)

The column balancer treats each fragment as an atomic block. A
multi-page two-column continuous section's final page can end up with
a single table fragment taller than totalSectionHeight / columnCount.
The atomic-block binary search then places the whole table in one
column and leaves the other empty, diverging from Word, which
balances by splitting the table at a row boundary per ECMA-376
§17.18.77 ("a continuous section break balances the content of the
previous section").

Fix: add splitDominantTableAtRowBoundary as a preprocessor inside
balanceSectionOnPage. When the section has a single splittable table
fragment larger than target, split it at the row whose cumulative
height first meets or exceeds totalSectionHeight / columnCount. The
two halves are inserted in place of the original; the rest of the
balancer runs unchanged and naturally assigns one to each column.

Also add getBalancingHeight so empty sectPr-marker paragraphs
(measured lines with width=0) contribute 0 to balancing, matching
Word's behavior of not rendering an empty line for such markers.
This keeps both columns top-aligned on the section-final page.

On IT-945: page 2 now splits 14/14 from y=96 in both columns, matching
Word's top-alignment. Before this fix page 2 rendered all 28 remaining
rows in col 1 with col 0 empty.

Tests: strengthened existing "balances the section-ending page" test
(it was passing trivially via `if (sectionFragments.length > 1)`
guard). Added narrow-table multi-page regression test. 616/616 pass.
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36027006c2


Comment thread packages/layout-engine/layout-engine/src/column-balancing.ts
@tupizz tupizz requested a review from harbournick April 24, 2026 13:25
@harbournick harbournick merged commit 71fb404 into tadeu/sd-2452-feature-implement-column-balancing-for-continuous-section Apr 30, 2026
43 checks passed
@harbournick harbournick deleted the tadeu/sd-2646-balance-earlier-pages-of-multi-page-2col-section branch April 30, 2026 18:07
luccas-harbour pushed a commit that referenced this pull request May 5, 2026
…-2452) (#2869)

* feat(layout-engine): balance columns at continuous section breaks (SD-2452)

Implements ECMA-376 §17.18.77 column balancing for multi-column sections.
Word produces a minimum-height balanced layout at the end of a continuous
(and, empirically, next-page) multi-column section; SuperDoc was either
leaving content stacked in the first column or, in some layouts, producing
overlapping fragments.

The pagination pipeline now balances each multi-column section's last page
at layout time:

  - layoutDocument builds a block -> section map by walking blocks in
    document order and tracking the current section from the most recent
    sectionBreak (pm-adapter only stamps attrs.sectionIndex on sectionBreak
    blocks, not on content paragraphs).
  - A new balanceSectionOnPage helper performs section-scoped balancing
    with its own fragment-level positioning (no Y-grouping): fragments are
    ordered by (x, y) in document order and each is treated as its own
    block. The previous balancePageColumns grouped fragments by Y into
    "rows," which collapsed fragments from different source columns at the
    same Y and produced overlap.
  - calculateBalancedColumnHeight is now a proper binary search for the
    minimum column height H such that greedy left-to-right fill places
    every block with every column <= H. This matches Word's left-heavy
    packing preference (e.g. 7 blocks / 3 cols -> 3+3+1, not 2+2+3).
  - A mid-page hook at forceMidPageRegion balances the ending section on
    the current page before starting the new region, and collapses both
    cursors to balanceResult.maxY so the next region begins just below the
    balanced columns. Sections handled mid-page are tracked in
    alreadyBalancedSections so the post-layout pass doesn't double-balance.
  - The prior "last page of document" heuristic is replaced with a
    per-section post-layout loop that balances each multi-column section's
    last page, skipping sections already handled mid-page.
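
The binary search with greedy left-to-right fill described above can be sketched as follows. This is an illustration under stated assumptions, not calculateBalancedColumnHeight itself: `fits`, `minBalancedHeight`, and `distribute` are hypothetical names, heights are integer pixels, and every block is atomic.

```typescript
// True when greedy left-to-right fill places every block with no column
// taller than `limit`.
function fits(heights: number[], columnCount: number, limit: number): boolean {
  let column = 0;
  let used = 0;
  for (const h of heights) {
    if (h > limit) return false; // an atomic block can never exceed the limit
    if (used + h > limit) {
      column += 1;
      used = 0;
      if (column >= columnCount) return false;
    }
    used += h;
  }
  return true;
}

// Binary search for the minimum column height H that still fits everything.
function minBalancedHeight(heights: number[], columnCount: number): number {
  let lo = Math.max(0, ...heights); // no column can be shorter than the tallest block
  let hi = heights.reduce((sum, h) => sum + h, 0); // everything in one column
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (fits(heights, columnCount, mid)) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// Number of blocks per column when filling greedily at the minimum height.
function distribute(heights: number[], columnCount: number): number[] {
  const limit = minBalancedHeight(heights, columnCount);
  const counts = new Array<number>(columnCount).fill(0);
  let column = 0;
  let used = 0;
  for (const h of heights) {
    if (used + h > limit && column < columnCount - 1) {
      column += 1;
      used = 0;
    }
    counts[column] = (counts[column] ?? 0) + 1;
    used += h;
  }
  return counts;
}
```

Because the fill is greedy from the left, equal-height blocks pack left-heavy: seven blocks in three columns come out as 3+3+1, the distribution the commit message attributes to Word.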

Tests:

  - 11 new unit/integration tests covering the 5 SD-2452 fixtures
    (2-col/3-col, equal and unequal heights, continuous and next-page
    breaks, multi-page sections, explicit column-break opt-out).
  - 614 layout-engine tests pass, 1737 pm-adapter tests pass,
    11375 super-editor tests pass.

Visual validation against Microsoft Word for all 5 fixtures:

  - Test 1 (6 paras / 2 cols):       3+3        exact match
  - Test 2 (5 mixed / 2 cols):       2+3        exact match
  - Test 3 (7 paras / 3 cols):       3+3+1      exact match
  - Test 4 (13 paras / 2 cols):      7+6        exact match, overlap gone
  - Test 5 (continuous + next-page): 3+2, 3+2   exact match

* fix(layout-engine): balance before forced page break on col-count reduction (SD-2452)

When a mid-page section break reduced the column count (e.g. 2-col ->
1-col for test 4's 13-paragraph fixture followed by OVERLAP CHECK), the
mid-page hook's forced-page-break guard ran before balancing:

  if (columnIndexBefore >= newColumns.count) {
    state = paginator.startNewPage();
  }
  // ... balance ran here, on the empty new page

At the section transition, columnIndexBefore=1 (paginator was in col 1)
and newColumns.count=1, so the guard forced a new page before balancing
had a chance to reposition the ending section's fragments. Balancing
then ran on the empty new page (no-op), the paginator placed the
post-columns single-column content on the new page, and the old page's
fragments were balanced by the post-layout pass. Net effect: columns
looked correct on page 0 but OVERLAP CHECK ended up on page 1, while
Word fits everything on one page.

The guard exists to prevent new 1-col content from overwriting earlier
column content on the same page. With balancing, that risk disappears:
all ending-section fragments are repositioned within the section's own
vertical region, and the cursor moves to maxY below the balanced
columns. The new region starts safely below.

Fix: balance first. Only fall through to the forced-page-break guard
when the ending section won't be balanced (single-col -> multi-col,
explicit column break, or no section-1 fragments on the page).

Test 4 now renders on a single page, matching Word:
  - 7+6 balanced columns
  - OVERLAP CHECK heading at y=758 (right below columns)
  - "If this overlaps..." at y=794
  - Total: 1 page (was 2)

All 5 SD-2452 fixtures now match Word's pagination exactly. 614
layout-engine tests still pass.

* fix: balance earlier pages of multi-page 2-col continuous sections (SD-2646) (#2930)

* fix(pm-adapter): emit section break before non-paragraph nodes (SD-2646)

Per ECMA-376 §17.6.17, a <w:sectPr> inside a paragraph defines the section
that ENDS with that paragraph. All body children preceding it (paragraphs,
tables, top-level drawings, SDTs) belong to that section.

Section ranges were indexed purely by paragraph count, and section-break
blocks were emitted only inside handleParagraphNode. A table that sat
between two sectPr-marker paragraphs was emitted into the flow stream
BEFORE the section break that declared its column config, so the layout
engine laid it out under the prior section's settings.

This is the root cause of IT-945 rendering a 114-row 2-col continuous
table in column 0 across three pages with column 1 empty: the table was
placed in the 1-col section, not the 2-col section.

Fix:
- Track nodeIndex over every top-level doc.content child in
  findParagraphsWithSectPr and SectionRange (alongside paragraphIndex,
  which SDT handlers still use for intra-SDT transitions).
- Add maybeEmitNextSectionBreakForNode in sections/breaks.ts and call
  it from internal.ts's main dispatch loop BEFORE every top-level
  handler. Any non-paragraph node crossing a section boundary now
  triggers the break.
- Section-model primer in pm-adapter/README.md with spec citations.

Tests: 1739/1739 pass in pm-adapter (including new end-tagged.test.ts
and integration test in index.test.ts asserting flow-block order).

* fix(layout-engine): split dominant table at row boundary when balancing section-final page (SD-2646)

The column balancer treats each fragment as an atomic block. A
multi-page two-column continuous section's final page can end up with
a single table fragment taller than totalSectionHeight / columnCount.
The atomic-block binary search then places the whole table in one
column and leaves the other empty, diverging from Word, which
balances by splitting the table at a row boundary per ECMA-376
§17.18.77 ("a continuous section break balances the content of the
previous section").

Fix: add splitDominantTableAtRowBoundary as a preprocessor inside
balanceSectionOnPage. When the section has a single splittable table
fragment larger than target, split it at the row whose cumulative
height first meets or exceeds totalSectionHeight / columnCount. The
two halves are inserted in place of the original; the rest of the
balancer runs unchanged and naturally assigns one to each column.

Also add getBalancingHeight so empty sectPr-marker paragraphs
(measured lines with width=0) contribute 0 to balancing, matching
Word's behavior of not rendering an empty line for such markers.
This keeps both columns top-aligned on the section-final page.

On IT-945: page 2 now splits 14/14 from y=96 in both columns, matching
Word's top-alignment. Before this fix page 2 rendered all 28 remaining
rows in col 1 with col 0 empty.

Tests: strengthened existing "balances the section-ending page" test
(it was passing trivially via `if (sectionFragments.length > 1)`
guard). Added narrow-table multi-page regression test. 616/616 pass.

* chore: update lock

* fix(layout-engine): address review feedback for column balancing (SD-2452)

Address Nick's four review comments on PR #2869:

1. Section-local page geometry. The post-layout balancing pass derived
   contentWidth/availableHeight/margins.left from the FINAL active state,
   which silently rewrote earlier sections using the last section's content
   box. Read margins and size from each section's last page instead, so
   documents with mixed page setups (orientation, margins, paper size) per
   section keep their own metrics during balancing.

2. Document-wide column-layout fallback. When a caller passes
   LayoutOptions.columns directly without any sectionBreak blocks,
   sectionColumnsMap stays empty and the per-section loop never ran,
   leaving the final page stacked in column 0. Synthesize a virtual
   section that spans the whole document when no sectionBreak exists,
   preserving the pre-SD-2452 final-page balancing behavior. Guard with
   documentHasExplicitColumnBreak so author intent wins.

3. Blank-paragraph height preservation. The earlier `line.width === 0`
   heuristic for sectPr-marker paragraphs also matched ordinary blank
   paragraphs, collapsing their height and causing the next paragraph to
   overlap the empty line. Replace with an explicit
   `attrs.sectPrMarker` block-id set threaded through the balance APIs.

4. Table rowBoundaries shape. splitDominantTableAtRowBoundary stored
   regenerated boundaries using the renderer's compact serialized keys
   ({i,h,min,r}) instead of the contract `TableRowBoundary` shape
   ({index,height,minHeight,resizable}). The DOM renderer's projection
   then produced undefined values, breaking row-resize handles on split
   table fragments.

Plus a robustness fix: `getFragmentHeight` now consults
`measure.totalHeight` for tables when fragment.height is 0, so balancing
math doesn't silently zero out tables whose layout pass allocated no
height (e.g. header-less tables in degenerate test fixtures).

All 653 layout-engine unit tests pass.

* fix(layout-engine): gate column balancing on continuous break + not-last-section (SD-2452)

Per ECMA-376 §17.18.77 and the Linear spec for SD-2452, only continuous
section breaks trigger column balancing. The previous post-layout pass
balanced every multi-column section's last page regardless of break type,
producing column distributions Word does not.

Two cases need to be excluded:

1. Sections that end with a non-`continuous` break (`nextPage`, `evenPage`,
   `oddPage`). pm-adapter uses end-tagged section semantics, so
   `SectionBreakBlock.type` describes the break that ENDS the section.
   Documents like sd-1655-col-sep-3-equal-columns (3 cols, body sectPr
   only) and multi-column-sections.docx (default `nextPage` everywhere)
   were being rebalanced into 3+4+2 / 2+2 splits when Word fills
   column-by-column without balancing them at all.

2. The LAST section. The body sectPr is always the final section break
   and represents the document end, not a real mid-document break. Even
   when its type defaults to `continuous` (DEFAULT_BODY_SECTION_TYPE),
   there is no break AFTER its content to act as the balancing trigger.
   For single-section docs with multi-column body sectPr (sd-1655) Word
   does not balance, and now we don't either.

Tracking:
- `sectionEndBreakType: Map<sectionIndex, type>` records per-section the
  type of the break that closed the section (read from `block.type` on
  the SectionBreakBlock).
- `lastSectionIdx` records the highest sectionIndex seen during the
  block walk; the gate skips it.
- The synthesized fallback section (FALLBACK_SECTION_IDX = -1, used when
  callers pass `LayoutOptions.columns` without any pm-adapter section
  metadata) bypasses both gates so the document-wide fallback still
  fires for direct-API integrations.

The mid-page balancing branch (`forceMidPageRegion`) is already gated
correctly because it runs only inside the `block.type === 'continuous'`
branch of `scheduleSectionBreakCompat`, and the section being closed
mid-page can never be the last section.

All 5 SD-2452 spec-test fixtures continue to balance correctly:
  spec-test-1: 3+3 / spec-test-2: 2+3 / spec-test-3: 3+3+1
  spec-test-4: 7+7 / spec-test-5: 3+2 | 3+2

Regression docs now match Word:
  sd-1655-col-sep-3-equal-columns: was 3+4+2, now 7+1 (Word: 7+1)
  multi-column-sections: was balanced, now col-by-col (matches Word)
  multi_section_doc: was balanced, now col-by-col (matches Word)

sd-2326-col-sep-continuous-section-break still balances 2+2 because
its mid-document break is explicitly `continuous`.

All 653 layout-engine unit tests pass.

* fix(layout-engine): refine balance gate - last section balances if multi-page (SD-2452)

The previous gate was too strict: it skipped balancing for the last
section unconditionally, which regressed the existing baseline behavior
for multi-page multi-column documents whose only section is the body
sectPr (e.g. two_column_two_page-arial 2 page 17, where Word produces a
3+2 split, confirmed against Word's PDF render).

Refined rule for the last section: balance only when the section spans
multiple pages. Empirical Word behavior:

  - sd-1655-col-sep-3-equal-columns: 1 section, body sectPr, 1 page,
    3 cols → Word does NOT balance (col 1 holds 6 paragraphs, col 2
    holds 1, col 3 empty). Single-page → don't balance.
  - layout/two_column_two_page-arial 2: 1 section, body sectPr, 17
    pages, 2 cols → Word balances the last page (3+2 split).
    Multi-page → balance.
  - multi-column-sections / multi_section_doc: each section is a single
    page, default `nextPage` between them → no balancing (already
    excluded by the non-`continuous` end-break check).
  - sd-2326-col-sep-continuous-section-break: explicit `continuous` mid-
    document break → balance (already covered by the non-last branch).

Implementation: when sectionIdx === lastSectionIdx, count pages whose
fragments belong to that section. If the count is ≤ 1, skip balancing.
The check short-circuits at >1 to avoid scanning the full page list.
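
A minimal sketch of that page count with the early exit, assuming a hypothetical `Page` shape and a block-to-section map like the one layoutDocument is described as building (the names here are illustrative, not the engine's):

```typescript
// Hypothetical shapes for illustration only.
type Page = { fragments: { blockId: string }[] };

// True as soon as a second page with fragments from `sectionIdx` is found,
// so the remaining pages are never scanned.
function sectionSpansMultiplePages(
  pages: Page[],
  blockSection: Map<string, number>,
  sectionIdx: number,
): boolean {
  let count = 0;
  for (const page of pages) {
    if (page.fragments.some((f) => blockSection.get(f.blockId) === sectionIdx)) {
      count += 1;
      if (count > 1) return true; // short-circuit at >1
    }
  }
  return false;
}
```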

Corpus impact (vs npm@latest 1.31.1, after merging main):
  - 374 docs total, 363 unchanged, 11 changed (2 unique + 9 widespread)
  - The 9 widespread-only changes are all `pages[*].fragments[*].x|y`
    on a single page each: the SD-2452 balancing applied to the
    correct subset of multi-page multi-column sections.
  - All 5 SD-2452 spec-test fixtures continue to balance correctly:
    spec-test-1: 3+3 / spec-test-2: 2+3 / spec-test-3: 3+3+1
    spec-test-4: 7+7 / spec-test-5: 3+2 | 3+2

All 653 layout-engine unit tests pass.

* fix(layout-engine): address luccas review comments (SD-2452)

- index.ts mid-page balance: page-break fallback now triggers whenever
  balanceSectionOnPage returns null, not only when willBalance was false.
  willBalance is a coarse approval; balanceSectionOnPage has its own
  late skip conditions (unequal column widths, zero remaining height,
  shouldSkipBalancing thresholds) that can return null even after
  willBalance=true. Without the broader check, the new region started
  on the same page from a stale column index and overwrote the previous
  section's column content.

- column-balancing.ts split target: subtract preceding-fragment height
  from totalSectionHeight / columnCount before walking the table rows.
  A 100px paragraph + 300px table in 2 cols hit target=200 and split
  the table at row=200 (cols 100+200 / 100, max=300); subtracting the
  100 leading height gives target=150 → splits at row=100 (cols 100+100
  / 200, max=200), matching the achievable balanced height.

- column-balancing.ts split continuesOnNext: capture the original value
  BEFORE setting `table.continuesOnNext = true`. The previous ternary
  read the field after the mutation, always saw `true`, and the second
  half always inherited `false`. Now the second half correctly inherits
  the source table's cross-page continuation.

- column-balancing.ts split rollback: splitDominantTableAtRowBoundary
  now returns a rollback closure. balanceSectionOnPage invokes it when
  shouldSkipBalancing fires post-split, so the page never carries an
  overlapping half table when balancing is ultimately skipped. The
  ordering (split-then-skip) is intentional: split rescues the
  single-unbreakable case that pre-split skip would otherwise reject.
  With rollback, though, the mutation no longer survives a late skip.

- column-balancing.ts: remove balancePageColumns and its test block.
  The function had no production callers after balanceSectionOnPage
  became the only entry point. Its shared helper (createMeasure) is
  inlined into the balanceSectionOnPage tests.

- super-editor sections-resolver.ts: add startNodeIndex / endNodeIndex
  to the synthetic SectionRange. Required after the main-merge that
  added these fields to SectionRange (commit 85a503c). Fixes the
  TS2739 build error luccas reported.
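
Two of the fixes above, capturing continuesOnNext before mutating and returning a rollback closure, can be sketched together. This is an illustration only: `TableFragment` and `splitWithRollback` are hypothetical, and the fragment is reduced to a row range with an exclusive `toRow`.

```typescript
// Hypothetical fragment shape; toRow is exclusive.
interface TableFragment {
  id: string;
  fromRow: number;
  toRow: number;
  continuesOnNext: boolean;
}

// Splits fragments[index] in place at `splitRow` and returns a closure that
// undoes the split. An invalid split is a no-op with a no-op rollback.
function splitWithRollback(fragments: TableFragment[], index: number, splitRow: number): () => void {
  const table = fragments[index];
  if (!table || splitRow <= table.fromRow || splitRow >= table.toRow) return () => {};

  // Capture the original values BEFORE mutating the first half.
  const originalToRow = table.toRow;
  const originalContinues = table.continuesOnNext;

  const second: TableFragment = {
    id: `${table.id}:b`,
    fromRow: splitRow,
    toRow: originalToRow,
    continuesOnNext: originalContinues, // inherits the source's cross-page continuation
  };
  table.toRow = splitRow;
  table.continuesOnNext = true; // the first half always continues into the second
  fragments.splice(index + 1, 0, second);

  return () => {
    table.toRow = originalToRow;
    table.continuesOnNext = originalContinues;
    const at = fragments.indexOf(second);
    if (at !== -1) fragments.splice(at, 1);
  };
}
```

A caller that decides to skip balancing after the split would invoke the returned closure, leaving the fragment list exactly as it was.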

All 644 layout-engine unit tests pass. super-editor build is clean.

* fix(painter): suppress column separator over empty column (SD-2452)

Word draws a column separator only between columns that BOTH have content
within the region. The renderer was drawing the separator full-height
whenever `withSeparator: true` and `count > 1`, regardless of whether the
column to the right of the boundary had any fragments. This produced a
spurious vertical line on pages whose section content fits in column 0
(e.g. multi-column-sections.docx page 2: Word shows nothing, we drew a
line top-to-bottom of the column area).

Gate each separator on fragment presence past the boundary within the
region's y range:

  hasContentPastSeparator =
    page.fragments.some(f => f.x >= separatorX
                          && f.y >= yStart - 0.5
                          && f.y < yEnd + 0.5)
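
A self-contained version of the predicate above, as a sketch: the `Fragment` shape and the function signature are assumptions, while the comparisons and the 0.5 tolerance are taken from the snippet.

```typescript
// Hypothetical minimal fragment shape: top-left position only.
interface Fragment {
  x: number;
  y: number;
}

// True when some fragment starts at or past the separator's x position and
// inside the region's vertical range (with a 0.5px tolerance on both ends).
function hasContentPastSeparator(
  fragments: Fragment[],
  separatorX: number,
  yStart: number,
  yEnd: number,
): boolean {
  return fragments.some((f) => f.x >= separatorX && f.y >= yStart - 0.5 && f.y < yEnd + 0.5);
}

// A fragment in the right column but below the region does not count.
const pageFragments: Fragment[] = [
  { x: 72, y: 96 }, // left column, inside the region
  { x: 320, y: 500 }, // right column, in a later region
];
const drawSeparator = hasContentPastSeparator(pageFragments, 300, 96, 400); // false
```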

Verified against Word renderings:
  - multi-column-sections page 2 (col 1 only)         → 0 separators ✓
  - sd-1655 (3 cols, col 3 empty)                     → 1 separator (col 1↔2) ✓
  - sd-2326 (mid-doc continuous, balanced 2 cols)     → separator drawn ✓
  - two_column_two_page-arial 2 page 17 (balanced)    → separator drawn ✓

Tests updated to reflect the gate. Existing 15 separator tests now seed
each verified column with a stub fragment so they pin down geometry, not
the gate. 3 new tests pin down the gate behavior:

  - suppresses separator when right column is empty
  - draws only the separator whose right neighbor has content
  - checks fragment presence within the region, not whole-page

1052 painter-dom tests pass. 644 layout-engine tests pass.

* fix(layout-engine): balance multi-col sections when doc has explicit continuous (SD-2452)

Empirical Word behavior on docs with explicit `<w:type w:val="continuous"/>`
on the body sectPr: balance any multi-column section whose content precedes
the body, even when that section's own end-break is `nextPage` (default).

The simplest reproducer is `tabs/sd-1480-two-col-tab-positions.docx`:
  - 5 paragraphs ending with an inline sectPr (no `<w:type>` → default
    `nextPage`).
  - 1 empty paragraph followed by the body sectPr with explicit
    `<w:type w:val="continuous"/>`.
  - Word renders the 5 entries 3+3 across 2 columns on a single page.
  - Pre-fix: our gate skipped balancing because section 0's break is
    nextPage → the page rendered as 6+0.

Distinguishing explicit vs. default `continuous` requires plumbing a
`typeIsExplicit` flag from the OOXML parser through to the layout-engine:

  - `extractSectionType` now returns `null` when `<w:type>` is absent,
    instead of defaulting to `nextPage`. Callers apply the correct default
    (paragraph sectPr → `nextPage`, body sectPr → `continuous`).
  - `extractSectionData` exposes `typeIsExplicit: boolean`.
  - `SectionRange.typeIsExplicit` carries the flag through analysis.
  - `createSectionBreakBlock` writes it onto `attrs.typeIsExplicit`.
  - `layoutDocument` reads it into `sectionTypeIsExplicit: Map<idx, bool>`.

Updated balance gate (per-section, count > 1):

  Balance if:
    (a) section's own end-break is `continuous` AND it is NOT the last
        section, OR
    (b) the doc contains any EXPLICIT continuous break (typically the body
        sectPr), OR
    (c) the section spans multiple pages.

Otherwise skip: covers `sd-1655-col-sep-3-equal-columns` (single section,
default body continuous, single page → Word fills col-by-col).

Test fallout: three pm-adapter tests asserted that body sectPrs without
`<w:type>` defaulted to `nextPage`. That was a leak from the old
`extractSectionType` paragraph-style default. The corrected default is
`continuous` per OOXML body-sectPr semantics. Tests updated to assert the
new behavior plus `typeIsExplicit: false`.

Verification:
  - sd-1480 page 1: was 6+0, now 4+2 (Word shows 3+3; balancing engages
    correctly; the 4+2 vs 3+3 distribution gap is residual binary-search
    behavior on uneven paragraph heights, separate from the gate).
  - sd-1655: still col-by-col (no balancing). ✓
  - multi-column-sections, multi_section_doc: still col-by-col. ✓
  - spec-test-1..5: 3+3 / 2+3 / 3+3+1 / 7+7 / 3+2|3+2. ✓
  - sd-2326 (mid-doc continuous): still balanced 2+2. ✓

644 layout-engine + 1802 pm-adapter unit tests pass.

* fix(pm-adapter): surface typeIsExplicit only when authored (SD-2452)

The previous commit changed `extractSectionType` to return `null` when
`<w:type>` was missing, which let analysis.ts apply
`DEFAULT_BODY_SECTION_TYPE = continuous` for body sectPrs. Most fixtures
flipped from `nextPage` to `continuous`, rippling through page-break
placement, header/footer flow, and column-flow decisions across the
whole pipeline (541 of 374 corpus docs changed, 1204 visual diffs).

Surgical revert:

  - `extractSectionType` returns the OOXML default (`'nextPage'`) again,
    matching the pre-PR pipeline behavior. The body-sectPr type is once
    more `'nextPage'` when `<w:type>` is omitted.
  - A new `extractSectionTypeIsExplicit` helper returns `true` only when
    `<w:type>` was actually written. `extractSectionData` exposes it as
    `typeIsExplicit`.
  - `SectionRange.typeIsExplicit` propagates through analysis (paragraph
    sectPrs, body sectPr, fallback final, synthetic ranges).
  - `createSectionBreakBlock` writes `attrs.typeIsExplicit: true` ONLY
    when the flag is true. Omitting the field for the (vast majority of)
    sectPrs without `<w:type>` keeps the FlowBlock attrs schema
    backward-compatible with the published 1.31.1 layout snapshots.
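
A minimal sketch of the revert described above, assuming a simplified
sectPr shape. `SectPrLike`, `SectionBreakAttrs`, and
`createSectionBreakAttrs` are hypothetical stand-ins, and the signatures
of `extractSectionType` / `extractSectionTypeIsExplicit` are assumed,
not copied from pm-adapter:

```typescript
type SectionType = 'continuous' | 'nextPage' | 'evenPage' | 'oddPage';

// Hypothetical, simplified view of a parsed <w:sectPr>.
interface SectPrLike {
  // Value of <w:type w:val="...">; undefined when the element is absent.
  type?: SectionType;
}

// Falls back to 'nextPage' when <w:type> is omitted.
function extractSectionType(sectPr: SectPrLike): SectionType {
  return sectPr.type ?? 'nextPage';
}

// True only when <w:type> was actually authored.
function extractSectionTypeIsExplicit(sectPr: SectPrLike): boolean {
  return sectPr.type !== undefined;
}

interface SectionBreakAttrs {
  type: SectionType;
  typeIsExplicit?: true;
}

// Writes typeIsExplicit only when the flag is true, so attrs for
// sectPrs without <w:type> keep their previous shape.
function createSectionBreakAttrs(sectPr: SectPrLike): SectionBreakAttrs {
  const attrs: SectionBreakAttrs = { type: extractSectionType(sectPr) };
  if (extractSectionTypeIsExplicit(sectPr)) attrs.typeIsExplicit = true;
  return attrs;
}
```

Leaving the key out entirely, rather than writing `false`, is what keeps
snapshot comparisons unchanged for documents that never author `<w:type>`.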

Refined column-balance gate (layout-engine/index.ts) reads the new flag.
Balance if any of:
  1. Section's own end-break is `continuous` AND not the last section
     (covers spec-test-1..5, sd-2326).
  2. Doc has at least one EXPLICIT continuous break AND this section's
     type was NOT explicitly set to a page-forcing type. Covers
     sd-1480-two-col-tab-positions section 0 (default `nextPage` but
     body sectPr explicit continuous → Word balances).
  3. Section spans multiple pages (covers `two_column_two_page-arial 2`
     p17, body default, multi-page).
Otherwise skip. This covers `sd-1655-col-sep-3-equal-columns` (single
section, default body, single page → Word fills col-by-col).
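
The three rules can be sketched as a single predicate. This is an
illustration of the gate as described in this commit only (later commits
narrow rule 2 and add a mid-doc skip); `SectionSummary` and
`shouldBalanceSection` are hypothetical names, not the layout-engine API:

```typescript
type SectionType = 'continuous' | 'nextPage' | 'evenPage' | 'oddPage';

// Hypothetical per-section summary the gate would need.
interface SectionSummary {
  endBreakType: SectionType;
  typeIsExplicit: boolean;
  isLastSection: boolean;
  pageCount: number;
}

function shouldBalanceSection(
  section: SectionSummary,
  docHasExplicitContinuous: boolean,
): boolean {
  // Rule 1: own end-break is continuous and this is not the last section.
  if (section.endBreakType === 'continuous' && !section.isLastSection) return true;

  // Rule 2: the doc has an explicit continuous break and this section was
  // not explicitly given a page-forcing type.
  const explicitlyPageForcing =
    section.typeIsExplicit && section.endBreakType !== 'continuous';
  if (docHasExplicitContinuous && !explicitlyPageForcing) return true;

  // Rule 3: the section spans multiple pages.
  if (section.pageCount > 1) return true;

  // Otherwise fill column by column.
  return false;
}
```

Under these assumptions a single-page, default-typed body in a document
with no explicit continuous break (the sd-1655 shape) falls through all
three rules and is not balanced.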

Corpus (vs npm@latest 1.31.1): 541 β†’ 47 changed (13 unique +
34 widespread-only `attrs.typeIsExplicit` schema additions).

Browser verification:
  - spec-test-1..5: 3+3 / 2+3 / 3+3+1 / 7+7 / 3+2|3+2 ✓
  - sd-1655: 7+1 col-by-col ✓
  - multi-column-sections / multi_section_doc: col-by-col ✓
  - sd-1480: was 6+0, now 3+2 (Word: 3+3; gate engages correctly; the
    remaining 3+2 vs 3+3 gap is balancer-algorithm behavior on uneven
    paragraph heights, separate from the gate).
  - sd-2326: 2+2 ✓

644 layout-engine + 1802 pm-adapter unit tests pass.

* fix(layout-engine): exclude body-explicit-continuous from doc-wide rule (SD-2452)

Per ECMA-376 §17.18.77 a continuous break "balances the section it ENDS",
i.e., the section BEFORE the break, not the section the break belongs
to. When the body sectPr itself is the explicit-continuous trigger, it
balances the section preceding the body, not the body's own content.

Bug: rule 2 ("doc has explicit continuous β†’ balance any non-explicitly-
non-continuous section") was firing on the body section itself when the
body sectPr was the only explicit continuous in the doc. That caused
`tabs/mixed-columns-tabs tnr` p1 to render 10+9 when Word renders 14+5
(column-flow without balancing): the body sectPr is explicit-continuous
+ 2-col, but the 2-col Test list IS the body; there is no preceding
section for the body break to "balance".

Compare with `tabs/sd-1480-two-col-tab-positions`: body sectPr is also
explicit-continuous + 2-col, but the 2-col Page entries live in
section 0 (a section BEFORE the body). The body break correctly
balances section 0, as Word does (Word renders 3+3).

Fix: identify the body-explicit-continuous section (last section
whose typeIsExplicit is true and whose end-break is `continuous`) and
exclude it from rule 2. Section 0 of sd-1480 still balances. Section 1
(body) of mixed-columns-tabs-tnr does not. Body sections can still
balance via rule 3 (multi-page check, e.g. two_column_two_page-arial 2
p17). Rule 1 never applies to them, because the last section cannot be
"not last".
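
A rough sketch of the exclusion, under the assumption that the "body"
section is the final entry of the section list. `SectionFlags`,
`findBodyExplicitContinuousIdx`, and `rule2Allows` are hypothetical names
used only for illustration:

```typescript
// Hypothetical minimal per-section flags.
interface SectionFlags {
  endBreakType: string;
  typeIsExplicit: boolean;
}

// Index of the body section when it is explicit-continuous, else -1.
// Assumption: the body section is the last section in the list.
function findBodyExplicitContinuousIdx(sections: SectionFlags[]): number {
  const lastIdx = sections.length - 1;
  if (lastIdx < 0) return -1;
  const body = sections[lastIdx];
  return body.typeIsExplicit && body.endBreakType === 'continuous' ? lastIdx : -1;
}

// Rule 2 with the body-explicit-continuous section excluded.
function rule2Allows(sections: SectionFlags[], sectionIdx: number): boolean {
  const docHasExplicitContinuous = sections.some(
    (s) => s.typeIsExplicit && s.endBreakType === 'continuous',
  );
  if (!docHasExplicitContinuous) return false;
  if (sectionIdx === findBodyExplicitContinuousIdx(sections)) return false;
  const section = sections[sectionIdx];
  const explicitlyPageForcing =
    section.typeIsExplicit && section.endBreakType !== 'continuous';
  return !explicitlyPageForcing;
}
```

For a two-section document whose body is explicit-continuous, section 0
passes rule 2 and section 1 (the body) does not, which is the sd-1480
versus mixed-columns-tabs-tnr distinction drawn above.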

Browser verification:
  - mixed-columns-tabs-tnr: was 10+9, now 14+5 (Word: 14+5) ✓ exact match
  - sd-1480: 3+2 unchanged (Word: 3+3, residual balancer-algorithm gap)
  - sd-1655: 7+1 col-by-col, unchanged ✓
  - multi-column-sections: col-by-col, unchanged ✓
  - sd-2326: 2+2, unchanged ✓
  - spec-test-1..5: 3+3 / 2+3 / 3+3+1 / 7+7 / 3+2|3+2 ✓

Corpus (vs npm@latest 1.31.1): 47 changed total (12 unique +
35 widespread-only `attrs.typeIsExplicit` schema additions). Previously
47 with 13 unique: mixed-columns-tabs-tnr moved from "structural diff
vs reference" to "matches reference behavior on the 2-col flow".

644 layout-engine + 1802 pm-adapter unit tests pass.

* fix(layout-engine): skip mid-doc multi-page balance (SD-2452)

Word balances the LAST PAGE of a multi-page multi-column section only
when that section is the final/body section. Mid-doc multi-page
multi-column sections retain natural column-flow on every page,
including the last; Word doesn't rebalance the overflow remainder.

Verified:
  layout/ivosass-sub p3: section 1 is mid-doc, 2-page, 2-col,
    explicit-continuous end-break. The last page has 4 overflow
    fragments. Word leaves them in column 0. Pre-fix our gate's
    rule 1 fired and balanced p3 to 2+2. Now mid-doc multi-page
    sections skip the gate and p3 stays in col 0.
  lists/saas_original p4: same pattern, a mid-doc 2-col section
    overflows to the last page; Word doesn't rebalance.

Multi-page LAST sections (two_column_two_page-arial 2 p17: 17 pages,
body default continuous) still balance via rule 3, matching Word's
3+2 split on the final page.

Implementation: page-count probe runs once per section
(short-circuits at >1) and feeds both the new mid-doc skip and the
existing rule 3 multi-page allow.
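
One way the probe could look, assuming each placed fragment records its
section and page index. `PlacedFragment`, `sectionSpansMultiplePages`, and
`multiPageDecision` are hypothetical names; the real probe's inputs are
not shown in this message:

```typescript
// Hypothetical record of where a fragment landed.
interface PlacedFragment {
  sectionIdx: number;
  pageIdx: number;
}

// Returns true as soon as a second distinct page is seen for the section,
// so the scan stops early instead of counting every page.
function sectionSpansMultiplePages(
  fragments: PlacedFragment[],
  sectionIdx: number,
): boolean {
  let firstPage: number | undefined;
  for (const fragment of fragments) {
    if (fragment.sectionIdx !== sectionIdx) continue;
    if (firstPage === undefined) firstPage = fragment.pageIdx;
    else if (fragment.pageIdx !== firstPage) return true;
  }
  return false;
}

type MultiPageDecision = 'skip' | 'allow' | 'defer';

// The single probe result feeds both checks: mid-doc multi-page sections
// skip balancing, a multi-page last section is allowed (rule 3), and
// single-page sections are left to the other rules.
function multiPageDecision(isLastSection: boolean, multiPage: boolean): MultiPageDecision {
  if (!multiPage) return 'defer';
  return isLastSection ? 'allow' : 'skip';
}
```

Computing the boolean once per section and passing it to both checks
avoids walking the fragment list twice.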

Browser:
  - ivosass-sub: was 2+2 on p3, now col 0 only (matches Word).
  - saas_original: was 2+2 on p4, now col 0 only (matches Word).
  - mixed-columns-tabs-tnr: 14+5 unchanged.
  - sd-1480: 3+2 unchanged.
  - sd-1655: 7+1 unchanged.
  - multi-column-sections: col-by-col unchanged.
  - sd-2326: 2+2 unchanged.
  - spec-test-1..5: 3+3 / 2+3 / 3+3+1 / 7+7 / 3+2|3+2.
  - two_column_two_page-arial p17: still balances.

Corpus (vs npm@latest 1.31.1): 9 unique structural changes (down
from 12) plus 38 widespread-only attrs.typeIsExplicit schema
additions. The 9 remaining are intentional SD-2452 differences.

644 layout-engine + 1802 pm-adapter unit tests pass.

* fix(measuring): scope tab alignment heuristic per line segment (SD-1480)

Tab leaders were missing on every line BEFORE the final <w:br/> in a
paragraph that uses a right-aligned dot-leader stop. The 'last N tabs of
the paragraph bind to the last N alignment stops' heuristic counted
across the whole paragraph, so only the trailing tab (after the final
soft line break) was bound. Earlier lines fell through to default grid
stops, dropping their leaders.

Two changes:

1. Scope the heuristic to per-line segments delimited by explicit
   <w:br/> runs. pPr/tabs apply per line, not per paragraph.

2. Strip trailing-empty <w:tab/> runs (a tab at the end of a segment
   with no content after it). Word emits these as authoring artifacts;
   if they consumed an alignment-stop slot, the meaningful tab earlier
   in the line would fall to a default grid stop.

Mirrored in measuring/dom and layout-bridge/remeasure (the two call
sites that share this heuristic).
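
A simplified sketch of change 1 (per-segment scoping), assuming runs are
reduced to text, tab, and break markers. `Run`, `TabStop`,
`splitIntoSegments`, and `bindAlignmentStopsPerSegment` are hypothetical
names, and the binding is reduced to "last N tabs take the last N
alignment stops":

```typescript
type Run = { kind: 'text'; text: string } | { kind: 'tab' } | { kind: 'break' };

interface TabStop {
  pos: number;
  align: 'start' | 'end' | 'center' | 'decimal';
}

// Split a paragraph's runs into line segments at explicit <w:br/> runs.
function splitIntoSegments(runs: Run[]): Run[][] {
  const segments: Run[][] = [[]];
  for (const run of runs) {
    if (run.kind === 'break') segments.push([]);
    else segments[segments.length - 1].push(run);
  }
  return segments;
}

// For each segment, bind its last N tabs to the last N alignment stops.
// Unbound tabs are reported as null (they would use greedy matching).
function bindAlignmentStopsPerSegment(
  runs: Run[],
  stops: TabStop[],
): (TabStop | null)[][] {
  const alignStops = stops.filter((s) => s.align !== 'start');
  return splitIntoSegments(runs).map((segment) => {
    const tabCount = segment.filter((r) => r.kind === 'tab').length;
    const n = Math.min(tabCount, alignStops.length);
    const bound: (TabStop | null)[] = [];
    for (let i = 0; i < tabCount; i++) {
      const k = i - (tabCount - n); // position among the bound tabs
      bound.push(k >= 0 ? alignStops[alignStops.length - n + k] : null);
    }
    return bound;
  });
}
```

With one right-aligned stop and two lines that each contain one tab, both
tabs bind to the stop. Counting across the whole paragraph would have
bound only the tab on the final line.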

Fixes the visible bug in tabs/sd-1480-two-col-tab-positions where
'Page<br/>Page<tab/>5<tab/>' rendered with leaders only on the last
line and 'Page  5' lost its leader entirely. Each line now matches
Word's 'Page........N' rendering.

* fix(measuring): preserve lone trailing tab as the meaningful tab (SD-2452)

The trailing-empty-tab guard from the previous commit was too
aggressive: a segment shaped like 'Label:\t' (single tab at the very
end) had its only tab stripped, falling through to greedy default
grid-stop matching. Form-field leaders ('By:_____', 'Name:_____') then
truncated to the next 0.5" grid stop instead of extending to the
right-aligned alignment stop.

Add a guard: if stripping trailing tabs would leave NO effective tabs,
treat all tabs in the segment as effective. The trailing-empty heuristic
only fires when there's at least one OTHER tab to bind.

Verified visually:
- sd-1480 'Page........N' with Page+tab+N+trailing-tab still works
- 'By:____', 'Name:____', 'Title:____' form fields now extend to right

* fix(measuring): revert trailing-empty tab strip (false-positive regressions)

The trailing-empty-tab strip introduced in 315ab84 + 4b63d25 treated
tabs at the end of a segment as authoring artifacts. That broke patterns
where trailing tabs ARE meaningful and need to bind to alignment stops:

- HVY-25 Queensland Land Registry block 10 has '\t\t/[text]/\t\t'
  with 4 tabs and 4 authored stops (2 alignment). The strip walked back
  through the trailing tabs, marking them artifacts. Tabs 0+1 then ate
  the 2 alignment stops, putting the center+end binding on the FIRST
  two tabs instead of the last two, corrupting the layout.

- HVY-19 Commercial Lease TOC ('1.\tBUSINESS POINTS\t1') had similar
  reordering when paragraph layout placed the page number tab as part
  of a multi-tab segment.

The strip can't reliably distinguish authored trailing tabs (HVY-25,
form fields with multiple authored stops) from sd-1480-style artifacts
(Page\t5\t with one extra trailing tab). Heuristic was too aggressive.

Keep the per-line-segment scoping (the real fix that closes line 1 of
sd-1480 multi-line paragraphs). Drop the trailing-strip. Sd-1480 line 2
('Page\t5\t' segment) reverts to baseline 'Page  5  ___' behavior,
which is what shipped before this branch, so no new regression there.

* fix(measuring): gate SD-2447 alignment heuristic on default stops only

The SD-2447 heuristic forces the last N tabs to bind to the last N
end/center/decimal stops. It was added because TOC styles often have
ONLY a right-aligned dot-leader stop, and tabStops gets seeded with
synthetic 0.5" defaults from origin (seedDefaultsFromZero=true).
Greedy then lands on a default 0.5" grid stop instead of the alignment
stop; hence the heuristic.

But for paragraphs with an EXPLICIT start-aligned stop ahead of the
alignment stop (TOC1 style with 'start@740, end@9360, end@10080':
template_format and similar Word lease templates), greedy correctly
lands on the start stop and the alignment stop downstream, so no force is
needed. The heuristic over-fires and binds tab 0 to the right
alignment stop, producing the broken render: leader BEFORE the title
with the page number jammed against it.

Fix: compute greedy first; only apply the heuristic when greedy would
land on a 'source: default' stop. When greedy already lands on an
explicit stop, use it. Mirrored in measuring/dom and remeasure.
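
The gating can be sketched as follows, assuming each stop carries a
`source` tag and positions are in twips. `Stop`, `greedyStop`, and
`chooseStop` are hypothetical names, and the heuristic's own choice is
passed in rather than computed:

```typescript
type Align = 'start' | 'end' | 'center' | 'decimal';

interface Stop {
  pos: number; // twips from the paragraph origin
  align: Align;
  source: 'explicit' | 'default';
}

// Greedy match: the nearest stop strictly to the right of x.
function greedyStop(stops: Stop[], x: number): Stop | undefined {
  let best: Stop | undefined;
  for (const stop of stops) {
    if (stop.pos > x && (!best || stop.pos < best.pos)) best = stop;
  }
  return best;
}

// Compute greedy first. The alignment heuristic's stop is used only when
// greedy would land on a synthetic default stop.
function chooseStop(
  stops: Stop[],
  x: number,
  heuristicStop: Stop | undefined,
): Stop | undefined {
  const greedy = greedyStop(stops, x);
  if (greedy !== undefined && greedy.source === 'default' && heuristicStop !== undefined) {
    return heuristicStop;
  }
  return greedy;
}
```

In a TOC style that has only a right-aligned stop plus seeded 720-twip
defaults, greedy hits a default and the heuristic stop wins. With an
explicit start stop ahead of the alignment stop, greedy's result is kept.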

Effect:
- template_format TOC: now renders '1.    BUSINESS POINTS........1'
  matching Word and the published baseline.
- HVY-25 / SD-2447 fixture / sd-1480 line 1: behavior preserved.
- All test suites pass (measuring-dom 332, layout-bridge 1192,
  layout-engine 644).

* fix(layout-engine): scope explicit-continuous rule to ending section (SD-2452)

Address Luccas's [P2] review comment. Rule 2 of the continuous-balancing
gate previously fired whenever ANY section in the document had an
explicit continuous break, allowing balancing for every multi-column
section whose own type was omitted, even unrelated ones. A later
single-page two-column body section with omitted <w:type> would be
balanced just because an earlier section was explicit-continuous,
violating sd-1655's skip-omitted-single-page rule.

Per ECMA-376 §17.18.77, a continuous break balances the section it
ENDS. When the body sectPr authors an explicit continuous break, the
affected section is the one IMMEDIATELY preceding the body. Tighten
rule 2 from a doc-wide flag to bodyExplicitContinuousIdx - 1.

Verified:
- sd-1480: section 0 still balances (rule 2 fires for sectionIdx 0 ===
  bodyExplicitContinuousIdx 1 - 1).
- mixed-columns-tabs-tnr: body section (sectionIdx 1) does not balance
  (no longer matches bodyExplicitContinuousIdx - 1 = 0).
- sd-1655: not affected (no body-explicit-continuous in the doc).
- Hypothetical 'mid-doc explicit-continuous + body omitted single-page
  2-col': body now correctly skipped.
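
The tightened rule 2 reduces to an index comparison. A minimal sketch,
assuming `bodyExplicitContinuousIdx` is -1 when the body sectPr is not
explicit-continuous; `rule2AppliesTo` is a hypothetical name:

```typescript
// Rule 2 after tightening: only the section immediately before an
// explicit-continuous body qualifies.
function rule2AppliesTo(sectionIdx: number, bodyExplicitContinuousIdx: number): boolean {
  if (bodyExplicitContinuousIdx < 0) return false; // no explicit-continuous body
  return sectionIdx === bodyExplicitContinuousIdx - 1;
}
```

The cases listed above map directly onto this check: index 0 matches when
the body index is 1, the body itself never matches, and a document
without an explicit-continuous body never triggers the rule.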

All 644 layout-engine tests pass.

---------

Co-authored-by: Nick Bernal <nick@superdoc.dev>
Co-authored-by: Nick Bernal <117235294+harbournick@users.noreply.github.com>