feat(layout-engine): balance columns at continuous section breaks (SD-2452) #2869
…-2452)
Implements ECMA-376 §17.18.77 column balancing for multi-column sections.
Word produces a minimum-height balanced layout at the end of a continuous
(and, empirically, next-page) multi-column section; SuperDoc was either
leaving content stacked in the first column or, in some layouts, producing
overlapping fragments.
The pagination pipeline now balances each multi-column section's last page
at layout time:
- layoutDocument builds a block -> section map by walking blocks in
document order and tracking the current section from the most recent
sectionBreak (pm-adapter only stamps attrs.sectionIndex on sectionBreak
blocks, not on content paragraphs).
- A new balanceSectionOnPage helper performs section-scoped balancing
with its own fragment-level positioning (no Y-grouping): fragments are
ordered by (x, y) in document order and each is treated as its own
block. The previous balancePageColumns grouped fragments by Y into
"rows," which collapsed fragments from different source columns at the
same Y and produced overlap.
- calculateBalancedColumnHeight is now a proper binary search for the
minimum column height H such that greedy left-to-right fill places
every block with every column <= H. This matches Word's left-heavy
packing preference (e.g. 7 blocks / 3 cols -> 3+3+1, not 2+2+3).
- A mid-page hook at forceMidPageRegion balances the ending section on
the current page before starting the new region, and collapses both
cursors to balanceResult.maxY so the next region begins just below the
balanced columns. Sections handled mid-page are tracked in
alreadyBalancedSections so the post-layout pass doesn't double-balance.
- The prior "last page of document" heuristic is replaced with a
per-section post-layout loop that balances each multi-column section's
last page, skipping sections already handled mid-page.
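The binary search described above can be sketched as follows. This is a minimal illustration, not the PR's calculateBalancedColumnHeight: the function names (`fitsWithin`, `minBalancedHeight`, `distribute`) are hypothetical, and it assumes integer block heights and equal-width columns.

```typescript
// Greedy left-to-right fill: true when every block fits into `columnCount`
// columns of height `limit` while keeping document order.
function fitsWithin(heights: number[], columnCount: number, limit: number): boolean {
  let column = 0;
  let used = 0;
  for (const h of heights) {
    if (h > limit) return false; // a block taller than the limit can never fit
    if (used + h > limit) {
      column += 1; // current column is full, move right
      used = 0;
      if (column >= columnCount) return false;
    }
    used += h;
  }
  return true;
}

// Smallest integer H such that greedy fill places every block with every column <= H.
function minBalancedHeight(heights: number[], columnCount: number): number {
  if (heights.length === 0) return 0;
  let lo = Math.max(...heights); // no column can be shorter than the tallest block
  let hi = heights.reduce((a, b) => a + b, 0); // a single column always fits
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (fitsWithin(heights, columnCount, mid)) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// Per-column block counts produced by greedy fill at height `limit`.
function distribute(heights: number[], columnCount: number, limit: number): number[] {
  const counts = new Array<number>(columnCount).fill(0);
  let column = 0;
  let used = 0;
  for (const h of heights) {
    if (used + h > limit && column < columnCount - 1) {
      column += 1;
      used = 0;
    }
    used += h;
    counts[column] += 1;
  }
  return counts;
}
```

With seven equal-height blocks and three columns, greedy fill at the minimum height packs the left columns first, which is the 3+3+1 shape the commit message attributes to Word.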
Tests:
- 11 new unit/integration tests covering the 5 SD-2452 fixtures
(2-col/3-col, equal and unequal heights, continuous and next-page
breaks, multi-page sections, explicit column-break opt-out).
- 614 layout-engine tests pass, 1737 pm-adapter tests pass,
11375 super-editor tests pass.
Visual validation against Microsoft Word for all 5 fixtures:
- Test 1 (6 paras / 2 cols): 3+3 exact match
- Test 2 (5 mixed / 2 cols): 2+3 exact match
- Test 3 (7 paras / 3 cols): 3+3+1 exact match
- Test 4 (13 paras / 2 cols): 7+6 exact match, overlap gone
- Test 5 (continuous + next-page): 3+2, 3+2 exact match
dd5aff7 to 5b2335a
Codecov Report: ✅ All modified and coverable lines are covered by tests.
…uction (SD-2452)
When a mid-page section break reduced the column count (e.g. 2-col ->
1-col for test 4's 13-paragraph fixture followed by OVERLAP CHECK), the
mid-page hook's forced-page-break guard ran before balancing:
if (columnIndexBefore >= newColumns.count) {
state = paginator.startNewPage();
}
// ... balance ran here, on the empty new page
At the section transition, columnIndexBefore=1 (paginator was in col 1)
and newColumns.count=1, so the guard forced a new page before balancing
had a chance to reposition the ending section's fragments. Balancing
then ran on the empty new page (no-op), the paginator placed the
post-columns single-column content on the new page, and the old page's
fragments were balanced by the post-layout pass. Net effect: columns
looked correct on page 0 but OVERLAP CHECK ended up on page 1, while
Word fits everything on one page.
The guard exists to prevent new 1-col content from overwriting earlier
column content on the same page. With balancing, that risk disappears:
all ending-section fragments are repositioned within the section's own
vertical region, and the cursor moves to maxY below the balanced
columns. The new region starts safely below.
Fix: balance first. Only fall through to the forced-page-break guard
when the ending section won't be balanced (single-col -> multi-col,
explicit column break, or no section-1 fragments on the page).
Test 4 now renders on a single page, matching Word:
- 7+6 balanced columns
- OVERLAP CHECK heading at y=758 (right below columns)
- "If this overlaps..." at y=794
- Total: 1 page (was 2)
All 5 SD-2452 fixtures now match Word's pagination exactly. 614
layout-engine tests still pass.
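The reordering described in this commit (balance first, fall through to the page-break guard only when balancing did not happen) can be sketched like this. The `MidPageTransition` shape and both callbacks are hypothetical stand-ins for the paginator and the balancer, introduced only for illustration.

```typescript
type BalanceResult = { maxY: number } | null;

interface MidPageTransition {
  columnIndexBefore: number; // column the paginator was in when the section ended
  newColumnCount: number; // column count of the region that starts next
  balance: () => BalanceResult; // hypothetical: null means balancing was skipped
  startNewPage: () => void; // hypothetical: forces a page break
}

// Returns the Y at which the new region starts on the current page,
// or null when balancing did not run (a new page may have been forced).
function endSectionMidPage(t: MidPageTransition): number | null {
  const result = t.balance();
  if (result !== null) {
    // Balanced: the new region starts just below the balanced columns.
    return result.maxY;
  }
  // Not balanced: keep the old guard so new content cannot overwrite
  // earlier column content on the same page.
  if (t.columnIndexBefore >= t.newColumnCount) {
    t.startNewPage();
  }
  return null;
}
```

In the 2-col to 1-col case from the commit message (columnIndexBefore=1, newColumns.count=1), a successful balance returns a cursor position and the guard never fires.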
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e4265964d6
…cing-for-continuous-section
…D-2646) (#2930)
* fix(pm-adapter): emit section break before non-paragraph nodes (SD-2646)
Per ECMA-376 §17.6.17, a <w:sectPr> inside a paragraph defines the section that ENDS with that paragraph. All body children preceding it (paragraphs, tables, top-level drawings, SDTs) belong to that section.
Section ranges were indexed purely by paragraph count, and section-break blocks were emitted only inside handleParagraphNode. A table that sat between two sectPr-marker paragraphs was emitted into the flow stream BEFORE the section break that declared its column config, so the layout engine laid it out under the prior section's settings. This is the root cause of IT-945 rendering a 114-row 2-col continuous table in column 0 across three pages with column 1 empty: the table was placed in the 1-col section, not the 2-col section.
Fix:
- Track nodeIndex over every top-level doc.content child in findParagraphsWithSectPr and SectionRange (alongside paragraphIndex, which SDT handlers still use for intra-SDT transitions).
- Add maybeEmitNextSectionBreakForNode in sections/breaks.ts and call it from internal.ts's main dispatch loop BEFORE every top-level handler. Any non-paragraph node crossing a section boundary now triggers the break.
- Section-model primer in pm-adapter/README.md with spec citations.
Tests: 1739/1739 pass in pm-adapter (including new end-tagged.test.ts and integration test in index.test.ts asserting flow-block order).
* fix(layout-engine): split dominant table at row boundary when balancing section-final page (SD-2646)
The column balancer treats each fragment as an atomic block. A multi-page two-column continuous section's final page can end up with a single table fragment taller than totalSectionHeight / columnCount. The atomic-block binary search then places the whole table in one column and leaves the other empty, diverging from Word, which balances by splitting the table at a row boundary per ECMA-376 §17.18.77 ("a continuous section break balances the content of the previous section").
Fix: add splitDominantTableAtRowBoundary as a preprocessor inside balanceSectionOnPage. When the section has a single splittable table fragment larger than target, split it at the row whose cumulative height first meets or exceeds totalSectionHeight / columnCount. The two halves are inserted in place of the original; the rest of the balancer runs unchanged and naturally assigns one to each column.
Also add getBalancingHeight so empty sectPr-marker paragraphs (measured lines with width=0) contribute 0 to balancing, matching Word's behavior of not rendering an empty line for such markers. This keeps both columns top-aligned on the section-final page.
On IT-945: page 2 now splits 14/14 from y=96 in both columns, matching Word's top-alignment. Before this fix page 2 rendered all 28 remaining rows in col 1 with col 0 empty.
Tests: strengthened existing "balances the section-ending page" test (it was passing trivially via `if (sectionFragments.length > 1)` guard). Added narrow-table multi-page regression test. 616/616 pass.
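The split rule stated above ("the row whose cumulative height first meets or exceeds totalSectionHeight / columnCount") can be sketched as a small helper. `findSplitRow` is a hypothetical name, and the uniform 20px row height in the usage note is an assumption chosen only to reproduce the 14/14 split reported for IT-945.

```typescript
// Returns how many rows go into the first half, or null when the table
// cannot be split into two non-empty halves at this target.
function findSplitRow(rowHeights: number[], target: number): number | null {
  let cumulative = 0;
  for (let i = 0; i < rowHeights.length; i++) {
    cumulative += rowHeights[i];
    if (cumulative >= target) {
      const rowsInFirstHalf = i + 1;
      // Splitting after the last row would leave the second half empty.
      return rowsInFirstHalf < rowHeights.length ? rowsInFirstHalf : null;
    }
  }
  return null;
}
```

For 28 rows of 20px each in a two-column section, the target is 560 / 2 = 280, which the cumulative height reaches at the 14th row, so each half receives 14 rows.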
…cing-for-continuous-section
…cing-for-continuous-section
@tupizz please double check layout testing and the below. I see some documents that have definitely regressed.
@luccas-harbour can you pls work with Tadeu on getting this one to the finish line next week? Be extra careful around layout testing, pls make sure there are no regressions!
hey @tupizz! nothing to add right now apart from Nick's comment. I'll have another look once those are addressed. also, note that the build is failing, which caused the behavior tests to fail to run. thanks!
…2452) Address Nick's four review comments on PR #2869:
1. Section-local page geometry. The post-layout balancing pass derived contentWidth/availableHeight/margins.left from the FINAL active state, which silently rewrote earlier sections using the last section's content box. Read margins and size from each section's last page instead, so documents with mixed page setups (orientation, margins, paper size) per section keep their own metrics during balancing.
2. Document-wide column-layout fallback. When a caller passes LayoutOptions.columns directly without any sectionBreak blocks, sectionColumnsMap stays empty and the per-section loop never ran, leaving the final page stacked in column 0. Synthesize a virtual section that spans the whole document when no sectionBreak exists, preserving the pre-SD-2452 final-page balancing behavior. Guard with documentHasExplicitColumnBreak so author intent wins.
3. Blank-paragraph height preservation. The earlier `line.width === 0` heuristic for sectPr-marker paragraphs also matched ordinary blank paragraphs, collapsing their height and causing the next paragraph to overlap the empty line. Replace with an explicit `attrs.sectPrMarker` block-id set threaded through the balance APIs.
4. Table rowBoundaries shape. splitDominantTableAtRowBoundary stored regenerated boundaries using the renderer's compact serialized keys ({i,h,min,r}) instead of the contract `TableRowBoundary` shape ({index,height,minHeight,resizable}). The DOM renderer's projection then produced undefined values, breaking row-resize handles on split table fragments.
Plus a robustness fix: `getFragmentHeight` now consults `measure.totalHeight` for tables when fragment.height is 0, so balancing math doesn't silently zero out tables whose layout pass allocated no height (e.g. header-less tables in degenerate test fixtures).
All 653 layout-engine unit tests pass.
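Item 4 is easiest to see with the two shapes side by side. The sketch below uses the key names quoted in the commit message; the field types (numbers and a boolean) and the `toContractBoundary` helper are assumptions for illustration, not the project's actual definitions.

```typescript
// Renderer's compact serialized form, per the keys quoted above ({i,h,min,r}).
interface CompactRowBoundary {
  i: number;
  h: number;
  min: number;
  r: boolean;
}

// Contract shape, per the keys quoted above; field types are assumed.
interface TableRowBoundary {
  index: number;
  height: number;
  minHeight: number;
  resizable: boolean;
}

// Hypothetical helper: expand a compact boundary into the contract shape so a
// consumer reading `.index` or `.height` never sees undefined.
function toContractBoundary(b: CompactRowBoundary): TableRowBoundary {
  return { index: b.i, height: b.h, minHeight: b.min, resizable: b.r };
}
```

A consumer that reads `boundary.height` from the compact form gets `undefined`, which is the failure mode the commit describes for row-resize handles.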
hey! I saw you addressed Nick's comments so I had another look and found a few things. the biggest one is that willBalance doesn't match balanceSectionOnPage's actual skip conditions, so the page-break fallback can be skipped from an invalid column state. there's also a continuesOnNext mutation in the table split that always collapses to false, and the split target ignores content already placed in column 0.
I am also running into this issue when trying to build the project:
packages/super-editor/src/editors/v1/document-api-adapters/helpers/sections-resolver.ts:168:3 - error TS2739: Type '{ sectionIndex: number; startParagraphIndex: number; endParagraphIndex: number; sectPr: SectPrElement; margins: null; pageSize: null; orientation: null; columns: null; type: SectionType.CONTINUOUS; ... 4 more ...; vAlign: undefined; }' is missing the following properties from type 'SectionRange': startNodeIndex, endNodeIndex
168 return {
let me know what you think!
…ast-section (SD-2452)
Per ECMA-376 §17.18.77 and the Linear spec for SD-2452, only continuous section breaks trigger column balancing. The previous post-layout pass balanced every multi-column section's last page regardless of break type, producing column distributions Word does not. Two cases need to be excluded:
1. Sections that end with a non-`continuous` break (`nextPage`, `evenPage`, `oddPage`). pm-adapter uses end-tagged section semantics, so `SectionBreakBlock.type` describes the break that ENDS the section. Documents like sd-1655-col-sep-3-equal-columns (3 cols, body sectPr only) and multi-column-sections.docx (default `nextPage` everywhere) were being rebalanced into 3+4+2 / 2+2 splits when Word fills column-by-column without balancing them at all.
2. The LAST section. The body sectPr is always the final section break and represents the document end, not a real mid-document break. Even when its type defaults to `continuous` (DEFAULT_BODY_SECTION_TYPE), there is no break AFTER its content to act as the balancing trigger. For single-section docs with multi-column body sectPr (sd-1655) Word does not balance, and now we don't either.
Tracking:
- `sectionEndBreakType: Map<sectionIndex, type>` records per-section the type of the break that closed the section (read from `block.type` on the SectionBreakBlock).
- `lastSectionIdx` records the highest sectionIndex seen during the block walk; the gate skips it.
- The synthesized fallback section (FALLBACK_SECTION_IDX = -1, used when callers pass `LayoutOptions.columns` without any pm-adapter section metadata) bypasses both gates so the document-wide fallback still fires for direct-API integrations.
The mid-page balancing branch (`forceMidPageRegion`) is already gated correctly because it runs only inside the `block.type === 'continuous'` branch of `scheduleSectionBreakCompat`, and the section being closed mid-page can never be the last section.
All 5 SD-2452 spec-test fixtures continue to balance correctly:
spec-test-1: 3+3 / spec-test-2: 2+3 / spec-test-3: 3+3+1
spec-test-4: 7+7 / spec-test-5: 3+2 | 3+2
Regression docs now match Word:
- sd-1655-col-sep-3-equal-columns: was 3+4+2, now 7+1 (Word: 7+1)
- multi-column-sections: was balanced, now col-by-col (matches Word)
- multi_section_doc: was balanced, now col-by-col (matches Word)
sd-2326-col-sep-continuous-section-break still balances 2+2 because its mid-document break is explicitly `continuous`.
All 653 layout-engine unit tests pass.
…-implement-column-balancing-for-continuous-section
# Conflicts:
# packages/layout-engine/pm-adapter/src/types.d.ts
…lti-page (SD-2452)
The previous gate was too strict: it skipped balancing for the last
section unconditionally, which regressed the existing baseline behavior
for multi-page multi-column documents whose only section is the body
sectPr (e.g. two_column_two_page-arial 2 page 17, where Word produces a
3+2 split — confirmed against Word's PDF render).
Refined rule for the last section: balance only when the section spans
multiple pages. Empirical Word behavior:
- sd-1655-col-sep-3-equal-columns: 1 section, body sectPr, 1 page,
3 cols → Word does NOT balance (col 1 holds 6 paragraphs, col 2
holds 1, col 3 empty). Single-page → don't balance.
- layout/two_column_two_page-arial 2: 1 section, body sectPr, 17
pages, 2 cols → Word balances the last page (3+2 split).
Multi-page → balance.
- multi-column-sections / multi_section_doc: each section is a single
page, default `nextPage` between them → no balancing (already
excluded by the non-`continuous` end-break check).
- sd-2326-col-sep-continuous-section-break: explicit `continuous` mid-
document break → balance (already covered by the non-last branch).
Implementation: when sectionIdx === lastSectionIdx, count pages whose
fragments belong to that section. If the count is ≤ 1, skip balancing.
The check short-circuits at >1 to avoid scanning the full page list.
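A minimal sketch of the page-count probe described above, assuming each fragment carries a `blockId` and that a block-to-section map is available. The `Page` and `Fragment` shapes and the function name are hypothetical simplifications.

```typescript
interface Fragment {
  blockId: string;
}

interface Page {
  fragments: Fragment[];
}

// True when fragments of `sectionIdx` appear on more than one page.
// Stops scanning as soon as a second page is found.
function sectionSpansMultiplePages(
  pages: Page[],
  blockSection: Map<string, number>,
  sectionIdx: number,
): boolean {
  let pagesWithSection = 0;
  for (const page of pages) {
    const present = page.fragments.some((f) => blockSection.get(f.blockId) === sectionIdx);
    if (present) {
      pagesWithSection += 1;
      if (pagesWithSection > 1) return true; // short-circuit: no need to scan the rest
    }
  }
  return false;
}
```

Under the rule in this commit, the last section is balanced only when this probe returns true.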
Corpus impact (vs npm@latest 1.31.1, after merging main):
- 374 docs total, 363 unchanged, 11 changed (2 unique + 9 widespread)
- The 9 widespread-only changes are all `pages[*].fragments[*].x|y`
on a single page each — the SD-2452 balancing applied to the
correct subset of multi-page multi-column sections.
- All 5 SD-2452 spec-test fixtures continue to balance correctly:
spec-test-1: 3+3 / spec-test-2: 2+3 / spec-test-3: 3+3+1
spec-test-4: 7+7 / spec-test-5: 3+2 | 3+2
All 653 layout-engine unit tests pass.
- index.ts mid-page balance: page-break fallback now triggers whenever balanceSectionOnPage returns null, not only when willBalance was false. willBalance is a coarse approval; balanceSectionOnPage has its own late skip conditions (unequal column widths, zero remaining height, shouldSkipBalancing thresholds) that can return null even after willBalance=true. Without the broader check, the new region started on the same page from a stale column index and overwrote the previous section's column content.
- column-balancing.ts split target: subtract preceding-fragment height from totalSectionHeight / columnCount before walking the table rows. A 100px paragraph + 300px table in 2 cols hit target=200 and split the table at row=200 (cols 100+200 / 100, max=300); subtracting the 100 leading height gives target=150 → splits at row=100 (cols 100+100 / 200, max=200), matching the achievable balanced height.
- column-balancing.ts split continuesOnNext: capture the original value BEFORE setting `table.continuesOnNext = true`. The previous ternary read the field after the mutation, always saw `true`, and the second half always inherited `false`. Now the second half correctly inherits the source table's cross-page continuation.
- column-balancing.ts split rollback: splitDominantTableAtRowBoundary now returns a rollback closure. balanceSectionOnPage invokes it when shouldSkipBalancing fires post-split, so the page never carries an overlapping half table when balancing is ultimately skipped. The ordering (split-then-skip) is intentional: split rescues the single-unbreakable case that pre-split skip would otherwise reject, but with rollback the mutation no longer survives a late skip.
- column-balancing.ts: remove balancePageColumns and its test block. The function had no production callers after balanceSectionOnPage became the only entry point. Its shared helper (createMeasure) is inlined into the balanceSectionOnPage tests.
- super-editor sections-resolver.ts: add startNodeIndex / endNodeIndex to the synthetic SectionRange. Required after the main-merge that added these fields to SectionRange (commit 85a503c). Fixes the TS2739 build error luccas reported.
All 644 layout-engine unit tests pass. super-editor build is clean.
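The continuesOnNext and rollback items can be illustrated together. This is a sketch under assumptions: the `TableFragment` fields `fromRow`, `toRow`, and `continuesFromPrev` are hypothetical, and `splitFragment` is not the PR's splitDominantTableAtRowBoundary, only the capture-before-mutate pattern it describes.

```typescript
interface TableFragment {
  fromRow: number; // hypothetical: first row index in this fragment
  toRow: number; // hypothetical: one past the last row index
  continuesFromPrev: boolean; // hypothetical field name
  continuesOnNext: boolean;
}

function splitFragment(
  table: TableFragment,
  splitRow: number,
): { second: TableFragment; rollback: () => void } {
  const original = { ...table }; // snapshot for rollback
  // Capture BEFORE mutating; reading the field after the assignment below
  // would always observe `true`.
  const continuedOnNext = table.continuesOnNext;

  const second: TableFragment = {
    fromRow: splitRow,
    toRow: table.toRow,
    continuesFromPrev: true,
    continuesOnNext: continuedOnNext, // second half inherits the source's continuation
  };

  table.toRow = splitRow;
  table.continuesOnNext = true; // first half now continues into the second half

  return {
    second,
    rollback: () => {
      Object.assign(table, original); // undo the in-place mutation
    },
  };
}
```

A caller that decides to skip balancing after the split would invoke `rollback()` and discard `second`, leaving the original fragment untouched.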
Word draws a column separator only between columns that BOTH have content
within the region. The renderer was drawing the separator full-height
whenever `withSeparator: true` and `count > 1`, regardless of whether the
column to the right of the boundary had any fragments. This produced a
spurious vertical line on pages whose section content fits in column 0
(e.g. multi-column-sections.docx page 2 — Word shows nothing, we drew a
line top-to-bottom of the column area).
Gate each separator on fragment presence past the boundary within the
region's y range:
hasContentPastSeparator =
page.fragments.some(f => f.x >= separatorX
&& f.y >= yStart - 0.5
&& f.y < yEnd + 0.5)
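Written out as a self-contained function, the gate above might look like the following sketch. The `Fragment` shape is reduced to the two coordinates the expression uses, and `visibleSeparators` is a hypothetical wrapper added only to show the gate applied to several column boundaries.

```typescript
interface Fragment {
  x: number;
  y: number;
}

// True when some fragment sits at or past the separator within the region's
// vertical range (with the same 0.5 tolerance as the expression above).
function hasContentPastSeparator(
  fragments: Fragment[],
  separatorX: number,
  yStart: number,
  yEnd: number,
): boolean {
  return fragments.some((f) => f.x >= separatorX && f.y >= yStart - 0.5 && f.y < yEnd + 0.5);
}

// Hypothetical wrapper: keep only the separator positions that pass the gate.
function visibleSeparators(
  fragments: Fragment[],
  separatorXs: number[],
  yStart: number,
  yEnd: number,
): number[] {
  return separatorXs.filter((x) => hasContentPastSeparator(fragments, x, yStart, yEnd));
}
```

With content in the first two of three columns, only the first boundary passes the gate, which corresponds to the single separator reported for sd-1655.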
Verified against Word renderings:
- multi-column-sections page 2 (col 1 only) → 0 separators ✓
- sd-1655 (3 cols, col 3 empty) → 1 separator (col 1↔2) ✓
- sd-2326 (mid-doc continuous, balanced 2 cols) → separator drawn ✓
- two_column_two_page-arial 2 page 17 (balanced) → separator drawn ✓
Tests updated to reflect the gate. Existing 15 separator tests now seed
each verified column with a stub fragment so they pin down geometry, not
the gate. 3 new tests pin down the gate behavior:
- suppresses separator when right column is empty
- draws only the separator whose right neighbor has content
- checks fragment presence within the region, not whole-page
1052 painter-dom tests pass. 644 layout-engine tests pass.
…continuous (SD-2452)
Empirical Word behavior on docs with explicit `<w:type w:val="continuous"/>`
on the body sectPr: balance any multi-column section whose content precedes
the body, even when that section's own end-break is `nextPage` (default).
The simplest reproducer is `tabs/sd-1480-two-col-tab-positions.docx`:
- 5 paragraphs ending with an inline sectPr (no `<w:type>` → default
`nextPage`).
- 1 empty paragraph followed by the body sectPr with explicit
`<w:type w:val="continuous"/>`.
- Word renders the 5 entries 3+3 across 2 columns on a single page.
- Pre-fix: our gate skipped balancing because section 0's break is
nextPage → the page rendered as 6+0.
Distinguishing explicit vs. default `continuous` requires plumbing a
`typeIsExplicit` flag from the OOXML parser through to the layout-engine:
- `extractSectionType` now returns `null` when `<w:type>` is absent,
instead of defaulting to `nextPage`. Callers apply the correct default
(paragraph sectPr → `nextPage`, body sectPr → `continuous`).
- `extractSectionData` exposes `typeIsExplicit: boolean`.
- `SectionRange.typeIsExplicit` carries the flag through analysis.
- `createSectionBreakBlock` writes it onto `attrs.typeIsExplicit`.
- `layoutDocument` reads it into `sectionTypeIsExplicit: Map<idx, bool>`.
Updated balance gate (per-section, count > 1):
Balance if:
(a) section's own end-break is `continuous` AND it is NOT the last
section, OR
(b) the doc contains any EXPLICIT continuous break (typically the body
sectPr), OR
(c) the section spans multiple pages.
Otherwise skip — covers `sd-1655-col-sep-3-equal-columns` (single section,
default body continuous, single page → Word fills col-by-col).
Test fallout: three pm-adapter tests asserted that body sectPrs without
`<w:type>` defaulted to `nextPage`. That was a leak from the old
`extractSectionType` paragraph-style default. The corrected default is
`continuous` per OOXML body-sectPr semantics. Tests updated to assert the
new behavior plus `typeIsExplicit: false`.
Verification:
- sd-1480 page 1: was 6+0, now 4+2 (Word shows 3+3 — balancing engages
correctly; the 4+2 vs 3+3 distribution gap is residual binary-search
behavior on uneven paragraph heights, separate from the gate).
- sd-1655: still col-by-col (no balancing). ✓
- multi-column-sections, multi_section_doc: still col-by-col. ✓
- spec-test-1..5: 3+3 / 2+3 / 3+3+1 / 7+7 / 3+2|3+2. ✓
- sd-2326 (mid-doc continuous): still balanced 2+2. ✓
644 layout-engine + 1802 pm-adapter unit tests pass.
The previous commit changed `extractSectionType` to return `null` when
`<w:type>` was missing, which let analysis.ts apply
`DEFAULT_BODY_SECTION_TYPE = continuous` for body sectPrs. Most fixtures
flipped from `nextPage` to `continuous`, rippling through page-break
placement, header/footer flow, and column-flow decisions across the
whole pipeline (541 of 374 corpus docs changed, 1204 visual diffs).
Surgical revert:
- `extractSectionType` returns the OOXML default (`'nextPage'`) again,
matching the pre-PR pipeline behavior. The body-sectPr type is once
more `'nextPage'` when `<w:type>` is omitted.
- A new `extractSectionTypeIsExplicit` helper returns `true` only when
`<w:type>` was actually written. `extractSectionData` exposes it as
`typeIsExplicit`.
- `SectionRange.typeIsExplicit` propagates through analysis (paragraph
sectPrs, body sectPr, fallback final, synthetic ranges).
- `createSectionBreakBlock` writes `attrs.typeIsExplicit: true` ONLY
when the flag is true. Omitting the field for the (vast majority of)
sectPrs without `<w:type>` keeps the FlowBlock attrs schema
backward-compatible with the published 1.31.1 layout snapshots.
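The split between "resolved type" and "was the type written" can be sketched with a simplified element model. Everything here is a stand-in: the `XmlElement` shape and the names `sectionTypeOf`, `sectionTypeIsExplicit`, and `breakAttrs` are hypothetical and do not reproduce the signatures of `extractSectionType`, `extractSectionTypeIsExplicit`, or `createSectionBreakBlock`.

```typescript
const SECTION_TYPES = ["continuous", "nextPage", "nextColumn", "evenPage", "oddPage"] as const;
type SectionType = (typeof SECTION_TYPES)[number];

// Simplified stand-in for a parsed OOXML element (assumption).
interface XmlElement {
  name: string;
  attributes?: Record<string, string>;
  elements?: XmlElement[];
}

function typeElement(sectPr: XmlElement): XmlElement | undefined {
  return sectPr.elements?.find((e) => e.name === "w:type");
}

// Resolved type: falls back to 'nextPage' when <w:type> is absent or unknown.
function sectionTypeOf(sectPr: XmlElement): SectionType {
  const val = typeElement(sectPr)?.attributes?.["w:val"];
  return SECTION_TYPES.find((t) => t === val) ?? "nextPage";
}

// True only when <w:type> was actually written in the sectPr.
function sectionTypeIsExplicit(sectPr: XmlElement): boolean {
  return typeElement(sectPr) !== undefined;
}

// The flag is emitted only when true, so attrs for sectPrs without <w:type>
// keep their previous shape.
function breakAttrs(sectPr: XmlElement): { typeIsExplicit?: true } {
  return sectionTypeIsExplicit(sectPr) ? { typeIsExplicit: true } : {};
}
```

Omitting the key entirely (instead of writing `typeIsExplicit: false`) is what keeps serialized attrs identical for the common case.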
Refined column-balance gate (layout-engine/index.ts) reads the new flag.
Balance if any of:
1. Section's own end-break is `continuous` AND not the last section
(covers spec-test-1..5, sd-2326).
2. Doc has at least one EXPLICIT continuous break AND this section's
type was NOT explicitly set to a page-forcing type. Covers
sd-1480-two-col-tab-positions section 0 (default `nextPage` but
body sectPr explicit continuous → Word balances).
3. Section spans multiple pages (covers `two_column_two_page-arial 2`
p17, body default, multi-page).
Otherwise skip — covers `sd-1655-col-sep-3-equal-columns` (single
section, default body, single page → Word fills col-by-col).
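The three-rule gate as this commit states it can be written as a single predicate. This is a sketch only: the `SectionInfo` shape and `shouldBalance` name are hypothetical, and the later commits in this thread narrow rules 1 and 2 further, which this version does not reflect.

```typescript
type BreakType = "continuous" | "nextPage" | "evenPage" | "oddPage";

// Hypothetical per-section summary assembled from the layout pass.
interface SectionInfo {
  endBreakType: BreakType; // type of the break that ends this section
  typeIsExplicit: boolean; // whether <w:type> was written for that break
  isLast: boolean;
  spansMultiplePages: boolean;
}

function shouldBalance(s: SectionInfo, docHasExplicitContinuous: boolean): boolean {
  // Rule 1: own end-break is continuous and this is not the last section.
  if (s.endBreakType === "continuous" && !s.isLast) return true;
  // Rule 2: the doc has an explicit continuous break and this section's type
  // was not explicitly set to a page-forcing type.
  const explicitlyPageForcing = s.typeIsExplicit && s.endBreakType !== "continuous";
  if (docHasExplicitContinuous && !explicitlyPageForcing) return true;
  // Rule 3: the section spans multiple pages.
  if (s.spansMultiplePages) return true;
  return false;
}
```

For example, a single-page last section with an omitted type in a document without any explicit continuous break falls through all three rules and is left column-by-column, which is the sd-1655 case.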
Corpus (vs npm@latest 1.31.1): 541 → 47 changed (13 unique +
34 widespread-only `attrs.typeIsExplicit` schema additions).
Browser verification:
- spec-test-1..5: 3+3 / 2+3 / 3+3+1 / 7+7 / 3+2|3+2 ✓
- sd-1655: 7+1 col-by-col ✓
- multi-column-sections / multi_section_doc: col-by-col ✓
- sd-1480: was 6+0, now 3+2 (Word: 3+3 — gate engages correctly; the
remaining 3+2 vs 3+3 gap is balancer-algorithm behavior on uneven
paragraph heights, separate from the gate).
- sd-2326: 2+2 ✓
644 layout-engine + 1802 pm-adapter unit tests pass.
…le (SD-2452)
Per ECMA-376 §17.18.77 a continuous break "balances the section it ENDS"
— i.e., the section BEFORE the break, not the section the break belongs
to. When the body sectPr itself is the explicit-continuous trigger, it
balances the section preceding the body, not the body's own content.
Bug: rule 2 ("doc has explicit continuous → balance any non-explicitly-
non-continuous section") was firing on the body section itself when the
body sectPr was the only explicit continuous in the doc. That caused
`tabs/mixed-columns-tabs tnr` p1 to render 10+9 when Word renders 14+5
(column-flow without balancing): the body sectPr is explicit-continuous
+ 2-col, but the 2-col Test list IS the body — there is no preceding
section for the body break to "balance".
Compare with `tabs/sd-1480-two-col-tab-positions`: body sectPr is also
explicit-continuous + 2-col, but the 2-col Page entries live in
section 0 (a section BEFORE the body). The body break correctly
balances section 0 — that produces 3+3 like Word.
Fix: identify the body-explicit-continuous section (last section
whose typeIsExplicit is true and whose end-break is `continuous`) and
exclude it from rule 2. Section 0 of sd-1480 still balances. Section 1
(body) of mixed-columns-tabs-tnr does not. Body sections can never
balance via rule 1 (the last section cannot be "not last"), but they can
still balance via rule 3 (multi-page check, e.g. two_column_two_page-arial 2 p17).
Browser verification:
- mixed-columns-tabs-tnr: was 10+9, now 14+5 (Word: 14+5) ✓ exact match
- sd-1480: 3+2 unchanged (Word: 3+3, residual balancer-algorithm gap)
- sd-1655: 7+1 col-by-col, unchanged ✓
- multi-column-sections: col-by-col, unchanged ✓
- sd-2326: 2+2, unchanged ✓
- spec-test-1..5: 3+3 / 2+3 / 3+3+1 / 7+7 / 3+2|3+2 ✓
Corpus (vs npm@latest 1.31.1): 47 changed total (12 unique +
35 widespread-only attrs.typeIsExplicit schema-only). Previously 47
with 13 unique — mixed-columns-tabs-tnr moved from "structural diff
vs reference" to "matches reference behavior on the 2-col flow".
644 layout-engine + 1802 pm-adapter unit tests pass.
Word balances the LAST PAGE of a multi-page multi-column section only
when that section is the final/body section. Mid-doc multi-page
multi-column sections retain natural column-flow on every page,
including the last — Word doesn't rebalance the overflow remainder.
Verified:
layout/ivosass-sub p3 — section 1 is mid-doc, 2-page, 2-col,
explicit-continuous end-break. The last page has 4 overflow
fragments. Word leaves them in column 0. Pre-fix our gate's
rule 1 fired and balanced p3 to 2+2. Now mid-doc multi-page
sections skip the gate and p3 stays in col 0.
lists/saas_original p4 — same pattern: mid-doc 2-col section
overflows to last page; Word doesn't rebalance.
Multi-page LAST sections (two_column_two_page-arial 2 p17 — 17 pages,
body default continuous) still balance via rule 3, matching Word's
3+2 split on the final page.
Implementation: page-count probe runs once per section
(short-circuits at >1) and feeds both the new mid-doc skip and the
existing rule 3 multi-page allow.
Browser:
- ivosass-sub: was 2+2 on p3, now col 0 only (matches Word).
- saas_original: was 2+2 on p4, now col 0 only (matches Word).
- mixed-columns-tabs-tnr: 14+5 unchanged.
- sd-1480: 3+2 unchanged.
- sd-1655: 7+1 unchanged.
- multi-column-sections: col-by-col unchanged.
- sd-2326: 2+2 unchanged.
- spec-test-1..5: 3+3 / 2+3 / 3+3+1 / 7+7 / 3+2|3+2.
- two_column_two_page-arial p17: still balances.
Corpus (vs npm@latest 1.31.1): 9 unique structural changes (down
from 12) plus 38 widespread-only attrs.typeIsExplicit schema
additions. The 9 remaining are intentional SD-2452 differences.
644 layout-engine + 1802 pm-adapter unit tests pass.
…-implement-column-balancing-for-continuous-section
Tab leaders were missing on every line BEFORE the final <w:br/> in a paragraph that uses a right-aligned dot-leader stop. The 'last N tabs of the paragraph bind to the last N alignment stops' heuristic counted across the whole paragraph, so only the trailing tab (after the final soft line break) was bound. Earlier lines fell through to default grid stops, dropping their leaders.
Two changes:
1. Scope the heuristic to per-line segments delimited by explicit <w:br/> runs. pPr/tabs apply per line, not per paragraph.
2. Strip trailing-empty <w:tab/> runs (a tab at the end of a segment with no content after it). Word emits these as authoring artifacts; if they consumed an alignment-stop slot, the meaningful tab earlier in the line would fall to a default grid stop.
Mirrored in measuring/dom and layout-bridge/remeasure (the two call sites that share this heuristic).
Fixes the visible bug in tabs/sd-1480-two-col-tab-positions where 'Page<br/>Page<tab/>5<tab/>' rendered with leaders only on the last line and 'Page 5' lost its leader entirely. Each line now matches Word's 'Page........N' rendering.
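Change 1 (per-line scoping) can be sketched as follows. The `Run` model and both function names are hypothetical simplifications; the sketch covers only the "last N tabs bind to the last N alignment stops" rule and does not model change 2.

```typescript
type Run = { kind: "text"; text: string } | { kind: "tab" } | { kind: "break" };

// Split a paragraph's runs into line segments at explicit breaks.
function splitIntoLineSegments(runs: Run[]): Run[][] {
  const segments: Run[][] = [[]];
  for (const run of runs) {
    if (run.kind === "break") segments.push([]);
    else segments[segments.length - 1].push(run);
  }
  return segments;
}

// For each tab in order, the index of the alignment stop it binds to,
// or null when the tab is left to default (greedy) matching.
function bindTabsToAlignmentStops(runs: Run[], alignmentStopCount: number): (number | null)[] {
  const tabCount = runs.filter((r) => r.kind === "tab").length;
  const bound = Math.min(tabCount, alignmentStopCount);
  return Array.from({ length: tabCount }, (_, i) => {
    const fromEnd = tabCount - 1 - i; // 0 for the last tab
    return fromEnd < bound ? alignmentStopCount - 1 - fromEnd : null;
  });
}
```

Applied to a two-line paragraph with one alignment stop, paragraph-wide counting binds only the second line's tab, while per-segment counting binds the tab on each line.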
…2452)
The trailing-empty-tab guard from the previous commit was too
aggressive: a segment shaped like 'Label:\t' (single tab at the very
end) had its only tab stripped, falling through to greedy default
grid-stop matching. Form-field leaders ('By:_____', 'Name:_____') then
truncated to the next 0.5" grid stop instead of extending to the
right-aligned alignment stop.
Add a guard: if stripping trailing tabs would leave NO effective tabs,
treat all tabs in the segment as effective. The trailing-empty heuristic
only fires when there's at least one OTHER tab to bind.
Verified visually:
- sd-1480 'Page........N' with Page+tab+N+trailing-tab still works
- 'By:____', 'Name:____', 'Title:____' form fields now extend to right
…essions
The trailing-empty-tab strip introduced in 315ab84 + 4b63d25 treated tabs at the end of a segment as authoring artifacts. That broke patterns where trailing tabs ARE meaningful and need to bind to alignment stops:
- HVY-25 Queensland Land Registry block 10 has '\t\t/[text]/\t\t' with 4 tabs and 4 authored stops (2 alignment). The strip walked back through the trailing tabs, marking them artifacts. Tabs 0+1 then ate the 2 alignment stops, putting the center+end binding on the FIRST two tabs instead of the last two, corrupting the layout.
- HVY-19 Commercial Lease TOC ('1.\tBUSINESS POINTS\t1') had similar reordering when paragraph layout placed the page number tab as part of a multi-tab segment.
The strip can't reliably distinguish authored trailing tabs (HVY-25, form fields with multiple authored stops) from sd-1480-style artifacts (Page\t5\t with one extra trailing tab). Heuristic was too aggressive.
Keep the per-line-segment scoping (the real fix that closes line 1 of sd-1480 multi-line paragraphs). Drop the trailing-strip. Sd-1480 line 2 ('Page\t5\t' segment) reverts to baseline 'Page 5 ___' behavior, which is what shipped before this branch, so no new regression there.
hey @luccas-harbour, addressed all six in 50dc1f7:
- willBalance / fallback mismatch → page-break fallback now triggers whenever balanceSectionOnPage returns null
644 layout-engine unit tests pass. super-editor build is clean.
The SD-2447 heuristic forces the last N tabs to bind to the last N end/center/decimal stops. It was added because TOC styles often have ONLY a right-aligned dot-leader stop, and tabStops gets seeded with synthetic 0.5" defaults from origin (seedDefaultsFromZero=true). Greedy then lands on a default 0.5" grid stop instead of the alignment stop — hence the heuristic. But for paragraphs with an EXPLICIT start-aligned stop ahead of the alignment stop (TOC1 style with 'start@740, end@9360, end@10080': template_format and similar Word lease templates), greedy correctly lands on the start stop and the alignment stop downstream — no force needed. The heuristic over-fires and binds tab 0 to the right alignment stop, producing the broken render: leader BEFORE the title with the page number jammed against it. Fix: compute greedy first; only apply the heuristic when greedy would land on a 'source: default' stop. When greedy already lands on an explicit stop, use it. Mirrored in measuring/dom and remeasure. Effect: - template_format TOC: now renders '1. BUSINESS POINTS........1' matching Word and the published baseline. - HVY-25 / SD-2447 fixture / sd-1480 line 1: behavior preserved. - All test suites pass (measuring-dom 332, layout-bridge 1192, layout-engine 644).
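The "greedy first, force only on a default stop" rule can be sketched as below. The `TabStop` shape is an assumption built from the terms used in the commit message (`source: default`, start/end/center/decimal alignment), and `resolveStop` is a hypothetical name; the forced stop is passed in because choosing it is the job of the existing heuristic.

```typescript
interface TabStop {
  pos: number; // position in the same units as cursorX
  align: "start" | "end" | "center" | "decimal";
  source: "explicit" | "default";
}

// Greedy match: the nearest stop strictly to the right of the cursor.
function greedyStop(stops: TabStop[], cursorX: number): TabStop | undefined {
  return stops
    .filter((s) => s.pos > cursorX)
    .sort((a, b) => a.pos - b.pos)[0];
}

// Use the greedy result when it is an explicit stop; fall back to the
// heuristic's forced stop only when greedy would land on a default stop.
function resolveStop(
  stops: TabStop[],
  cursorX: number,
  forced: TabStop | undefined,
): TabStop | undefined {
  const greedy = greedyStop(stops, cursorX);
  if (greedy && greedy.source === "explicit") return greedy;
  return forced ?? greedy;
}
```

With 'start@740, end@9360' the first tab resolves to the explicit start stop, while a paragraph that has only an end stop plus seeded defaults still gets the forced alignment stop.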
hey @tupizz! can you have a look?
…(SD-2452)
Address Luccas's [P2] review comment. Rule 2 of the continuous-balancing gate previously fired whenever ANY section in the document had an explicit continuous break, allowing balancing for every multi-column section whose own type was omitted, even unrelated ones. A later single-page two-column body section with omitted <w:type> would be balanced just because an earlier section was explicit-continuous, violating sd-1655's skip-omitted-single-page rule.
Per ECMA-376 §17.18.77, a continuous break balances the section it ENDS. When the body sectPr authors an explicit continuous break, the affected section is the one IMMEDIATELY preceding the body. Tighten rule 2 from a doc-wide flag to bodyExplicitContinuousIdx − 1.
Verified:
- sd-1480: section 0 still balances (rule 2 fires for sectionIdx 0 === bodyExplicitContinuousIdx 1 − 1).
- mixed-columns-tabs-tnr: body section (sectionIdx 1) does not balance (no longer matches bodyExplicitContinuousIdx − 1 = 0).
- sd-1655: not affected (no body-explicit-continuous in the doc).
- Hypothetical 'mid-doc explicit-continuous + body omitted single-page 2-col': body now correctly skipped.
All 644 layout-engine tests pass.
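As a rough sketch, the tightened rule 2 amounts to an index comparison. The function name rule2AllowsBalancing is hypothetical, and representing "no body-explicit-continuous break" as null is an assumption; the other rules of the gate are not shown.

```typescript
// Hypothetical sketch of rule 2 only. bodyExplicitContinuousIdx is the index of
// the body section whose sectPr authors an explicit continuous break, or null
// (an assumed representation) when the document has none.
function rule2AllowsBalancing(
  sectionIdx: number,
  bodyExplicitContinuousIdx: number | null,
): boolean {
  if (bodyExplicitContinuousIdx === null) return false;
  // A continuous break balances the section it ends: the one immediately before the body.
  return sectionIdx === bodyExplicitContinuousIdx - 1;
}

// The cases listed under "Verified" above:
console.log(rule2AllowsBalancing(0, 1)); // sd-1480 section 0 -> true
console.log(rule2AllowsBalancing(1, 1)); // mixed-columns-tabs-tnr body section -> false
console.log(rule2AllowsBalancing(1, null)); // no body-explicit-continuous -> false
```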
🎉 This PR is included in @superdoc-dev/mcp v0.3.0-next.56 The release is available on GitHub release
🎉 This PR is included in vscode-ext v2.3.0-next.100
🎉 This PR is included in @superdoc-dev/react v1.2.0-next.98 The release is available on GitHub release
🎉 This PR is included in superdoc-cli v0.8.0-next.73 The release is available on GitHub release
🎉 This PR is included in superdoc v1.30.0-next.56 The release is available on GitHub release
🎉 This PR is included in superdoc-sdk v1.8.0-next.56
Comparison Results PDF
SD-2452-page-by-page.pdf
Summary
Implements ECMA-376 §17.18.77 column balancing for multi-column sections. Word produces a minimum-height balanced layout at the end of a continuous (and empirically, next-page) multi-column section; SuperDoc was either leaving content stacked in the first column or, in some layouts, producing overlapping fragments.
Linear: SD-2452
What changed
- layoutDocument builds a block → section map by walking blocks in document order and tracking the current section from the most recent sectionBreak (pm-adapter only stamps attrs.sectionIndex on sectionBreak blocks, not on content paragraphs).
- A new balanceSectionOnPage helper performs section-scoped balancing with its own fragment-level positioning (no Y-grouping). Fragments are ordered by (x, y) and each is treated as its own block. The previous balancePageColumns grouped fragments by Y into rows, which collapsed fragments from different source columns at the same Y and produced overlap.
- calculateBalancedColumnHeight is a proper binary search for the minimum H such that greedy left-to-right fill places every block with every column ≤ H. This matches Word's left-heavy packing preference (e.g. 7 blocks / 3 cols → 3+3+1, not 2+2+3).
- A mid-page hook at forceMidPageRegion balances the ending section on the current page before starting the new region, and collapses both cursors to balanceResult.maxY so the next region begins just below the balanced columns. Sections handled mid-page are tracked in alreadyBalancedSections so the post-layout pass doesn't double-balance.
Results (Word vs SuperDoc)
Side-by-side PDF comparison available locally at /tmp/sd-2452-fixtures/SD-2452-comparison.pdf (generated via the new compare-word-vs-superdoc skill).
Test plan
Demo tests
Fixtures
Plan is to upload these to the R2 corpus after the PR lands.
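For reviewers, the binary search over the column height H described under "What changed" can be sketched as below. This is a simplified illustration under stated assumptions, not the PR's calculateBalancedColumnHeight: the names fillColumns and minBalancedHeight are hypothetical, blocks are reduced to plain heights, and the search runs over whole units, so with fractional heights the result is only within one unit of the true minimum.

```typescript
// Greedy left-to-right fill: returns the number of blocks placed in each column,
// or null if the blocks do not fit in columnCount columns of height maxHeight.
function fillColumns(heights: number[], columnCount: number, maxHeight: number): number[] | null {
  const counts = new Array<number>(columnCount).fill(0);
  let col = 0;
  let used = 0;
  for (const h of heights) {
    if (h > maxHeight) return null; // a single block taller than H can never fit
    if (used + h > maxHeight) {
      col += 1; // current column is full; move to the next one
      used = 0;
      if (col >= columnCount) return null;
    }
    used += h;
    counts[col] += 1;
  }
  return counts;
}

// Binary search for the smallest whole-unit H for which the greedy fill succeeds.
// Feasibility is monotonic in H, so the search converges on the minimum.
function minBalancedHeight(heights: number[], columnCount: number): number {
  let lo = Math.ceil(Math.max(0, ...heights)); // no column can be shorter than the tallest block
  let hi = Math.ceil(heights.reduce((a, b) => a + b, 0)); // everything in one column always fits
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (fillColumns(heights, columnCount, mid) !== null) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// 7 equal blocks in 3 columns: the minimum height holds 3 blocks per column.
const blocks = [10, 10, 10, 10, 10, 10, 10];
const balancedHeight = minBalancedHeight(blocks, 3);
console.log(balancedHeight); // 30
console.log(fillColumns(blocks, 3, balancedHeight)); // [ 3, 3, 1 ]
```

Because the fill is greedy from the left, the minimum H yields the left-heavy 3+3+1 split rather than 2+2+3.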