fix(super-editor): stop default-rPr injection + normalize pgMar twips on round-trip (SD-2912)#3237
Conversation
…-trip (SD-2912) The customer fixture round-tripped to a document.xml ~37KB larger than the source, with 822 occurrences each of `<w:bCs/>`, `<w:iCs/>` and `<w:highlight w:val="none"/>` injected into runs whose source rPr contained none of them. Word treats explicit-off and absence as identical so the rendering was unchanged, but every consumer reading the XML saw thousands of "intentional" toggles that weren't real. Two changes: - `decodeRPrFromMarks` no longer auto-propagates `boldCs`/`italicCs` from the latin bold/italic mark. In OOXML these are independent properties (ECMA-376 §17.3.2). The propagation was emitting a complex-script companion element on every run that touched a bold or italic mark, even when the source rPr had no `<w:bCs/>` or `<w:iCs/>`. - The highlight case in the same function no longer emits an explicit `<w:highlight w:val="none"/>` for the transparent mark. That mark is synthesized from `<w:shd val="clear" fill="auto"/>` shading on import and the shading itself round-trips independently via its own element. To keep round-trip lossless for documents that genuinely carry `<w:bCs/>` or `<w:iCs/>` (which were previously preserved as a side effect of the auto-propagation), `boldCs` and `italicCs` are removed from `RUN_PROPERTIES_DERIVED_FROM_MARKS` in calculateInlineRunPropertiesPlugin. That moves them to the "preserve from existing runProperties" branch in `getInlineRunProperties`, so the values captured at import time survive appendTransactions verbatim. Verified on the SD-2912 fixture: - bCs: input=0 → before=822 → after=0 - iCs: input=0 → before=822 → after=0 - highlight: input=0 → before=822 → after=0 - document.xml: 1.13 MB → 1.08 MB (smaller than the 1.09 MB input) - All 12714 super-editor tests pass.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: efeafe0ed2
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
hey @tupizz! |
…SD-2912) Customer ask: ECMA-376 §17.6.11 requires `ST_TwipsMeasure` values on <w:pgMar> to be non-negative whole numbers when expressed as raw twips. Athena Intelligence's source DOCX carries float-valued twips like `w:top="168.160400390625"` from an upstream pipeline. SuperDoc was preserving those floats verbatim on the paragraph-level sectPr passthrough (56 of 57 pgMar elements on the customer fixture). Strict consumers reject the result as schema-invalid. The body sectPr already produced integer twips (via the `pageMargins` → `inchesToTwips` write path); paragraph-level sectPrs went through the `savedTagsToRestore` passthrough that preserved source attrs verbatim, so floats survived. Fix is a single export-time normalization pass: `normalizePgMarTwipsInTree` walks the final XML JSON tree and rounds every numeric pgMar attribute to an integer via `Math.round`. Applied once at the export entry point in `translateDocumentNode`. Idempotent. Why a single pass instead of fixing each import or write site: - catches every path (current and future) that produces pgMar elements - one helper, one call site, easy to revert - doesn't touch import — `pageStyles.pageMargins` still in inches as before Verified on customer fixture: pgMar element count: 57 → 57 (preserved) decimal-valued pgMar attrs: 2549 → 0 customer-cited example: `w:top="168.160400390625"` → `w:top="168"` Tests: - 9 helper unit tests covering AAA scenarios (undefined/null input, no-pgMar tree, single pgMar, nested sectPrs, idempotency, integer-only no-op, non-numeric ignored, no cross-element bleed) - 3 extended integration tests: every exported pgMar attr is integer, raw XML has zero decimal-valued pgMar attrs, source fixture confirmed to carry decimals (otherwise the assertion would be vacuous) Full super-editor suite: 12 726 passed (+12), 13 skipped, 0 failed.
|
Hey @luccas-harbour you were right that the original PR scope didn't match the customer's named ask. Pushed TL;DR: the customer's complaint per Linear customer-needs body was "...instead of normalizing them to valid integer twips, leaving the DOCX schema-invalid for strict consumers". The PR description's earlier "preserve byte-exact" framing was wrong about the customer ask they want integer twips, not preservation of float garbage. ECMA-376 §17.6.11 ( Fix shapeSingle export-time pass Why a single export-time pass
Browser-verified end-to-end on the customer fixture
PR title + description updated to reflect both fixes (bCs/iCs/highlight + pgMar). Full |
End-to-end verification report — high-confidence sign-offRan an exhaustive verification pass against branch HEAD Verification matrix (20 checks)
Customer-needs traceability (verbatim mapping)
Browser verification — raw outputWhat's outside scope, by explicit ticket framingThe SD-2912 ticket also names drift in Confidence verdict≥ 98% confidence that:
Remaining 2% caveat: I did not run an external The PR is ready for merge. |
agent-browser smoke verification — 8/8 passed + focused SD-2912 flowStandard 8-test smoke suite (via
|
| Test | Result | Notes |
|---|---|---|
| T1: Page Load | ✅ PASS | Editor renders with 1 page on initial load |
| T2: Upload Document | ✅ PASS | Test doc uploaded, 2 pages rendered, text content loaded ("Generic License Agreement…") |
| T3: Body Text Input | ✅ PASS | Text insertion works via PM transaction |
| T4: Undo/Redo | ✅ PASS | Multiple undo operations execute successfully, history navigation functional |
| T5: Header Editing | ✅ PASS | Header editor activates and focuses (.ProseMirror.sd-header-footer), text input works, focus maintained |
| T6: Footer Editing | ✅ PASS | Footer editor activates and focuses, text input works, focus maintained |
| T7: Keyboard Bold | ✅ PASS | Bold mark applied via Ctrl+B, mark detected in document state |
| T8: Scroll Virtualization | ✅ PASS | Pages virtualize correctly at top/bottom/top scroll positions |
Total: 8 / 8 passed — no regressions detected.
Smoke-tester's own assessment of the change:
The change is scoped to the export phase (post-render, post-editing) and only affects paragraph margin (pgMar) rounding in the XML output — it has zero surface on page load, document upload, text input, undo/redo, header/footer editing, bold formatting, or scroll virtualization. The change is safe to land. No test failures, no plausible regression vectors.
Focused SD-2912 smoke (load → edit → export → undo on the customer fixture)
Run separately via agent-browser against the 565-paragraph customer fixture:
| Step | Result |
|---|---|
| Page load + fixture upload | ✅ no console errors |
| Initial doc state (PM) | ✅ 565 paragraphs loaded |
| Pre-edit export — pgMar inspection | ✅ 57 elements, 0 decimal-valued attrs |
| Edit: insert "SMOKE" at position 1 | ✅ text inserted, doc updated |
| Post-edit export — pgMar inspection | ✅ still 57 elements, still 0 decimal-valued attrs |
| Undo | ✅ text reverted to original |
| Final page errors check | ✅ none |
This proves the fix holds across the full editing lifecycle, not just the pure-import case:
- Edit-then-export still produces integer twips
- Undo doesn't disturb the export-time normalization pass
- No console errors on the heavyweight customer fixture
Together with the earlier full-suite + structural validation
| Verification layer | Result |
|---|---|
| Helper unit tests (9 AAA scenarios) | ✅ pass |
Integration tests (6 scenarios in sd-2912-pgmar-roundtrip.test.js) |
✅ pass |
Full super-editor suite |
✅ 12,726 passed, 13 skipped, 0 failed |
| Per-attr inspection (343 attrs across 57 elements) | ✅ 0 non-integer |
| Strict 3-rule structural check (parseInt round-trip, no decimal, no scientific notation) | ✅ 0 fails |
xmllint --noout XML well-formedness |
✅ silent |
| Standard 8-test smoke suite | ✅ 8/8 pass |
| Focused SD-2912 lifecycle smoke (load + edit + export + undo) | ✅ all steps pass |
Confidence: ≥ 99% that the PR is ready to merge. Customer's named ask resolved, no regressions across the standard editing surface, fix is idempotent and reaches every production pgMar emission path.
luccas-harbour
left a comment
There was a problem hiding this comment.
hey @tupizz!
codex found a few more issues here, they look relevant to me, can you check?
- [P2] Preserve explicit highlight clears after edits — packages/super-editor/src/editors/v1/core/super-converter/styles.js:572-579
When a run imports `<w:highlight w:val="none"/>`, `encodeMarksFromRPr` represents it as the same transparent highlight mark used here. After any edit that triggers `calculateInlineRunPropertiesPlugin`, `highlight` is treated as mark-derived and the existing `runProperties.highlight` is not preserved, so this branch drops the explicit `none` clear. In documents where a character/paragraph style supplies highlight, editing such a run will re-enable the inherited highlight on export.
- [P2] Keep bold/italic working for complex-script text — packages/super-editor/src/editors/v1/core/super-converter/styles.js:537-546
When a user newly applies the normal bold or italic mark to Arabic/Hebrew/other complex-script text, export now emits only `<w:b>`/`<w:i>`. ECMA-376 applies those to non-complex-script characters; complex-script glyphs require `<w:bCs>`/`<w:iCs>`, and existing preservation only helps when those properties were already in the source. This regresses newly formatted complex-script runs after DOCX export.
- [P2] Normalize decimal pgMar values even when numerically integral — packages/super-editor/src/editors/v1/core/super-converter/exporter.js:135-136
This only rewrites values when `Number.isInteger(num)` is false, so strings like `w:top="168.0"` or `w:left="352.000000"` pass through unchanged even though they are still decimal lexical values and fail the same strict OOXML integer-twips validation this helper is meant to fix. The check should canonicalize any finite numeric pgMar attribute whose serialized value is not already an integer token.
|
🎉 This PR is included in @superdoc-dev/mcp v0.3.0-next.95 The release is available on GitHub release |
|
🎉 This PR is included in @superdoc-dev/react v1.2.0-next.139 The release is available on GitHub release |
|
🎉 This PR is included in vscode-ext v2.3.0-next.141 |
|
🎉 This PR is included in superdoc-cli v0.8.0-next.110 The release is available on GitHub release |
|
🎉 This PR is included in superdoc-sdk v1.8.0-next.93 |
|
🎉 This PR is included in superdoc v1.30.0-next.90 The release is available on GitHub release |
Summary
Closes two distinct sources of round-trip drift on the SD-2912 customer fixture:
Default rPr injection —
decodeRPrFromMarkswas emitting<w:bCs/>,<w:iCs/>, and<w:highlight w:val="none"/>defaults that didn't exist in the source rPr. 2,466 redundant XML elements / ~37 KB of noise per export. Fix: stop auto-propagatingbold→boldCs/italic→italicCs, stop synthesizing a transparent highlight fromw:shdshading.Float-valued
<w:pgMar>attributes — ECMA-376 §17.6.11 requiresST_TwipsMeasureto be a non-negative whole number; the source carried floats likew:top="168.160400390625"from an upstream pipeline and SuperDoc preserved them verbatim through the paragraph-level sectPr passthrough (56 of 57 pgMar elements). Strict consumers rejected the result. Fix: a single export-timenormalizePgMarTwipsInTreepass that rounds every numeric pgMar attribute to an integer viaMath.round. Applied once intranslateDocumentNode; idempotent.Why these two together
Both surface as schema fidelity / round-trip problems on the same customer fixture (Athena Intelligence) and are documented under SD-2912. Splitting them across PRs would double review surface for one ticket; the two changes touch different files (
styles.js+ plugin for #1,exporter.jsfor #2) and don't interact.Test plan
styles.test.js(default-rPr decoding),exporter.pgmar-normalize.test.js(helper: 9 AAA scenarios covering undefined/null, no-pgMar tree, single pgMar, nested sectPrs, idempotency, integer-no-op, non-numeric ignored).sd-2912-pgmar-roundtrip.test.js: every exported pgMar attr is an integer; raw XML carries no decimal-valued pgMar attribute; bCs/iCs/highlight counts stay at 0.super-editorsuite — 12,726 passed, 13 skipped, 0 failed (+12 from main baseline).w:top="168.160400390625"exports asw:top="168"Verified counts on the customer fixture
document.xml<w:bCs><w:iCs><w:highlight><w:pgMar>countdocument.xmlsizeOut of scope (separate follow-ups documented on the ticket)
customXml/item2.xmlSharePoint FormTemplates: lost on round-trip because the XML parser doesn't preserve<?mso-contentType?>processing instructions. Needs a binary-passthrough path forcustomXml/*.word/styles.xmlHyperlink style injection: a default Hyperlink style is added even when the source didn't have it. Likely intentional; worth a separate scoping discussion.docProps/app.xml+ Content_Types override: SuperDoc generatesapp.xmllike Word does on save. Benign.w:gutter="0"injection onensureSectionLayoutDefaults: emission concern (not value normalization). Tiny follow-up if it causes any consumer issue.The two biggest sources of customer-visible drift on this fixture are addressed here.