feat(document-api): customXml.parts.* — list/get/create/patch/remove (SD-3105)#3245

Merged
caio-pizzol merged 14 commits into main from caio-pizzol/SD-3105-custom-xml-parts-api
May 14, 2026

@caio-pizzol

Adds a public API for the OOXML Custom XML Data Storage Part (ECMA-376 §15.2.5, §15.2.6, §22.5). Customers who today reach into the converter to anchor structured payloads to a document (citation metadata, license data, AI review records) now have an official path: editor.doc.customXml.parts.list/get/create/patch/remove.

  • API shape is the OOXML-aligned naming we landed on after extensive design review: customXml.parts.*, object inputs, target: { id } | { partName }, list returns summaries (no content), get returns the full record, partName fallback for foreign parts that ship a Storage Part without a matching Properties Part.
  • Implementation coordinates the five OOXML package files on every write (storage part, props part, item rels, document rel, content-types inherited) and tombstones removed paths via converter.removedCustomXmlPaths so the exporter nulls original-zip entries when the customer removes a previously-imported part.
  • Lifecycle goes through executeOutOfBandMutation, the same primitive citation sources use — expectedRevision, dryRun, dirty marking, GUID promotion, revision increment all wired. REVISION_MISMATCH and other lifecycle errors propagate; only content-parsing errors are caught and mapped to INVALID_INPUT.
  • partName targets are restricted to actual storage parts (customXml/itemN.xml) so the low-level API can't read or mutate unrelated package files like word/document.xml.
  • Storage-to-props pairing uses customXml/_rels/itemN.xml.rels per spec, with the index-name heuristic only as a fallback for foreign parts without a rels file.

Must stay the same: ECMA-376 alignment on names and shapes — customXml.parts (not metadata), id = itemID GUID, partName = package path, rootNamespace vs schemaRefs kept distinct.

Review: check that the executeOutOfBandMutation wiring matches the citation-sources adapter pattern, and that the tombstone strategy doesn't leak to non-customXml paths. Ignore the displacedByCustomXml.js typecheck warnings — pre-existing on main.

Verified: pnpm --filter @superdoc/super-editor exec vitest run src/editors/v1/document-api-adapters → 3401/3401 pass; pnpm --filter @superdoc/document-api test → 1428/1428 pass; round-trip integration test exports a created part, reimports through the canonical loader, and asserts itemID/content/schemaRefs survive.

Locks in the public API surface for Custom XML Data Storage Parts:
- Types + validators (customXml/customXml.types.ts, customXml.ts)
- 5 operation definitions in operation-definitions.ts
- Registry entries in operation-registry.ts
- Dispatch entries in invoke.ts
- DocumentApi.customXml + adapter slot in index.ts
- Re-exports in package barrel

Adapter implementation, plan-engine wrapper, OOXML package writer,
and tests are pending. Two known typecheck failures left:
reference-doc-map.ts (needs customXml group entry) and schemas.ts
(needs JSON schemas for 5 ops).
…105)

Completes the read path through the Document API and closes the
remaining contract-layer wiring gaps.

Contract:
- reference-doc-map.ts: customXml group title/description/page entry
- schemas.ts: 5 op JSON schemas + customXmlPartTargetSchema helper
- 30 validator tests passing (target xor id/partName, content
  well-formedness smell-test, schemaRefs string[] check, patch
  requires at-least-one)
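
The validator rules listed above (target is exactly one of id or partName, patch needs at least one of content or schemaRefs) could be sketched roughly as follows. The function names, error messages, and the "starts with <" smell-test are illustrative assumptions, not the actual exports of customXml.ts.

```typescript
// Hypothetical sketch of the validator rules; names and messages are
// illustrative assumptions, not the real customXml.ts exports.
type CustomXmlPartTarget = { id: string } | { partName: string };

interface PatchInput {
  target: CustomXmlPartTarget;
  content?: string;
  schemaRefs?: string[];
}

// Returns an error message, or null when the target is valid.
function validateTarget(target: unknown): string | null {
  if (typeof target !== 'object' || target === null) return 'target must be an object';
  const t = target as Record<string, unknown>;
  const hasId = t['id'] !== undefined;
  const hasPartName = t['partName'] !== undefined;
  // XOR: exactly one of id / partName may be present.
  if (hasId === hasPartName) return 'target needs exactly one of id or partName';
  const value = hasId ? t['id'] : t['partName'];
  if (typeof value !== 'string' || value.length === 0) {
    return 'target value must be a non-empty string';
  }
  return null;
}

// Returns an error message, or null when the patch input is valid.
function validatePatchInput(input: Partial<PatchInput>): string | null {
  const targetError = validateTarget(input.target);
  if (targetError !== null) return targetError;
  if (input.content === undefined && input.schemaRefs === undefined) {
    return 'patch requires at least one of content or schemaRefs';
  }
  // Assumed smell-test only: a real well-formedness check would parse the XML.
  if (input.content !== undefined && input.content.trim().charAt(0) !== '<') {
    return 'content does not look like XML';
  }
  if (input.schemaRefs !== undefined && !input.schemaRefs.every((ref) => typeof ref === 'string')) {
    return 'schemaRefs must be an array of strings';
  }
  return null;
}
```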

Read adapter:
- super-converter/custom-xml-parts.js: discovery, parsing, serialization
  helpers (listCustomXmlStoragePartNames, parsePropsPart, readCustomXmlPart)
- plan-engine/custom-xml-wrappers.ts: list/get adapter routing through
  buildDiscoveryItem/Result, filters by rootNamespace and schemaRef,
  partName-targeting fallback for foreign parts without Properties Parts
- assemble-adapters.ts: customXml plugged in alongside bookmarks
- 10 integration tests passing (list empty, list with filter, get by
  id, get by partName fallback, get unknown id returns null)

Write adapter:
- create/patch/remove return CAPABILITY_UNAVAILABLE pending Phase B
  (OOXML package file coordination)
…s (SD-3105)

create / patch / remove now implement the full OOXML package
coordination instead of returning CAPABILITY_UNAVAILABLE.

Write adapter (super-converter/custom-xml-parts.js):
- createCustomXmlPart: generates fresh GUID itemID, allocates next
  free index, writes Storage Part + Properties Part + item rels +
  document-level relationship (5-file coordination)
- patchCustomXmlPart: resolves by id or partName, replaces content
  and/or schemaRefs, preserves itemID. Creates a Properties Part
  on the fly when patching schemaRefs into a foreign part that
  doesn't have one yet.
- removeCustomXmlPart: deletes storage, props, item rels, and the
  document-level relationship pointing at the part.
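
As a rough illustration of the "allocates next free index" step, the sketch below derives the three customXml paths for a new part from the paths already in the package. The helper name and the lowest-free-index policy are assumptions; the real createCustomXmlPart in super-converter/custom-xml-parts.js may allocate differently, and it also writes the document-level relationship, which is omitted here.

```typescript
// Hypothetical sketch; not the real createCustomXmlPart.
interface CustomXmlPartPaths {
  partName: string; // Storage Part
  propsPartName: string; // Properties Part
  itemRelsPath: string; // item-level relationships file
}

function allocateCustomXmlPaths(existingPaths: string[]): CustomXmlPartPaths {
  const used = new Set<number>();
  for (const path of existingPaths) {
    // Only Word-style storage parts (customXml/itemN.xml) occupy an index.
    const match = /^customXml\/item(\d+)\.xml$/.exec(path);
    if (match) used.add(Number(match[1]));
  }
  // Assumed policy: take the lowest index that is not in use.
  let index = 1;
  while (used.has(index)) index += 1;
  return {
    partName: `customXml/item${index}.xml`,
    propsPartName: `customXml/itemProps${index}.xml`,
    itemRelsPath: `customXml/_rels/item${index}.xml.rels`,
  };
}
```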

Adapter wrappers (plan-engine/custom-xml-wrappers.ts):
- All three writers call rejectTrackedMode (matches the contract
  declaration of supportsTrackedMode: false).
- Errors map cleanly to INVALID_INPUT / TARGET_NOT_FOUND.
- supportsDryRun set to false for v1; dry-run support can come later.

Conformance:
- contract-conformance.test.ts: throw/apply vectors registered for
  all three customXml.parts mutating ops.
- contract.test.ts: customXml added to the validGroups list.

Coverage:
- 16 integration tests passing (read + write + round-trip).
- 1195/1195 conformance tests passing.
- 3392/3392 across the full document-api-adapters suite.
- 1428/1428 across the document-api package suite.

Round-trip test exports a created part to DOCX, reimports through
the canonical loader, and verifies the itemID GUID, rootNamespace,
schemaRefs, and content all survive. The 5-file package
coordination is empirically OOXML-faithful.
…tombstones (SD-3105)

Addresses all six findings from the SD-3105 review:

#1 (High) partName scoping
- resolveTargetPartName and readCustomXmlPart now require the path to
  match customXml/itemN.xml. Targets like word/document.xml are rejected
  cleanly instead of being let through.
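
A minimal sketch of the scoping check, assuming it is a plain pattern match on the package path. The helper name here is hypothetical and the real implementation may normalize paths before matching.

```typescript
// Hypothetical sketch of the partName guard described in finding #1.
const STORAGE_PART_PATTERN = /^customXml\/item\d+\.xml$/;

function looksLikeStoragePartName(partName: string): boolean {
  return STORAGE_PART_PATTERN.test(partName);
}
```

With this pattern, Properties Parts (customXml/itemProps1.xml) and rels files are rejected as targets along with unrelated files such as word/document.xml.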

#2 (High) bypassed mutation lifecycle
- create/patch/remove now route through executeOutOfBandMutation, the
  same shared primitive citation sources use. Each call gets:
    * expectedRevision check
    * dryRun preview path
    * dirty marking + GUID promotion
    * revision increment on actual change
- supportsDryRun: true on all three ops with real dry-run validation
  (well-formedness for create/patch, target resolution for patch/remove).
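
The lifecycle steps listed above can be pictured with a simplified stand-in. This is not the real executeOutOfBandMutation: the types, the function name, and the callback shape are assumptions for illustration, and GUID promotion is left out.

```typescript
// Hypothetical, simplified stand-in for the shared mutation primitive.
interface DocState {
  revision: number;
  dirty: boolean;
}

interface MutationOptions {
  expectedRevision?: number;
  dryRun?: boolean;
}

interface MutationResult {
  changed: boolean;
  dryRun: boolean;
  revision: number;
}

class RevisionMismatchError extends Error {
  readonly code = 'REVISION_MISMATCH';
}

function runOutOfBandMutation(
  state: DocState,
  options: MutationOptions,
  mutate: (dryRun: boolean) => boolean,
): MutationResult {
  // 1. expectedRevision check runs before any work is done.
  if (options.expectedRevision !== undefined && options.expectedRevision !== state.revision) {
    throw new RevisionMismatchError(
      `expected revision ${options.expectedRevision}, found ${state.revision}`,
    );
  }
  // 2. The callback validates (dry run) or applies the change and reports
  //    whether the document would change or did change.
  const dryRun = options.dryRun === true;
  const changed = mutate(dryRun);
  // 3. Dirty marking and revision increment happen only on a real change.
  if (changed && !dryRun) {
    state.dirty = true;
    state.revision += 1;
  }
  return { changed, dryRun, revision: state.revision };
}
```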

#3 (High) deletion didn't persist for imported DOCX parts
- removeCustomXmlPart now stamps the removed paths into a
  converter.removedCustomXmlPaths set. Editor.ts export loop emits
  updatedDocs[path] = null for each entry, so original-zip entries are
  tombstoned instead of being copied through.

#4 (Medium) props parts paired by filename
- findPropsPartFor now reads customXml/_rels/itemN.xml.rels and follows
  the Type=customXmlProps relationship. Falls back to the index-name
  heuristic only when no rels file exists. Foreign docs with non-
  matching names now resolve correctly.

#5 (Medium) contract metadata lied about failures
- possibleFailureCodes updated to actual codes: ['INVALID_INPUT'],
  ['TARGET_NOT_FOUND', 'INVALID_INPUT'], ['TARGET_NOT_FOUND'].

#6 (Medium) JSON schemas didn't match runtime
- get output now { oneOf: [{ type: 'object' }, { type: 'null' }] }.
- patch input encodes 'at least one of content or schemaRefs' via
  anyOf, with additionalProperties: false.
- content fields gain minLength: 1.

Coverage update:
- Two new integration tests assert #1 (partName rejection on
  word/document.xml etc) and #4 (foreign-name props resolved via rels).
- failureCase and dryRun vectors added for all three customXml.parts
  mutating ops in the conformance suite.

Verified:
- @superdoc/super-editor: 3398/3398 across 123 files
- @superdoc/document-api: 1428/1428 across 51 files
… (SD-3105)

Two correctness fixes from the second review pass:

#1 Broad catch was swallowing REVISION_MISMATCH
  customXml.parts.create and patch wrapped the entire
  executeOutOfBandMutation call in a try/catch that converted
  everything to INVALID_INPUT. Lifecycle errors from
  checkRevision (REVISION_MISMATCH) and any future PlanError
  propagation were being eaten.

  Replaced the outer try/catch with a scoped safeValidate helper
  that only catches content-parsing errors from createCustomXmlPart
  / patchCustomXmlPart, returning them as structured INVALID_INPUT
  outcomes. The executeOutOfBandMutation call itself now lets
  revision and other lifecycle errors bubble.

  Also reordered patch: target resolution runs FIRST, so a missing
  target reports TARGET_NOT_FOUND instead of (potentially)
  INVALID_INPUT if patch happened to throw.
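
  A scoped helper of this kind might look like the sketch below. The
  ContentParseError marker class and the Outcome type are assumptions made
  for illustration; the real safeValidate may identify parsing errors
  differently.

```typescript
// Hypothetical sketch of a scoped catch: only content-parsing failures are
// converted to INVALID_INPUT; everything else is rethrown.
type Outcome<T> =
  | { success: true; value: T }
  | { success: false; code: 'INVALID_INPUT'; message: string };

class ContentParseError extends Error {
  readonly kind = 'content-parse';
}

function isContentParseError(error: unknown): error is ContentParseError {
  return (
    typeof error === 'object' &&
    error !== null &&
    (error as { kind?: unknown }).kind === 'content-parse'
  );
}

function safeValidateSketch<T>(run: () => T): Outcome<T> {
  try {
    return { success: true, value: run() };
  } catch (error) {
    if (isContentParseError(error)) {
      return { success: false, code: 'INVALID_INPUT', message: error.message };
    }
    // Lifecycle errors such as REVISION_MISMATCH keep propagating.
    throw error;
  }
}
```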

#2 Tombstone could null a newly-created part on the same index
  remove → create on a recycled index (customXml/item1.xml) had
  this sequence: remove writes 'customXml/item1.xml' to the
  tombstone set; create reuses index 1 because convertedXml has
  no item1.xml; export serializes the new part, then the tombstone
  loop runs and overwrites updatedDocs['customXml/item1.xml'] with
  null, deleting the freshly-created part from the exported zip.

  Fix: createCustomXmlPart now removes its written paths
  (partName, propsPartName, itemRelsPath) from
  converter.removedCustomXmlPaths whenever a converter is passed.
  The new integration test exercises the exact remove → create →
  export → reimport sequence and asserts the new part survives
  with its fresh id and content.
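
  The remove → create → export interaction can be reproduced with a small
  in-memory model. The helper names and the ConverterState shape below are
  hypothetical, and only the Storage Part path is tracked (the real code
  also tracks the props and item-rels paths).

```typescript
// Hypothetical in-memory model of the tombstone set and the export loop.
type UpdatedDocs = Record<string, string | null>;

interface ConverterState {
  convertedXml: Record<string, string>;
  removedCustomXmlPaths: Set<string>;
}

function removePartSketch(state: ConverterState, partName: string): void {
  delete state.convertedXml[partName];
  state.removedCustomXmlPaths.add(partName); // tombstone for the exporter
}

function createPartSketch(state: ConverterState, partName: string, content: string): void {
  state.convertedXml[partName] = content;
  // The fix: a recycled path must not stay tombstoned.
  state.removedCustomXmlPaths.delete(partName);
}

function buildUpdatedDocs(state: ConverterState): UpdatedDocs {
  const updatedDocs: UpdatedDocs = {};
  for (const path of Object.keys(state.convertedXml)) {
    const xml = state.convertedXml[path];
    if (xml !== undefined) updatedDocs[path] = xml;
  }
  // The tombstone loop runs last, so a stale entry would overwrite a live part.
  state.removedCustomXmlPaths.forEach((path) => {
    updatedDocs[path] = null;
  });
  return updatedDocs;
}
```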

Coverage:
- 19/19 integration tests passing (incl. the new tombstone test).
- 3401/3401 super-editor document-api-adapters tests.
- 1428/1428 document-api package tests.
@caio-pizzol caio-pizzol requested a review from a team as a code owner May 12, 2026 12:43
@linear

linear Bot commented May 12, 2026

SD-3105

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fb03693e92


…ntent-types pruning (SD-3105)

Three review findings — each verified by a failing test first, then fixed:

#1 findPropsPartFor used an ad-hoc regex that only handled bare names
   and `/` prefixes. Valid OPC Target forms like `./itemPropsN.xml`
   and `../customXml/itemPropsN.xml` (per RFC 3986 §5.2.4 and ECMA-376
   §9.1.4) silently fell through to the index-name fallback or missed
   the props entirely. Route through resolveOpcTargetPath with
   baseDir='customXml' (the source part's directory). Two new tests
   assert resolution for `./` and `../customXml/` Target forms.
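
   To show why the four Target forms should agree, here is a minimal path
   resolver. It is an illustrative re-implementation under assumed
   semantics (dot-segment removal against the source part's directory),
   not the real resolveOpcTargetPath.

```typescript
// Hypothetical sketch: resolve a relationship Target against the directory
// of the source part, returning a package path without a leading slash.
function resolveTargetSketch(baseDir: string, target: string): string {
  // A leading "/" means the Target is relative to the package root.
  const segments =
    target.charAt(0) === '/' ? [] : baseDir.split('/').filter((s) => s.length > 0);
  for (const segment of target.split('/')) {
    if (segment === '' || segment === '.') continue; // skip empty and "." segments
    if (segment === '..') {
      segments.pop(); // step up one directory
      continue;
    }
    segments.push(segment);
  }
  return segments.join('/');
}
```

   Under these assumptions, `itemProps1.xml`, `./itemProps1.xml`,
   `../customXml/itemProps1.xml`, and `/customXml/itemProps1.xml` all
   resolve to `customXml/itemProps1.xml` when baseDir is `customXml`.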

#2 removeCustomXmlPart on the bibliography part left
   converter.bibliographyPart populated. On the next export,
   syncBibliographyPartToPackage(convertedXml, bibliographyPart) wrote
   the cached sources back into convertedXml — resurrecting the
   supposedly-removed part in the in-memory state (the tombstone still
   nulled the exported zip entry, but the editor's own view of the
   document silently re-grew the part). patchCustomXmlPart on the
   bibliography part had the worse variant: cached sources overwrote
   the customer's new content. Both now call
   invalidateConverterCachesForPath, which clears
   converter.bibliographyPart when its partPath matches the part we
   touched. New test exercises remove + exportDocx and asserts the
   convertedXml entry stays gone.

#3 DocxZipper.updateContentTypes only pruned stale Override entries
   for comment parts. After customXml.parts.remove, the original DOCX's
   `<Override PartName="/customXml/itemPropsN.xml" .../>` survived
   in the exported [Content_Types].xml, pointing at a non-existent
   part. The operation's cleanup contract claimed otherwise. Extended
   the stale-override pruning to also cover customXml/itemPropsN.xml
   paths absent from the final zip (i.e. tombstoned via
   updatedDocs[path] = null).

Also clears the now-stale top-of-file docblock on the integration test
that claimed write-side was `CAPABILITY_UNAVAILABLE`-stubbed; the
file actually contains a full write-side suite including round-trip
and bibliography-cache tests.

Verified:
- 22/22 customXml integration tests
- 3404/3404 super-editor document-api-adapters
- 1428/1428 document-api package
@caio-pizzol caio-pizzol self-assigned this May 13, 2026
…ength (SD-3105)

Three findings from another review round; verified two with failing
tests, then fixed:

#1 (real bug, verified) — Content_Types Override pruning regex was too
   tight: `/^\/customXml\/itemProps\d+\.xml$/i` only matched
   numeric-named props parts. But `findPropsPartFor` correctly resolves
   foreign-named props parts (e.g. `customXml/itemPropsFOREIGN.xml`)
   via the OPC rels file, so removing one would tombstone the file but
   leave a stale Override pointing at it. Fix: identify props Overrides
   by ContentType (`application/vnd.openxmlformats-officedocument.customXmlProperties+xml`)
   instead of filename. New DocxZipper test confirms the foreign-named
   Override is pruned on tombstone.
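
   Keying the pruning on ContentType rather than filename could look like
   the sketch below. The OverrideEntry shape and the function name are
   assumptions; the real DocxZipper.updateContentTypes works on the
   [Content_Types].xml document itself and also handles comment parts.

```typescript
// Hypothetical sketch: drop customXml props Overrides whose part is absent
// from the final zip, regardless of how the props file is named.
interface OverrideEntry {
  partName: string; // e.g. "/customXml/itemProps1.xml"
  contentType: string;
}

const CUSTOM_XML_PROPS_CONTENT_TYPE =
  'application/vnd.openxmlformats-officedocument.customXmlProperties+xml';

function pruneStalePropsOverrides(
  overrides: OverrideEntry[],
  zipPaths: Set<string>,
): OverrideEntry[] {
  return overrides.filter((override) => {
    // Overrides of any other content type are left untouched by this sketch.
    if (override.contentType !== CUSTOM_XML_PROPS_CONTENT_TYPE) return true;
    // PartName carries a leading slash; zip entry names do not.
    const zipPath = override.partName.replace(/^\//, '');
    return zipPaths.has(zipPath);
  });
}
```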

#2 (contract gap, verified) — `customXmlPartTargetSchema` allowed
   empty `id` and `partName` strings even though the runtime
   validator requires non-empty. Added `minLength: 1` to both
   target-shape branches. Also added `minLength: 1` to `content`
   and `schemaRefs.items` on create/patch.

   Pulling minLength into the contract surfaced a secondary issue: the
   conformance test's custom JSON Schema validator didn't support
   `minLength` / `maxLength`. Added support (lines 84-91 had been
   rejecting unsupported keywords entirely, which made my oneOf
   branches both fail). The validator now matches the keywords its
   schemas actually use.

#3 (scope question, not changed) — Discovery of foreign-named Storage
   Parts (filenames other than `customXml/itemN.xml`). Considered:
   walking word/_rels/document.xml.rels for `customXml`-type rels
   would cover this. But `isCustomXmlStoragePartName` and
   `findPropsPartFor` both key off the numeric-index convention, so
   broadening discovery without also broadening those would leave list
   and get/patch/remove disagreeing about what's a valid part.
   Documented as an explicit v1 scope limitation via AIDEV-NOTE on
   `listCustomXmlStoragePartNames`. No real-world producer (Word,
   Google Docs, LibreOffice, pandoc) deviates from the convention, so
   v1 ships Word-style only.

Verified:
- 3429/3429 super-editor document-api-adapters + DocxZipper
- 1428/1428 document-api package
- New DocxZipper test exercises the foreign-named props Override
  pruning end-to-end
…SD-3105)

ECMA-376 §22.5.2.3 distinguishes three schemaRefs states:
  - <schemaRefs> omitted         → app may infer schemas
  - <schemaRefs/> present empty  → explicit "no schemas should be used"
  - <schemaRefs> with children   → these schemas validate the part

The previous write side conflated the first two: it always emitted
<schemaRefs/> regardless of whether the caller passed undefined or [].
This made downstream consumers see "no schemas" when the customer
intent was actually "app picks".

Fix:
- buildItemPropsRoot now omits the <schemaRefs> element when schemaRefs
  is undefined, and emits it (empty or with children) when an array is
  passed explicitly.
- createCustomXmlPart no longer coerces undefined to [] before calling
  buildItemPropsRoot.
- patchCustomXmlPart already passed through verbatim — no change needed.

Read side keeps returning schemaRefs: [] for both the omitted and the
present-empty cases. The distinction is lost in the public summary
type (schemaRefs: string[]), and recovering it would require a type
contract change — deferred for v1.

Two new integration tests assert:
  1. create({ content }) without schemaRefs produces a Properties Part
     with NO <ds:schemaRefs> element.
  2. create({ content, schemaRefs: [] }) produces a Properties Part
     WITH an empty <ds:schemaRefs/> element.
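
The three states can be illustrated with a string-based builder. This is only a sketch: the real buildItemPropsRoot may produce a different representation (for example a parsed XML object rather than a string), and the function name and escaping helper here are assumptions.

```typescript
// Hypothetical string-based sketch of the omitted / empty / populated states.
const DS_NAMESPACE = 'http://schemas.openxmlformats.org/officeDocument/2006/customXml';

function escapeXmlAttr(value: string): string {
  return value.replace(/&/g, '&amp;').replace(/"/g, '&quot;').replace(/</g, '&lt;');
}

function buildItemPropsXmlSketch(itemId: string, schemaRefs?: string[]): string {
  let body = '';
  if (schemaRefs !== undefined) {
    // An explicit array always emits the element, even when it is empty.
    body =
      schemaRefs.length === 0
        ? '<ds:schemaRefs/>'
        : '<ds:schemaRefs>' +
          schemaRefs.map((uri) => `<ds:schemaRef ds:uri="${escapeXmlAttr(uri)}"/>`).join('') +
          '</ds:schemaRefs>';
  }
  const open = `<ds:datastoreItem ds:itemID="${escapeXmlAttr(itemId)}" xmlns:ds="${DS_NAMESPACE}"`;
  // undefined schemaRefs: no <ds:schemaRefs> element at all.
  return body === '' ? `${open}/>` : `${open}>${body}</ds:datastoreItem>`;
}
```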

Verified:
- 3431/3431 super-editor document-api-adapters
- 1428/1428 document-api package
Two reviewer follow-ups on commit b47d3c9:

1. AIDEV note framing softened. "Every real producer" was a stronger
   claim than I could verify. Reframed to "the Word-compatible
   producers we target use this convention" — accurate and doesn't
   pretend to have audited every OOXML producer.

2. v1 scope now surfaced in the public type contract, not just internal
   notes. Generated docs and consumer JSDoc tooltips now show the
   Word-style filename constraint:
     - CustomXmlPartTarget.partName: scope note added
     - CustomXmlPartSummary.partName: scope note added
   Foreign-named Properties Parts still work (paired via rels); only
   Storage Part filenames are constrained.

Not in this commit:
- The reviewer flagged the conformance schema validator's early-return
  after oneOf/anyOf — keywords like required and additionalProperties
  sitting at the same level as anyOf are not checked. Confirmed via
  source inspection. Affects the patch input schema specifically. Not
  a production API bug (runtime validators in customXml.ts cover these
  constraints); just a slight weakening of conformance signal. Worth a
  separate ticket on the test harness, not this PR.
…w schemaRefs semantics (SD-3105)

The JSDoc on CustomXmlPartsCreateInput.schemaRefs and
CustomXmlPartsPatchInput.schemaRefs was stale after commit ced0fe3
swapped the writer to preserve the ECMA-376 §22.5.2.3 omitted-vs-empty
distinction.

Old text claimed 'when omitted or empty, [an] empty <ds:schemaRefs/>
[is] still emitted'. That was true before ced0fe3; now omitted
produces no element. Updated both JSDoc blocks to explain the three
distinct spec states (omitted, empty, populated) so generated docs and
IDE tooltips match runtime behavior.
…s (SD-3105)

operation-definitions.ts descriptions feed reference docs (Mintlify),
LLM tool catalog descriptions, and CLI help text. The public type
JSDoc on CustomXmlPartTarget already states the v1 partName scope,
but consumers reading generated docs or tool descriptions wouldn't
see it. Added the constraint to the three operation descriptions that
accept a partName target: get, patch, remove. Not on list (returns
whatever's discovered) or create (always emits Word-style filenames).
…Paths, test name fix (SD-3105)

Three findings from another review round:

#1 patch on a foreign Storage Part minted a fresh itemID but didn't
   surface it
   Scenario: customer targets by partName because the imported part
   had no Properties Part; patches schemaRefs; patchCustomXmlPart
   creates a new Properties Part with a fresh GUID; wrapper returned
   { success, target: input.target } leaving the caller unable to
   address the part by id without re-listing.
   Fix:
   - patchCustomXmlPart now returns { partName, id? } where id is the
     resolved (existing or freshly minted) itemID.
   - CustomXmlPartsMutationSuccess gains an optional id?: CustomXmlPartId
     field with JSDoc explaining the patch-foreign-part case.
   - Wrapper passes id through to the success result.
   - Schema customXmlPartMutation gains an optional id field
     (minLength: 1).
   - New integration test: import a Storage Part with no props,
     patch schemaRefs, assert the result includes a new GUID and
     get({ id }) finds the part.

#2 removedCustomXmlPaths accessed via as unknown as casts
   Two readers (Editor.ts, custom-xml-wrappers.ts) and one writer
   (custom-xml-parts.js) all coupled via casts. A rename would break
   tombstone emission silently.
   Fix: added removedCustomXmlPaths?: Set<string> to
   SuperConverter.d.ts with JSDoc. Dropped the cast in Editor.ts. The
   local type alias in custom-xml-wrappers.ts is still convenient as
   structural typing (it duck-types the converter without importing
   the full class), so leaving it.

#3 Test name at customXml.test.ts:206 said 'rejects' but body asserted
   .not.toThrow(). Renamed to 'accepts patch with empty schemaRefs
   alongside valid content'.

Out-of-scope items the reviewer flagged but already on branch:
- DocxZipper Content_Types Override pruning for tombstoned customXml
  props (fixed in b47d3c9)
- Schema minLength on target id/partName (fixed in b47d3c9)
- ./-prefix in item-rels resolver (fixed in 7cb928e via
  resolveOpcTargetPath)

Word-fixture observation re: ds:schemaRefs auto-fill from root
namespace is real but our v1 stance is deliberate (omit/[]/populated
distinct per ECMA-376 §22.5.2.3, see ced0fe3).

Verified:
- 3432/3432 super-editor document-api-adapters + DocxZipper
- 1428/1428 document-api package
… (SD-3105)

Reviewer caught a regression: adding removedCustomXmlPaths?: Set<string>
as an explicit field on SuperConverter.d.ts in ee06aa0 triggered
TypeScript weak-type errors at three call sites that pass
SuperConverter into local structural types not including the new field:

  - Editor.ts:1103 → ConverterWithDocumentSettings
  - HeaderFooterSessionManager.ts:703, 2322 → ConverterLike
  - PresentationEditor.ts:6039 → ConverterWithDocumentSettings

Verified by running types:check with and without the d.ts change —
errors only appear with the typed field present, because the
[key: string]: any index signature alone is enough to satisfy weak
types, but an explicit named field forces TypeScript to require at
least one property overlap with the target shape.

Per reviewer's smaller-fix suggestion: revert the d.ts change, restore
the cast in Editor.ts with an AIDEV-NOTE pointing at this regression
so future maintainers don't try the same simplification.

Properly cleaning up the converter declaration is a separate piece of
work (would need to enumerate the actual fields ConverterWithDocumentSettings /
ConverterLike consume from real producers). Not in scope here.

Verified:
- 3432/3432 super-editor document-api-adapters + DocxZipper
- SuperConverter weak-type errors no longer in types:check output
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.

@caio-pizzol caio-pizzol merged commit 87368de into main May 14, 2026
72 checks passed
@caio-pizzol caio-pizzol deleted the caio-pizzol/SD-3105-custom-xml-parts-api branch May 14, 2026 17:27
@superdoc-bot

superdoc-bot Bot commented May 14, 2026

🎉 This PR is included in @superdoc-dev/mcp v0.3.0-next.97

The release is available on GitHub release

@superdoc-bot

superdoc-bot Bot commented May 14, 2026

🎉 This PR is included in vscode-ext v2.3.0-next.143

@superdoc-bot

superdoc-bot Bot commented May 14, 2026

🎉 This PR is included in @superdoc-dev/react v1.2.0-next.141

The release is available on GitHub release

@superdoc-bot

superdoc-bot Bot commented May 14, 2026

🎉 This PR is included in superdoc-cli v0.8.0-next.112

The release is available on GitHub release

@superdoc-bot

superdoc-bot Bot commented May 14, 2026

🎉 This PR is included in superdoc-sdk v1.8.0-next.95

@superdoc-bot

superdoc-bot Bot commented May 14, 2026

🎉 This PR is included in superdoc v1.30.0-next.92

The release is available on GitHub release
