feat: expose PDF page indirect object number via EPDF_GetPageObjNum #577

Open
mpogodin-readdle wants to merge 2 commits into embedpdf:main from mpogodin-readdle:feat/page-object-number
Conversation

@mpogodin-readdle

Summary

Every PDF page is stored in the file as an indirect object (its /Type /Page dictionary), and that object has a number, which PDFium returns from GetObjNum(). The object number stays the same when pages are reordered, inserted, or deleted, unlike the page index, which shifts whenever pages move. This PR exposes that number through the @embedpdf stack so consumers can use it as a stable page reference.

Use case: features like annotations and AI citations that must keep pointing at the same page after pages are reordered.
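
To make the use case concrete, here is a minimal sketch of an annotation keyed by object number rather than page index. The `PageRef` and `Annotation` shapes and the `currentIndexOf` helper are hypothetical names chosen for illustration. Only the `objectNumber` field comes from this PR, and the object numbers 12 and 27 are made-up values.

```typescript
// Hypothetical shapes for illustration; only `objectNumber` is part of this PR.
interface PageRef {
  index: number;
  objectNumber?: number;
}

interface Annotation {
  pageObjectNumber: number; // stable key, stored instead of a page index
  text: string;
}

// Resolve an annotation to the page's current index.
// Returns undefined if no page carries that object number.
function currentIndexOf(pages: PageRef[], annotation: Annotation): number | undefined {
  return pages.find((p) => p.objectNumber === annotation.pageObjectNumber)?.index;
}

// Page list before reordering (object numbers are made-up values).
const before: PageRef[] = [
  { index: 0, objectNumber: 12 },
  { index: 1, objectNumber: 27 },
];

// The same pages after the second page is moved to the front.
const after: PageRef[] = [
  { index: 0, objectNumber: 27 },
  { index: 1, objectNumber: 12 },
];

const note: Annotation = { pageObjectNumber: 27, text: "cited here" };
```

With these values, `currentIndexOf(before, note)` resolves to index 1 and `currentIndexOf(after, note)` resolves to index 0, so the annotation follows the page rather than the position.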


Changes

@embedpdf/pdfium

  • Added EPDF_GetPageObjNum(document, page_index) → int C++ wrapper in build/code/cpp/main.cpp
    • Uses CPDFDocumentFromFPDFDocument() → GetPageDictionary() → GetObjNum() from internal PDFium APIs
    • Returns -1 if the document or page dictionary is invalid
  • Added EPDF_GetPageObjNum prototype to build/code/cpp/ext_api.h (picked up by the AST export generator)
  • Added -I.../pdfium-src to compile.esm.sh and compile.sh to allow internal PDFium header access from main.cpp
  • Registered EPDF_GetPageObjNum binding in src/vendor/functions.ts, pdfium.js, and pdfium.cjs

@embedpdf/models

  • Added objectNumber?: number to PdfPageObject with JSDoc explaining stability semantics

@embedpdf/engines

  • In openDocumentBuffer, calls EPDF_GetPageObjNum after reading the page rotation and spreads objectNumber onto each PdfPageObject
  • The call is runtime-guarded with typeof epdfGetPageObjNum === 'function', so existing builds without the new WASM export keep working (objectNumber is simply undefined)
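
The guard-and-spread pattern described above can be sketched roughly as follows. This is not the code from the PR: `WasmModuleLike`, `PageLike`, and `buildPage` are hypothetical stand-ins, and the real page object has more fields than shown here.

```typescript
// Hypothetical stand-in for the WASM module; the export may be missing on older builds.
interface WasmModuleLike {
  EPDF_GetPageObjNum?: (docPtr: number, pageIndex: number) => number;
}

// Simplified stand-in for PdfPageObject.
interface PageLike {
  index: number;
  rotation: number;
  objectNumber?: number;
}

function buildPage(
  wasm: WasmModuleLike,
  docPtr: number,
  index: number,
  rotation: number,
): PageLike {
  const epdfGetPageObjNum = wasm.EPDF_GetPageObjNum;

  // Only call the export when the loaded binary actually provides it.
  const objectNumber =
    typeof epdfGetPageObjNum === 'function' ? epdfGetPageObjNum(docPtr, index) : undefined;

  return {
    index,
    rotation,
    // Spread the field only when a value was read, so older builds leave it absent.
    ...(objectNumber !== undefined ? { objectNumber } : {}),
  };
}
```

Spreading conditionally, rather than always assigning `objectNumber: undefined`, keeps the key off the object entirely on older builds, which matches the "field absent" wording in the compatibility notes.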

WASM rebuild required

⚠️ I don't have access to a Docker/Emscripten environment, so I was not able to provide a rebuilt pdfium.wasm binary as part of this PR.

All the C++ source (main.cpp, ext_api.h) and build script changes (compile.esm.sh, compile.sh) are in place. Please run make build as part of merging to compile EPDF_GetPageObjNum into the binary and regenerate the vendor files. The JS/TS bindings are pre-wired and will activate automatically once the WASM is rebuilt.


Backward Compatibility

No breaking changes. The objectNumber field is optional on PdfPageObject. Older WASM builds silently skip the call and leave the field absent. Consumers must check for undefined before using it.
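
On the consumer side, the undefined check might look like the sketch below. `PageLike` and `pageKey` are hypothetical names. Falling back to the page index, and treating non-positive values as missing, are assumptions made for this example rather than behavior defined by the PR. The second assumption covers the -1 the wrapper returns for an invalid page, and relies on object number 0 being reserved by the PDF format.

```typescript
// Simplified stand-in for PdfPageObject.
interface PageLike {
  index: number;
  objectNumber?: number;
}

// Build a lookup key for a page. The prefix records which identifier was used,
// so index-based keys are never confused with object-number keys.
function pageKey(page: PageLike): string {
  const n = page.objectNumber;
  // Assumption: undefined (older WASM build) and non-positive values
  // (for example the wrapper's -1) both mean "no usable object number".
  if (n !== undefined && n > 0) {
    return `obj:${n}`;
  }
  return `idx:${page.index}`;
}
```

Keys with the `idx:` prefix are only valid until the next page move, so a consumer would likely want to refresh them once a rebuilt binary is available.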


Testing

  • @embedpdf/models and @embedpdf/pdfium build cleanly
  • TypeScript type-check on @embedpdf/engines produces no new errors
  • Runtime guard verified to match the existing pattern used by EPDF_GetPageRotationByIndex

@vercel

vercel bot commented Apr 3, 2026

@mpogodin-readdle is attempting to deploy a commit to the OpenBook Team on Vercel.

A member of the Team first needs to authorize it.
