perf: Avoid multi-second getDestinations stalls for PDFs with many named destinations#21213
perf: Avoid multi-second getDestinations stalls for PDFs with many named destinations#21213saripovdenis wants to merge 1 commit intomozilla:masterfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #21213 +/- ##
=======================================
Coverage 55.95% 55.96%
=======================================
Files 218 218
Lines 59039 59033 -6
=======================================
+ Hits 33034 33035 +1
+ Misses 26005 25998 -7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@calixteman thank you for review! pushed a changed |
|
/botio browsertest |
From: Bot.io (Linux m4)ReceivedCommand cmd_browsertest from @calixteman received. Current queue size: 0 Live output at: http://54.241.84.105:8877/5d377e52d089d5e/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_browsertest from @calixteman received. Current queue size: 0 Live output at: http://54.193.163.58:8877/09cc5ae668b195a/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/5d377e52d089d5e/output.txt Total script time: 18.14 mins
Image differences available at: http://54.241.84.105:8877/5d377e52d089d5e/reftest-analyzer.html#web=eq.log |
From: Bot.io (Windows)FailedFull output at http://54.193.163.58:8877/09cc5ae668b195a/output.txt Total script time: 24.40 mins
Image differences available at: http://54.193.163.58:8877/09cc5ae668b195a/reftest-analyzer.html#web=eq.log |
|
The commits must be squashed, note https://github.com/mozilla/pdf.js/wiki/Squashing-Commits, and please write a proper commit message (since a single line is never really sufficient). |
Using Array.prototype.shift() to drain the traversal queue makes each visited node move the remaining queued entries. For large name/number trees this can make getAll() spend quadratic time in queue management. Iterate over the queue with for...of instead. Children pushed while iterating are still visited, and the queue no longer needs repeated front removals.
511a900 to
473f9b4
Compare
@Snuffleupagus Thank you! Squashed & added a proper commit message anything else I can improve in this PR? |
Replace the
queue.shift()traversal inNameOrNumberTree.getAll()with an index-based queue loop. This keeps traversal order and duplicate-reference handling, but avoids copying the remaining queue for every node.This matters for callers that materialize full name/number trees:
getDestinations(), embedded-file name trees, JavaScript name trees, XFA image name trees, and tagged-PDF parent/id trees.End-user impact:
For small PDFs, no visible change. With 100 named destinations the run stays about 0.29 ms; with 1,000 destinations the result is noise-level because parsing/worker overhead dominates. With large destination trees, the UI/API avoids a quadratic traversal pause. In the 50,000-destination case,
getDestinations()went from 5570.5 ms to 715.5 ms, saving about 4.9 s.Benchmark:
I generated synthetic one-page PDFs with
/Names << /Dests ... >>trees where the root has one leaf per named destination, then measuredpdfDocument.getDestinations()in Node using thegeneric-legacybuild.Isolated
NameTree.getAll()micro-benchmark on the same flat tree shape: