Add compression-oriented function reordering pass#8696

Open
brendandahl wants to merge 1 commit into
WebAssembly:mainfrom
brendandahl:reorder

Conversation

@brendandahl
Collaborator

Implement the --reorder-functions-by-similarity optimization pass in wasm-opt.

Gzip and Brotli rely on finding repeated byte patterns inside a sliding window (e.g., 32 KB for Gzip). If structurally similar functions are placed far apart in the Wasm binary, the compressor cannot detect matches across them. While the existing --reorder-functions pass sorts functions strictly by call frequency to shrink LEB128 indexes, it scatters mutually compressible functions and ultimately increases gzipped delivery sizes.

This new pass traverses defined function bodies in post-order and extracts a similarity sorting key based on signature type IDs, local variable types, and structural opcode sequences. Sorting defined functions lexicographically by this key physically groups structurally similar functions together in the output binary, so their matching byte sequences land close enough for the compressor to find.
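
To make the ordering step concrete, here is a minimal standalone sketch of sorting by such a key. It is not the code in this PR and does not use Binaryen types: `ToyFunction`, `similarityKey`, and `orderBySimilarity` are hypothetical names, the numeric signature ID is assumed, and the length prefix on the locals is an extra choice made only for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a defined function; not a Binaryen type.
struct ToyFunction {
  std::string name;
  uint32_t signatureId;            // assumed numeric ID of the signature type
  std::vector<uint8_t> localTypes; // one byte per local variable type
  std::vector<uint8_t> opcodes;    // opcodes recorded in post-order
};

// Flatten signature, locals, and opcodes into one comparable sequence.
std::vector<uint32_t> similarityKey(const ToyFunction& f) {
  std::vector<uint32_t> key;
  key.push_back(f.signatureId);
  // Length prefix so locals and opcodes do not blend (assumption of this sketch).
  key.push_back(static_cast<uint32_t>(f.localTypes.size()));
  key.insert(key.end(), f.localTypes.begin(), f.localTypes.end());
  key.insert(key.end(), f.opcodes.begin(), f.opcodes.end());
  return key;
}

// Return function names ordered lexicographically by similarity key.
// Ties fall back to the original index, which keeps the result deterministic.
std::vector<std::string> orderBySimilarity(const std::vector<ToyFunction>& funcs) {
  std::vector<std::pair<std::vector<uint32_t>, std::size_t>> keyed;
  keyed.reserve(funcs.size());
  for (std::size_t i = 0; i < funcs.size(); ++i) {
    keyed.emplace_back(similarityKey(funcs[i]), i);
  }
  std::sort(keyed.begin(), keyed.end());

  std::vector<std::string> names;
  names.reserve(funcs.size());
  for (const auto& entry : keyed) {
    names.push_back(funcs[entry.second].name);
  }
  return names;
}
```

With three toy functions where the first and third share a signature, locals, and all but the last opcode, the two similar functions end up adjacent and the unrelated one moves to the end.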

Empirical benchmarks on real-world Flutter and Poppler Wasm examples
show savings of up to 2.13% and 0.98%, respectively, in compressed
delivery size compared to the baseline (no reordering).
@brendandahl brendandahl requested a review from a team as a code owner May 13, 2026 00:26
@brendandahl brendandahl requested review from tlively and removed request for a team May 13, 2026 00:26
@brendandahl
Collaborator Author

Below is a comparison of the uncompressed and gzip-compressed binary sizes for both configurations. There are still some tweaks I think we can make. I've been able to get 2% on some files, but it wasn't doing as well on others (still need to figure out why).

| Benchmark File | Uncompressed Baseline (bytes) | Uncompressed Similarity (bytes) | Uncompressed Change | Gzip Baseline (bytes) | Gzip Similarity (bytes) | Gzip Change (Savings) |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| dart-flute-complex.opt.wasm | 1,081,549 | 1,083,288 | +0.16% | 392,180 | 386,221 | -1.52% |
| dart-flute-complex.unopt.wasm | 1,284,344 | 1,286,148 | +0.14% | 458,367 | 452,629 | -1.25% |
| dart-pop.unopt.wasm | 398,114 | 398,114 | 0.00% | 148,474 | 146,737 | -1.17% |
| dart-pop.opt.wasm | 350,546 | 350,546 | 0.00% | 133,329 | 131,929 | -1.05% |
| v8_poppler.wasm | 2,067,741 | 2,076,431 | +0.42% | 987,474 | 982,825 | -0.47% |
| v8_sqlite.c.wasm | 931,440 | 936,924 | +0.59% | 378,918 | 375,992 | -0.77% |
| v8_box2d.wasm | 86,598 | 86,598 | 0.00% | 39,983 | 39,978 | -0.01% |

Member

@tlively tlively left a comment


Mostly comments on algorithmic improvements. Let me know if you'd rather land as-is to get the measured benefit without investing more time in algorithmic improvements and I can review with that in mind.

Comment on lines +48 to +49
// Capture important immediate type/operator information
// TODO: There's probably more data that would be useful to capture.

You could probably extract and reuse the HashStringifyWalker from Outlining.cpp. It turns expression trees into strings by shallowly hashing each expression, including all of its immediates. You would just want it to use a normal PostWalker (but probably modified to also call addUniqueSymbol at control flow boundaries, e.g. end and else) instead of the custom StringifyWalker it currently uses. Nothing a little extra templating can't solve!
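
As a rough illustration of that idea, the sketch below stringifies a toy expression tree in post-order, hashing each node shallowly (opcode plus immediate, never its children) and emitting a fresh unique symbol after each control-flow node. This is not Binaryen's `HashStringifyWalker` or `PostWalker`: `ToyExpr`, `shallowHash`, and `Stringifier` are hypothetical names, and reserving the top bit for unique symbols is an assumption made for this example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical toy expression node; not Binaryen's Expression.
struct ToyExpr {
  uint32_t opcode;
  uint64_t immediate;  // e.g. a constant value or a local index
  bool isControlFlow;  // true for block/if/loop-like nodes
  std::vector<ToyExpr> children;
};

// Hash only the node itself (opcode and immediate), never its children.
// The top bit is kept clear so hashed symbols cannot equal unique symbols.
uint32_t shallowHash(const ToyExpr& e) {
  uint64_t h = static_cast<uint64_t>(e.opcode) * 1000003ULL + e.immediate;
  return static_cast<uint32_t>(h & 0x7fffffffULL);
}

struct Stringifier {
  std::vector<uint32_t> out;
  uint32_t nextUnique = 0;

  // Emit a symbol that matches nothing else, so that matches cannot
  // span a control-flow boundary.
  void addUniqueSymbol() { out.push_back(0x80000000u | nextUnique++); }

  // Post-order: children first, then the node itself.
  void visit(const ToyExpr& e) {
    for (const ToyExpr& child : e.children) {
      visit(child);
    }
    out.push_back(shallowHash(e));
    if (e.isControlFlow) {
      addUniqueSymbol();
    }
  }
};
```

Two functions with the same shape then produce identical symbol runs between boundaries, which is the property either a sort key or a substring search would rely on.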

Comment on lines +75 to +76
// does not help and can regress size due to breaking natural call
// proximity.

Not call proximity, but LEB size, right?
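
For reference, the LEB size effect can be sketched in a few lines. Unsigned LEB128 stores 7 payload bits per byte, so a function index below 128 costs one byte per `call` and larger indexes cost more. The helper names `uleb128Size` and `callIndexBytes` are hypothetical, and the model counts only callee index bytes, ignoring everything else in the binary.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes needed to encode `value` as unsigned LEB128 (7 payload bits per byte).
std::size_t uleb128Size(uint32_t value) {
  std::size_t bytes = 1;
  while (value >= 0x80) {
    value >>= 7;
    ++bytes;
  }
  return bytes;
}

// Total bytes spent on callee indexes, where callCountsByIndex[i] is the
// number of call sites that target the function placed at index i.
std::size_t callIndexBytes(const std::vector<uint32_t>& callCountsByIndex) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < callCountsByIndex.size(); ++i) {
    total += static_cast<std::size_t>(callCountsByIndex[i]) *
             uleb128Size(static_cast<uint32_t>(i));
  }
  return total;
}
```

Under this model, a function with 1000 call sites costs 2000 index bytes at index 199 and 1000 at index 0, which is the saving that reordering by similarity can give up.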

Comment on lines +126 to +127
size_t numThreads = ThreadPool::get()->size();
std::vector<std::function<ThreadWorkState()>> doWorkers;

I don't think we have any other passes that use ThreadPool directly. This is typically done by ParallelFunctionAnalysis or with a nested Pass for which isFunctionParallel() returns true.

ThreadPool::get()->work(doWorkers);

// 3. Sort defined functions by the similarity heuristic
std::sort(keys.begin(), keys.end());

Sorting only works when the similarities are at the beginning of the strings, right? It seems like looking for matching substrings would be more robust. You could check out what Outlining.cpp does with a suffix tree to find common substrings, for example.
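
A small standalone sketch of the concern: two keys that differ only in their first symbol share almost everything, yet a lexicographic sort can place them at opposite ends of the list. The measure below is a plain quadratic dynamic-programming longest common substring, used only for illustration; it is not the suffix tree approach in Outlining.cpp, and `longestCommonSubstring` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Length of the longest contiguous substring shared by a and b.
// Rolling-row dynamic programming: O(|a| * |b|) time, O(|b|) space.
std::size_t longestCommonSubstring(const std::string& a, const std::string& b) {
  std::vector<std::size_t> prev(b.size() + 1, 0);
  std::vector<std::size_t> cur(b.size() + 1, 0);
  std::size_t best = 0;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    for (std::size_t j = 1; j <= b.size(); ++j) {
      // Extend the match ending at (i-1, j-1), or reset on a mismatch.
      cur[j] = (a[i - 1] == b[j - 1]) ? prev[j - 1] + 1 : 0;
      best = std::max(best, cur[j]);
    }
    std::swap(prev, cur);
  }
  return best;
}
```

For example, sorting the keys `"Axyzxyzxyz"`, `"Bqqqq"`, `"Cmmmm"`, and `"Zxyzxyzxyz"` puts the first and last at opposite ends even though they share a nine-character substring, while neither shares anything with its sorted neighbor.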

@kripken
Member

kripken commented May 13, 2026

I assume the background here is #4322? Some prior work is there.

@brendandahl
Collaborator Author

No, though I did find that after starting this. A while ago I was playing with compressed wat vs wasm with brotli/gzip and added a note to try reordering for gzip. I haven't tried out the idea from cromulate. I was also going to ask if you still have your similarity-ordering branch somewhere?
