Re-saving a .wasm file with wasm-opt -g results in a code size regression #8413

@juj

We have a feature where we use wasm-opt to forcibly remove some functions from the .wasm file, to reduce code size.

In our unit tests, we have tests that remove one or more functions, and assert that the size of those files is reduced.

Now after updating from Emscripten 3.1.38 to 4.0.19, we find these assertions failing. Investigating, it looks like the mere act of re-saving a .wasm file with wasm-opt results in the size of the .wasm file growing.

This is something that didn't happen with Emscripten 3.1.38.

  1. We have a file compiled with emcc -Os --profiling-funcs -o foo.wasm -> foo.wasm.zip
  2. Then we run wasm-opt foo.wasm --all-features -g -o out.wasm on it -> out.wasm.zip
  3. We observe that out.wasm has grown in size compared to the original foo.wasm. A diff can be seen here -> https://oummg.com/dump/diff_foo_out.html (be patient when opening the URL; this is a very large .html file on a slow network connection)

In our actual usage, in step 2, we also add a --strip-function argument, but for the purposes of reproducing this behavior, it can be skipped.

foo.wasm: 41,439,952 bytes
out.wasm: 41,566,424 bytes

out.wasm has grown by +126,472 bytes.
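The reproduction and size comparison above can be sketched as a small script. This is only a sketch: the helper names (`resave_command`, `size_delta`, `resave_and_measure`) are made up for illustration, and the wasm-opt command line is copied verbatim from step 2.

```python
import os
import subprocess

def resave_command(src, dst):
    """Build the wasm-opt invocation from step 2 (without --strip-function)."""
    return ["wasm-opt", src, "--all-features", "-g", "-o", dst]

def size_delta(before_bytes, after_bytes):
    """Return (absolute growth in bytes, relative growth in percent)."""
    delta = after_bytes - before_bytes
    return delta, 100.0 * delta / before_bytes

def resave_and_measure(src="foo.wasm", dst="out.wasm"):
    # Requires wasm-opt on PATH; raises CalledProcessError if it fails.
    subprocess.run(resave_command(src, dst), check=True)
    return size_delta(os.path.getsize(src), os.path.getsize(dst))

if __name__ == "__main__":
    # Sizes reported above, so the arithmetic can be checked without the files.
    delta, pct = size_delta(41_439_952, 41_566_424)
    print(f"out.wasm grew by {delta:+,} bytes ({pct:+.3f}%)")
```

With the reported sizes, the growth is 126,472 bytes, which is roughly 0.3% of the original file.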

This growth on re-serializing the file is a little bit tricky, since it makes it more difficult to unit test the behavior of stripping away functions.

At first we thought that we would just re-serialize the file once before setting the baseline, and only then run wasm-opt a second time to strip the functions we are interested in. However, I notice that wasm-opt in.wasm --all-features -g -o out.wasm is not stable under iteration: when I repeatedly re-save the file, the size slowly creeps up for a while:

Image

though it does seem to stabilize after a while.
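The iteration experiment could be automated along these lines. This is a hedged sketch, not the script I actually used: `sizes_until_stable` and `wasm_opt_resave` are hypothetical names, and the loop simply stops once a round produces output that is byte-identical to its input, or after `max_rounds`. Nothing here guarantees that a fixed point is always reached.

```python
import filecmp
import os
import shutil
import subprocess
import tempfile

def wasm_opt_resave(src, dst):
    # Same command as in step 2; requires wasm-opt on PATH.
    subprocess.run(["wasm-opt", src, "--all-features", "-g", "-o", dst], check=True)

def sizes_until_stable(src, resave=wasm_opt_resave, max_rounds=20):
    """Re-save repeatedly and return the file sizes, starting with the input size.

    Stops when a round's output is byte-identical to its input,
    or after max_rounds rounds, whichever comes first.
    """
    sizes = [os.path.getsize(src)]
    with tempfile.TemporaryDirectory() as tmp:
        current = os.path.join(tmp, "round0.wasm")
        shutil.copyfile(src, current)
        for i in range(1, max_rounds + 1):
            nxt = os.path.join(tmp, f"round{i}.wasm")
            resave(current, nxt)
            sizes.append(os.path.getsize(nxt))
            # shallow=False forces a byte-by-byte comparison.
            if filecmp.cmp(current, nxt, shallow=False):
                break
            current = nxt
    return sizes
```

If the sizes do settle, the last output could serve as the baseline for the stripping tests instead of the original file.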

Well, that is only a bit of a sidenote.

Coming back to the initial foo.wasm -> out.wasm code size increase, the diff page shows that the majority of the increase is located in two functions:

ParticleSystem::UpdateModulesPreSimulationIncremental(ParticleSystemUpdateData const&, ParticleSystemParticles&, unsigned long, unsigned long, float vector[4] const&, bool)

which grows by +16,535 bytes, and

ParticleSystemGeometryJob::RenderJobCommon(ParticleSystemGeometryJob&, ParticleSystemParticlesTempData&, void*, void*)

which grows by +3,328 bytes.

Other functions grow by smaller amounts:

Image

When I disassemble the before/after .wasm files, curiously the WAST of ParticleSystem::UpdateModulesPreSimulationIncremental is about 400 text lines shorter after the re-serialization.

I.e. about 400 lines shorter in WAST, but +16,535 bytes larger in binary.

One thing I can see is that named locals $scratch_xxx appear in the re-serialized file that are not present in the original:

Image

I suppose using -g makes wasm-opt generate those.

However, there are only 172 scratch_* locals, and the string "scratch_" is only 8 characters, so that would account for only about 172 * 8 = 1,376 bytes of added debug names. That still leaves most of the +16,535 bytes unexplained.
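The back-of-the-envelope estimate above, written out. It mirrors the reasoning in the text and counts only the 8-character "scratch_" prefix per local; it ignores any numeric suffix and any per-entry overhead in the name section, so treat it as a rough figure only.

```python
SCRATCH_LOCALS = 172            # number of scratch_* locals seen in the re-saved file
PREFIX_BYTES = len("scratch_")  # 8 characters, one byte each
FUNCTION_GROWTH = 16_535        # reported growth of the function, in bytes

name_bytes = SCRATCH_LOCALS * PREFIX_BYTES
unexplained = FUNCTION_GROWTH - name_bytes

print(f"bytes attributable to the 'scratch_' prefix: {name_bytes}")
print(f"bytes still unexplained: {unexplained}")
```

So under this estimate roughly 15,000 bytes of the growth in this one function are not explained by the new local names.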

Apart from that, the disassembled functions are quite similar. In many places they differ only in the indexing of local variables. In some places the organization of the implementation differs. Still, the re-serialized WAST is 400 lines shorter.
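One reason a difference in local indexing alone could matter for binary size, even when the text looks the same: in the binary format, a local index is stored as an unsigned LEB128 integer, so an instruction like local.get takes more bytes once its index reaches 128. Whether that contributes to the growth seen here is not confirmed; the sketch below only illustrates the encoding, and the helper names are made up.

```python
def uleb128_len(value):
    """Number of bytes needed to encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 requires a non-negative value")
    n = 1
    while value >= 0x80:  # each byte carries 7 bits of payload
        value >>= 7
        n += 1
    return n

def local_get_size(index):
    # local.get is a one-byte opcode (0x20) followed by the LEB128-encoded local index.
    return 1 + uleb128_len(index)

for idx in (5, 127, 128, 16383, 16384):
    print(f"local.get {idx}: {local_get_size(idx)} bytes")
```

A function with many locals could therefore change size if frequently used locals end up with indices of 128 or higher after renumbering, while its WAST line count stays the same or even shrinks.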

Here are the baseline and re-saved .wast dumps of the single function with the largest increase:

I wonder if there might be any clues here as to why re-saving a .wasm file could make it grow in size like this? We did not experience this behavior with Emscripten 3.1.38, and size-stable re-saving was a very convenient property for setting up our unit tests for code stripping.
