Compile and emit initialization functions, rather than interpreting initialization datastructures #2639

@fitzgen

Right now, a 1000ft overview of our instantiation process (ignoring the creation of import-able functions in the linker, which shared host functions address) looks something like this:

  1. look up imports in linker's hash table and flatten them to an array
  2. allocate space for memories/tables/globals/etc
  3. fill the vmctx with pointers to the imports, etc
  4. initialize globals by interpreting global initializers from the wasm module
  5. initialize tables by interpreting element initializer segments from the wasm module
  6. initialize memory by interpreting data initializer segments from the wasm module

The instance allocator pool work makes (2) super fast! ✔️

Although they should usually take relatively little time, we can make (4) through (6) even faster by using Cranelift to compile an initialization function. Instead of running an interpreter loop that iterates over each initializer and checks that it is in bounds and all that, we would emit code with that loop unrolled, and with the bounds checks that used to happen on every iteration de-duplicated into a single check for everything. Then we just call this JIT code during instantiation, rather than initializing these things ourselves!
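To make the idea concrete, here is a minimal sketch in plain Rust of the difference between the two strategies for data segments. It does not use Cranelift or any wasmtime internals: `DataSegment`, `init_interpreted`, `compile_init`, and `CompiledInit` are hypothetical names invented for illustration, and segment offsets are assumed to be constants known at compile time. The "compiled" variant only models the hoisted bounds check; real JIT code would additionally emit the copies as straight-line code. One behavioral difference worth noting in this sketch: on an out-of-bounds segment the interpreted version has already written the earlier segments, while the hoisted-check version writes nothing.

```rust
/// Hypothetical data segment: copy `bytes` into memory starting at `offset`.
struct DataSegment {
    offset: usize,
    bytes: Vec<u8>,
}

#[derive(Debug, PartialEq)]
struct OutOfBounds;

/// Interpreter-style initialization: one bounds check per segment.
fn init_interpreted(memory: &mut [u8], segments: &[DataSegment]) -> Result<(), OutOfBounds> {
    for seg in segments {
        let end = seg.offset.checked_add(seg.bytes.len()).ok_or(OutOfBounds)?;
        let dst = memory.get_mut(seg.offset..end).ok_or(OutOfBounds)?;
        dst.copy_from_slice(&seg.bytes);
    }
    Ok(())
}

/// Stand-in for the compiled initialization function. `required_len` is
/// computed once, when the module is compiled, not on every instantiation.
struct CompiledInit {
    required_len: usize,
    segments: Vec<DataSegment>,
}

/// "Compile" step: fold every per-segment bound into a single number.
/// Returns `None` if any segment's end offset overflows `usize`.
fn compile_init(segments: Vec<DataSegment>) -> Option<CompiledInit> {
    let mut required_len = 0usize;
    for seg in &segments {
        let end = seg.offset.checked_add(seg.bytes.len())?;
        required_len = required_len.max(end);
    }
    Some(CompiledInit { required_len, segments })
}

impl CompiledInit {
    /// Instantiation-time work: a single check covering all segments,
    /// followed by the copies.
    fn run(&self, memory: &mut [u8]) -> Result<(), OutOfBounds> {
        if memory.len() < self.required_len {
            return Err(OutOfBounds);
        }
        for seg in &self.segments {
            let end = seg.offset + seg.bytes.len();
            memory[seg.offset..end].copy_from_slice(&seg.bytes);
        }
        Ok(())
    }
}
```

In safe Rust the slice indexing in `run` still carries its own checks; the point of the sketch is only that the decision "will everything fit?" can be made once up front, which is what the emitted code would rely on.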

Of course the amount of speed up we'll get by doing this is going to be a function of how many global/table element/data segment initializers a module has. Usually it isn't too many. But some modules, particularly those generated by Wizer, might have a good amount of them, and this could potentially save us a few microseconds on instantiation (great to be at the level where we are counting microseconds here 😃). Also, funcref tables can get pretty big and generally every index is initialized with an element.

We could potentially also JIT code for (3) but this seems slightly more complicated because it is more heterogeneous and also the vmctx fields/layout change more frequently than globals/tables/memory so it may have a larger maintenance burden.

Finally, we've talked about using virtual memory tricks to make page-aligned and -sized data segments

  • lazily initialized (via userfaultfd) and
  • copy-on-write (via mapping them with MAP_PRIVATE)

This JIT-initialization approach should technically be complementary to these things. Even if (6) effectively disappears from our instantiation times by becoming lazy, (4) and (5) will still need initializing at instantiation time, as will any data segments that are not page-aligned and page-sized. But it might make the potential speedups that much smaller, and bump this optimization pretty far down the priority list. Something to consider.
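As a rough illustration of which segments would still need eager initialization, the following sketch splits segments into those that are page-aligned and page-sized (candidates for the virtual memory tricks above) and the remainder. The names `Segment`, `is_page_mappable`, and `split_segments` are hypothetical, and the 4096-byte `PAGE_SIZE` is an assumption about the host; a real implementation would query the OS page size.

```rust
/// Assumption: a 4 KiB host page size. Real code would ask the OS.
const PAGE_SIZE: usize = 4096;

/// Hypothetical description of a data segment's placement in memory.
#[derive(Debug, Clone, PartialEq)]
struct Segment {
    offset: usize,
    len: usize,
}

/// A segment can be mapped lazily or copy-on-write only if it starts on a
/// page boundary and covers a whole number of pages.
fn is_page_mappable(seg: &Segment) -> bool {
    seg.len > 0 && seg.offset % PAGE_SIZE == 0 && seg.len % PAGE_SIZE == 0
}

/// Returns `(mappable, eager)`: segments that could use the virtual memory
/// tricks, and segments that still need initializing at instantiation time.
fn split_segments(segments: &[Segment]) -> (Vec<Segment>, Vec<Segment>) {
    segments.iter().cloned().partition(is_page_mappable)
}
```

Only the second list would be handled by the initialization function in this model, which is why the speedup shrinks as more segments become mappable.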

Aside: it is worth thinking about speeding up (1) as well. If we are repeatedly instantiating the same module with the same imports (e.g. one instantiation per HTTP request that a server receives), then it seems like we could do (1) just once and then reuse the flattened imports array for every instantiation. Not totally sure what this would look like at the API level. I think it might be possible to implement without wasmtime API changes, but maybe we don't want to force everyone to implement this same optimization by hand? Another thing to mull over.
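One possible shape for this, sketched in plain Rust with hypothetical types (`Linker`, `Module`, `PreResolved`, `Instance`, and `FuncAddr` here are illustrative stand-ins, not the wasmtime API): resolve the imports against the linker's hash table once, keep the flattened array behind an `Arc`, and have each instantiation share it instead of repeating the lookups.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the address of an importable function.
#[derive(Debug, Clone, Copy, PartialEq)]
struct FuncAddr(usize);

/// Hypothetical linker: maps (module name, field name) to a definition.
struct Linker {
    defs: HashMap<(String, String), FuncAddr>,
}

/// Hypothetical module: just its ordered list of imports.
struct Module {
    imports: Vec<(String, String)>,
}

/// The result of doing step (1) once: imports flattened to an array.
struct PreResolved {
    flat_imports: Arc<[FuncAddr]>,
}

struct Instance {
    imports: Arc<[FuncAddr]>,
}

impl Linker {
    /// Performs the hash table lookups once. Returns `None` if any import
    /// is missing from the linker.
    fn resolve(&self, module: &Module) -> Option<PreResolved> {
        let flat: Vec<FuncAddr> = module
            .imports
            .iter()
            .map(|key| self.defs.get(key).copied())
            .collect::<Option<Vec<_>>>()?;
        Some(PreResolved { flat_imports: Arc::from(flat) })
    }
}

impl PreResolved {
    /// Per-request instantiation: no lookups, only a reference-count bump.
    fn instantiate(&self) -> Instance {
        Instance { imports: Arc::clone(&self.flat_imports) }
    }
}
```

A server would call `resolve` once at startup and `instantiate` per request. Whether this belongs behind the existing API or as a new type is exactly the open question above.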

cc @tschneidereit @lukewagner @alexcrichton since we talked about this yesterday

cc @peterhuene because this is related to instantiation performance

Labels: performance, wasmtime
