in ,

Multi-Value All The Wasm! – Mozilla Hacks – the Web developer blog, Hacker News

Multi-Value All The Wasm! – Mozilla Hacks – the Web developer blog, Hacker News


This article iscross-posted on the Bytecode Alliance web site.

(Multi-value) is a proposed extension to core WebAssembly that enables functions to return many values, among other things. It is also a pre-requisite forWasm interface types.

I’ve been adding multi-value supportallover the place recently:

  • I added multi-value support to all the various crates in the Rust and WebAssembly toolchain, so that Rust projects can compile down to Wasm code that uses multi-value features.
  • I added multi-value support to Wasmtime, the WebAssembly runtime built on top of the Cranelift code generator, so that it can run Wasm code that uses multi-value features.

Now, as my multi-value efforts are wrapping up, it seems like a good time to reflect on the experience and write up everything that’s been required to get all this support in all these places.

Wait – What is Multi- Value Wasm?

In core WebAssembly, there are a couple of arity restrictions on the language:

  • functions can only return either zero or one value, and
  • instruction sequences likeblock s,ifs, andloops cannot consume any stack values, and may only produce zero or one resulting stack value.

The multi-value proposalis an extension to the WebAssembly standard that lifts these arity restrictions. Under the new multi-value Wasm rules:

  • functions can return an arbitrary number of values, and
  • instruction sequences can consume and produce an arbitrary number of stack values.

The following snippets are only valid under the new rules introduced in the multi-value Wasm proposal:

;; A function that takes an `i 64 `and returns ;; Three `I 32 `s. func (param i 64) (result i  (i) ***************************************************************************************************************************************************************************** (i) )   ...)  ;; A loop that consumes an `i 32 `stack value ;; at the start of each iteration. loop (param i 32)   ... end  ;; A block that produces two `i 32 `stack values. block (result i  (i) )   ... end

The multi-value proposal is currently at phase 3 ofthe WebAssembly standardization process.

(But Why Should I Care?)

Code Size

There are a few scenarios where compilers are forced to jump through hoops when producing multiple stack values for core Wasm. Workarounds include introducing temporary local variables, and usinglocal.getandlocal.setinstructions, because the arity restrictions on blocks mean that the values ​​cannotbe left on the stack.

Consider a scenario where we are computing two stack values: the pointer to a string in linear memory, and its length. Furthermore, imagine we are choosing between two different strings (which therefore have different pointer-and-length pairs) based on some condition. But whichever string we choose, we’re going to process the string in the same fashion, so we just want to push the pointer-and-length pair for our chosen string onto the stack, and control flow can join afterwards.

With multi-value, we can do this in a straightforward fashion:

call $ compute_condition if (result i  (i) )   call $ get_first_string_pointer   call $ get_first_string_length else   call $ get_second_string_pointer   call $ get_second_string_length end

This encoding is also compact: only sixteen bytes!

When we’re targeting core Wasm, and multi-value isn’t available, we ‘re forced to pursue alternative, more convoluted forms. We can smuggle the stack values ​​out of eachifandelsearm via temporary local values:

;; Introduce a pair of locals to hold the values ;; across the instruction sequence boundary. (local $ string i 32) (local $ length i 32)  call $ compute_condition if   call $ get_first_string_pointer   local.set $ string   call $ get_first_string_length   local.set $ length else   call $ get_second_string_pointer   local.set $ string   call $ get_second_string_length   local.set $ length end  ;; Restore the values ​​onto the stack, from their ;; temporaries. local.get $ string local.get $ length

This encoding requires 30 bytes, an overhead of fourteen bytes more than the ideal multi-value version. And if we were computing three values ​​instead of two, there would be even more overhead, and the same is true for four values, etc… The additional overhead is proportional to how many values ​​we’re producing in theifandelseArms.

We can actually go a little smaller than that – still with core Wasm – by jumping through a different hoop. We can split this into twoif ... else ... endblocks and duplicate the condition check to avoid introducing temporaries for each of the computed values ​​themselves:

;; Introduce a local for the condition, so that ;; we only re-check it, and don't recompute it. (local $ condition i 32)  ;; Compute and save the condition. call $ compute_condition local.set $ condition  ;; Compute the first stack value. local.get $ condition if (result i 32)   call $ get_first_string_pointer else   call $ get_second_string_pointer end  ;; Compute the second stack value. local.get $ condition if (result i 32)   call $ get_first_string_length else   call $ get_second_string_length end

This gets us down to 28 bytes. Two fewer than the last version, but still an overhead of twelve bytes compared to the multi-value encoding. And the overhead is still proportional to how many values ​​we’re computing.

There’s no way around it: we need multi-value to get the most compact code here .

New Instructions

The multi-value proposal opens up the possibility for new instructions that produce multiple values:

    (An) ******************************************** (i) .Divmodinstruction of type[i32 i32] ->[i32 i32]that takes a numerator and divisor and produces (both) their quotient and remainder.

  • Arithmetic operations with an additional carry result. These could be used to better implement big ints, overflow checks, and saturating arithmetic.

Returning Small Structs More Efficiently

Returning multiple values ​​from functions will allow us to more efficiently return small structures like Rust’sResults. Without multi-value returns, these relatively small structs that still don’t fit in a single Wasm value type get placed in linear memory temporarily. With multi-value returns, the values ​​don’t escape to linear memory, and instead stay on the stack. This can be more efficient, since Wasm stack values ​​are generally more amenable to optimization than loads and stores from linear memory.

Interface Types

Shrinking code size is great, and new instructions would be fancy, but here’s what I ‘ m really excited about:WebAssembly interface types. Interface types used to be called “host bindings,” and they are the key to unlocking:

  • direct, optimized access to the browser’s DOM methods on the Web,
  • “shared-nothing linking” of WebAssembly modules , and
  • defining language-neutral interfaces, likeWASI.

For all three use cases, we might want to return a string from a callee Wasm module. The caller that is consuming this string might be a Web browser, or it might be another Wasm module, or it might be a WASI-compatible Wasm runtime. In any case, a natural way to return the string is as twoi 32S:

  1. a pointer to the start of the string in linear memory, and
  2. the byte length of the string.

The interface adapter can then lift that pair ofi 32s into an abstract string type, and then lower it into the caller’s concrete string representation on the other side. Interface types are designed such that in most cases, this lifting and lowering can be optimized into a quick memory copy from the callee’s linear memory to the caller’s.

But before the interface adapters can do that lifting and lowering, they need access to the pointer and length pair, which means the callee Wasm function needs to return two values, which means we need multi-value Wasm for interface types.

All The Implementing!

Now that we know what multi-value Wasm is, and why it’s exciting, I ‘ ll recount the tale of implementing support for it all over the place. I started with implementing multi-value support in the Rust and WebAssembly toolchain, and then I added support to the Wasmtime runtime, and the Cranelift code generator it’s built on top of.

(Rust and WebAssembly Toolchain)

What falls under the Rust and Wasm toolchain umbrella? It is a superset of the general Rust toolchain:

  • Cargo: Manages builds and dependencies.
  • RUSTC: Compiles Rust sources into code.
  • LLVM: Used byRUSTCunder the covers to optimize and generate code.

And then additionally, when targeting Wasm, we also use a few more moving parts:

  • wasm-bindgen: Part library and part Wasm post-processor,wasm-bindgengenerates bindings for consuming and producing interfaces defined with interface types (and much more!)
  • Walrus: A library for transforming and rewriting WebAssembly modules, used bywasm-bindgen ('s post-processor.)
  • Wasmparser: An event-style parser for WebAssembly binaries, used byWalrus.

Here’s a summary of the toolchain’s pipeline, showing the inputs and outputs between tools:

A diagram showing the pipeline of the Rust and Wasm toolchain.

My goal is to unlock interface types with multi-value functions. For now, I haven’t been focusing on code size wins from generating multi-value blocks. For my purposes, I only need to introduce multi-value functions at the edges of the Wasm module that talk to interface adapters; I don’t need to make all function bodies use the optimal multi-value instruction sequence constructs. Therefore, I decided to havewasm-bindgen‘s post-processor rewrite certain functions to use multi-value returns, rather than add support in LLVM.(0)With this approach I only needed to add support to the following tools:

  • Cargo
  • (rustc) ********************************************* ()
  • LLVM
  • wasm-bindgen
  • Walrus
  • Wasmparser

Wasmparser

Wasmparseris an event-style parser for WebAssembly binaries.It may seem strange that adding toolchain support for (generating) multi-value Wasm began withparsingmulti-value Wasm. But it is necessary to make testing easy and painless, and we needed it eventually for Wasmtime anyways, which also useswasmparser.

In core Wasm, the optional value type result of a block,loop, orifis encoded directly in the instruction:

    (a) ******************************************** (0x) BYTE means there is no result

  • a0x7fbyte means there is a singlei 32result
  • a0x7ebyte means there is a singlei 64result
  • etc…

With multi-value Wasm, there are not only zero or one resulting value types, there are also parameter types. Blocks can have the same set of types that functions can have. Functions already de-duplicate their types in the “Type” section of a Wasm binary and reference them via index. With multi-value, blocks do that now as well. But how does this co-exist with non-multi-value block types?

The index is encoded as a signed variable-length integer, usingthe LEB 128 encoding. If we interpret non-multi-value blocks’ optional result value type as a signed LEB 128, we get:

  • - 64(the smallest number that can be encoded as a single byte with signed LEB 128) means there is no result
  • - 1means there is a single (i) result
  • - 2means there is a single (i) result
  • etc ..

They’re all negative, leaving the positive numbers to be interpreted as indices into the “Type” section for multi-value blocks! A nice little encoding trick and bit of foresight from the WebAssembly standards folks.

Adding support for parsing these was straightforward, butWasmparseralso supports validating the Wasm as it parses it. Adding validation support was a little bit more involved.

Wasmparser‘s validation implementation is similar tothe validation algorithm presented in the appendix of the WebAssembly spec: it abstractly interprets the Wasm instructions, maintaining a stack of types, rather than a stack of values. If any operation uses operands of the wrong type – for example the stack has anf 32at its top when we are executing an (i) .addinstruction, and therefore expect two (i) s on top of the stack – then validation fails. If there are no type errors, then it succeeds. There are some complications when dealing with stack-polymorphic instructions, likedrop, but they don’t really interact with multi-value.

WheneverWasmparserencounters a (block) ,loop, or (if) instruction, it pushes an associated control frame, that keeps track of how deep in the stack instructions within this block can access. Before multi-value, the limit was always the length of the stack upon entering the block, because blocks didn’t take any values ​​from the stack. With multi-value, this limit becomesstack.len () - block.num_params (). When exiting a block, wasmparserpops the associated control frame. It check that the top (n) types on the stack match the block’s result types, and that the stack’s length isframe.depth n. Before multi-value, (nwas always either(0) or (1) , but now it can be any non- negative integer.

The final bit of validation that is impacted by multi-value is when an (if) needs to have anelseor not. In core Wasm, if theifdoes not produce a resulting value on the stack, it doesn’t need anelsearm since the wholeif‘s typing is[] ->[]which is also the typing for a no-op. With multi-value this is generalized to anyifwhere the inputs and outputs are the same types:[t*] ->[t*]. Easy to implement, but also very easy to overlook (like I originally did!)

Multi-value support was added toWasmparserin these pull requests:

Walrus

walrusis a WebAssembly to WebAssembly transformation library.We use it to generate glue code inwasm-bindgenand to polyfill WebAssembly features.

Walrusconstructs its own intermediate representation (IR) for WebAssembly. Similar to howwasmparservalidates Wasm instructions ,walrusalso abstractly interprets the instructions while building up its IR. This meant that adding support for constructing multi-value IR towalruswas very similar to adding multi-value validation support towasmparser. In fact,also validates the Wasm while it is constructing its IR.

But multi-value has big implications for the IR itself. Before multi-value, you could view Wasm’s stack-based instructions as a post-order encoding of an expression tree.

Consider the expression(a b) * (c - d). As an expression tree, it looks like this:

*     /     /      -  /  /  a b c d

A post-order traversal of a tree is where a node is visited after its children. A post-order traversal of our example expression tree would be:

AB   CD - *

Assume thata, (B) , (C) , and (d) are Wasm locals of type (i) , with the values ​​ (9) , (7) , (5) , and (3) respectively. We can convert this post-order directly into a sequence of Wasm instructions that build up their results on the Wasm stack:

;; Instructions ;; Stack ;;;;;;;;;;;;;;;;;;;;;;;;;;;;; local.get $ a ;; [9] local.get $ b ;; [9, 7] i 32 .add ;; [16] local.get $ c ;; [16, 5] local.get $ d ;; [16, 5, 3] i 32 .sub ;; [16, 2] i 32 .mul ;; [32]

This correspondence between trees and Wasm stack instructions made using a tree-like IR inWalrus, where nodes are instructions and a node’s children are the instructions that produce the parent’s input values, very natural.(1)Our IR used to look something like this:

pub enum Expr {     // A `local.get` instruction.     LocalGet (LocalId),      // An `i 32 .Add` instruction.     I 32 Add (ExprId, ExprId),      // Etc ... }

But multi-value threw a wrench in this tree-like representation: now that an instruction can produce multiple values, when we have aparent⟶childedge in the tree, how do we knowwhichof the child’s resulting values ​​the parent wants to use? And also, if two different parents are each using one of the two values ​​an instruction generates, we fundamentally don’t have a tree anymore, we have adirected, acyclic graph (DAG).

We considered generalizing our tree representation into a DAG, and labeling edges with (n) to represent using the (n) ************************************************************************* (th) resulting value of an instruction. We weighed the complexity of implementing this representation against what our current use cases inwasm-bindgendemand, along with any future use cases we could think of. Ultimately, we decided it wasn’t worth the effort, since we don’t need that level of detail for any of the transformations or manipulations thatwasm-bindgenperforms, or that we foresee it doing in the future.

Instead, we decided that within a block, representing instructions as a simple list is good enough for our use cases, so now our IR looks something like this:

pub struct Block (Vec) ;  pub enum Instr {     // A `local.get` instruction.     LocalGet (LocalId),      // An `i 32 .Add` instruction. Note that its     // children are left implicit now.     I 32 Add,      // Etc ... }

Additionally, it turns out it is faster to construct and traverse this list-based representation, so switching representations inwalrusalso gavewasm-bindgena nice little speed Up.

TheWalrussupport for multi-value was implemented in these pull requests:

wasm-bindgen

wasm-bindgenfacilitates high-level interactions between Wasm modules and their host.Often that host is a Web browser and its DOM methods, or some user-written JavaScript. Other times it is an outside-the-Web Wasm runtime, like Wasmtime, using WASI and interface types.wasm-bindgenacts as a polyfill for the interface types proposal, plus some extra batteries included for a powerful user experience.

One ofwasm-bindgen‘s responsibilities is translating the return value of a Wasm function into something that the host caller can understand . When using interface types directly with Wasmtime, this means generating interface adapters that lift the concrete Wasm return values ​​into abstract interface types. When the caller is some JavaScript code on the Web, it means generating some JavaScript code to convert the Wasm values ​​into JavaScript values.

Let’s take a look at some Rust functions and the Wasm they get compiled down into.

First, consider when we are returning a single integer from a Rust function:

// random.rs  # [no_mangle] pub extern "C" fn get_random_int () ->i 32 {     // Chosen by fair dice roll.     4 }

And here is the disassembly of that Rust code compiled to Wasm:

;; random.wasm  (func $ get_random_int (result i 32)   i 32 const 4 )

The resulting Wasm function’s signature is effectively identical to the Rust function’s signature. No surprises here. It is easy forwasm-bindgento translate the resulting Wasm value to whatever is needed becausewasm-bindgenhas direct access to it; it’s right there.

Now let’s look at returning compound structures from Rust that don’t fit in a single Wasm Value:

// pair.rs  # [no_mangle] pub extern "C" fn make_pair (a: i 32, b: i 32) ->[i32; 2] {     [a, b] }

And here is the disassembly of this new Rust code compiled to Wasm:

;; pair.wasm  (func $ make_pair (param i 32 i 32 i 32)   local.get 0   local.get 2   i 32 .store offset=4   local.get 0   local.get 1   i 32 )

The signature for themake_pairfunction inpair.wasmdoesn’t look like its corresponding signature inpair.rs! It has three parameters instead of two, and it isn’t returninganyvalues, let alone a pair.

What’s happening is that LLVM doesn’t support multi-value yet so it can’t return two (i) s directly from the function. Instead, callers pass in a “struct return” pointer to some space that they’ve reserved for the return value, andmake_pairwill write its return value through that struct return pointer into the reserved space. By convention, LLVM uses the first parameter as the struct return pointer, so the second Wasm parameter is our original (a) parameter in Rust and the third Wasm parameter is our original (B) parameter in Rust. We can see that the Wasm function is writing theBfield first, and then theAfield second.

How is space reserved for the struct return? Distinct from the Wasm standard’s stack that instructions push values ​​to and pop values ​​from, LLVM emits code to maintain a “shadow stack” in linear memory. There is a global dedicated as the stack pointer, and always points to the top of the stack. Non-leaf functions that need some scratch space of their own will decrement the stack pointer to allocate some space on entry (since the stack grows down, and its “top” of the stack is its lowest address) and will increment it to deallocate that space on exit. Leaf functions that don’t call any other function can skip incrementing and decrementing this stack pointer, which is exactly why we didn’t seemake_pairmessing with the stack pointer.

To verify that callers are allocating space for the return struct in the shadow stack, let’s create a function that callsmake_pairand then inspect its disassembly:

// pair.rs  # [no_mangle] pub extern "C" fn make_default_pair () ->[i32; 2] {     make_pair (42, 1337) }

I’ve annotateddefault_make_pair‘s disassembly below to make it clear how the shadow stack pointer is manipulated to create space for return values ​​and how the pointer to that space is passed tomake_pair:

;; pair.wasm  (func $ make_default_pair (param i 32)   (local i 32)    ;; Reserve 16 bytes of space in the shadow   ;; stack. We only need 8 bytes, but LLVM keeps   ;; the stack pointer 16 - byte aligned. Global 0   ;; is the stack pointer.   global.get 0   i 32 Const 16   i 32   local.tee 1   global.set 0    ;; Get a po inter to the last 8 bytes of our   ;; shadow stack space. This is our struct   ;; return pointer argument, where the result   ;; of `make_pair` will be written to.   local.get 1   i 32 const 8   i 32    ;; Call `make_pair` with the struct return   ;; pointer and our default values.   i 32 Const 42   i 32 Const 1337   call $ make_pair    ;; Copy the resulting pair into our own struct   ;; return pointer's space. LLVM optimized this   ;; into a single `i 64 `load and store, instead   ;; of two `i 32 `load and stores.   local.get 0   local.get 1   i 64 .load offset=8   i 64 .store align=4    ;; Restore the stack pointer to the original   ;; value it had upon entering this function,   ;; deallocating our shadow stack frame.   local.get 1   i 32 Const 16   i 32   global.set 0   end )

When the caller is JavaScript,wasm-bindgencan use its knowledge of these calling conventions to generate JavaScript glue code that allocates shadow stack space, calls the function with the struct return pointer argument, reads the values ​​out of linear memory, and finally deallocates the shadow stack space before converting the Wasm values ​​into some JavaScript value.

But when using interface types directly, rather than polyfilling them, we can’t rely on generating glue code that has access to the Wasm module’s linear memory. First, the memory might not be exported. Second, the only glue code we have is interface adapters, not arbitrary JavaScript code. We want those values ​​as proper return values, rather than through a side channel.

So I wrote aWalrustransform inwasm-bindgenthat converts functions that use a struct return pointer parameter without any actual Wasm return values, into multi-value functions that don’t take a struct return pointer parameter but return multiple resulting Wasm values ​​instead. This transform is essentially a “reverse polyfill” for multi-value functions.

;; Before. ;; ;; First parameter is a struct return pointer. No ;; return values, as they are stored through the ;; struct return pointer. (func $ make_pair (param i 32 i 32 i 32)   ;; ... )  ;; After. ;; ;; No more struct return pointer parameter. Return ;; values ​​are actual Wasm results. (func $ make_pair (param i 32 i 32) (result i  (i) )   ;; ... )

The transform is only applied to exported functions that take a struct return pointer parameter, and rather than rewriting the source function in place, the transform leaves it unmodified but removes it from the Wasm module’s exports list. It generates a new function that replaces the old one in the Wasm module’s exports list. This new function allocates shadow stack space for the return value, calls the original function, reads the values ​​out of the shadow stack onto the Wasm value stack, and finally deallocates the shadow stack space before returning.

For our runningmake_pairexample, the transform produces an exported wrapper function like this:

;; pair.wasm  (func $ new_make_pair (param i 32 i 32) (result i  (i) )   ;; Our struct return pointer that points to the   ;; scratch space we are allocating on the shadow   ;; stack for calling `$ make_pair`.   (local i 32)    ;; Allocate space on the shadow stack for the   ;; result.   global.get $ shadow_stack_pointer   i 32 const 8   i 32   local.tee 2   global.set $ shadow_stack_pointer    ;; Call the original `$ make_pair` with our   ;; allocated shadow stack space for its   ;; results.   local.get 2   local.get 0   local.get 1   call $ make_pair    ;; Read the return values ​​out of the shadow   ;; stack and place them onto the Wasm stack.   local.get 2   i 32   local.get 2   i 32 .load offset=4    ;; Finally, restore the shadow stack pointer.   local.get 2   i 32 const 8   i 32   global.set $ shadow_stack_pointer )

With this transform in place,wasm-bindgencan now generate multi-value function exports along with associated interface adapters that lift the concrete Wasm return values ​​into abstract interface types.

The multi-value support and transform were implemented inwasm-bindgenin these pull requests:

(Wasmtime and Cranelift)

Ok, so at this point, we can generate multi-value Wasm binaries with the Rust and Wasm toolchain – woo! But now we need to be able to run these binaries.

Enter (Wasmtime) , the WebAssembly runtime built on top ofthe Cranelift code generator. Wasmtime translates WebAssembly into Cranelift’s IR with theCranelift-wasmcrate, and then Cranelift compiles the IR down to native machine code.

Pipeline from a Wasm binary through Wasmtime and Cranelift to native code

Implementing multi-value Wasm support in Wasmtime and Cranelift roughly involved two steps:

  1. Translating multi- value Wasm into Cranelift IR
  2. Supporting arbitrary numbers of return values ​​in Cranelift

Translating Multi-Value Wasm into Cranelift IR

Cranelift has its own intermediate representation that it manipulates, optimizes, and legalizes before generating machine code for the target architecture. In order for Cranelift to compile some code, you need to translate whatever you’re working with into Cranelift’s IR. In our case, that means translating Wasm into Cranelift’s IR. This process is analogous to (rustc) converting its mid-level intermediate representation (MIR) to LLVM’s IR.(2)

Cranelift’s IR is made up of (extended) basic blocks(3)containing code insingle, static-assignment form (SSA). SSA, as the name implies, means that variables can only be assigned to when defined, and can’t ever be re-assigned:

;; `` 42   1337 `in Cranelift IR v0=iconst. 32 42 v1=iconst. 32 1337 v2=iadd v0, v1

When translating to SSA form, most re-assignments to a variable (x) can be handled by defining a freshx1and replacing subsequent uses of (x) withx1, and then turning the next re-assignment intox2, etc. But that doesn’t work for points where control flow joins, such as the block following the consequent and alternative arms of an (if) /else.

Consider this Rust code, and how we might translate it into SSA:

let x; if some_condition () {     // This version of `x` becomes` x0` when     // translated to SSA.     x=foo (); } else {     // This version of `x` becomes` x1` when     // translated to SSA.     x=bar (); } // Does this use `x0` or` x1` ?? do_stuff (x);

Should thedo_stuffcall at the bottom use (x0) or (x1) when translated into SSA? Neither!

SSA uses Φ (phi) functions to handle these cases. A phi function takes a number of mutually exclusive, control flow-dependent parameters and returns the one that was defined where control flow came from. In our example we would havex2=Φ (x0, x1), and ifsome_condition ()was true then (x2) would get its value fromx0. Otherwise, x2) would get its value fromx1.

If SSA and phi functions are new to you and you’re feeling confused, don ‘ t worry! It was confusing for me too when I first learned about this stuff. But Cranelift IR doesn’t use phi functions per se, it has something that I think is more intuitive: blocks can have formal parameters.

Translating our example to Cranelift IR, we get this:

;; Head of the `if` /` else`. ebb0:     v0=call some_condition ()     brnz v0, ebb1     jump ebb2  ;; Consequent. ebb1:     v1=call foo ()     jump ebb3 (v1)  ;; Alternative. ebb2:     v2=call bar ()     jump ebb3 (v2)  ;; Code following the `if` /` else`. ebb3 (v3: i 32):     call do_stuff (v3)

Note thatebb3takes a parameter for the control flow-dependent value that we should pass todo_stuff! And the jumps inebb1and (EBB2) pass their locally-defined values ​​“ into ” (ebb3) ! This is equivalent to phi functions, but I find it much more intuitive.

Anyways, translating WebAssembly code into Cranelift IR happens in theCranelift-wasmcrate. It useswasmparserto decode the given blob of Wasm and validate it, and then constructs Cranelift IR via (you guessed it!) abstract interpretation. ASCranelift-wasminterprets Wasm instructions , rather than pushing and popping Wasm values, it maintains a stack of Cranelift IR SSA values. ASCranelift-wasmenters and exits Wasm control frames, it creates Cranelift IR basic blocks.

This process is fairly similar toWalrus‘s IR construction, which was pretty similar toWasmparser‘s validation, and the whole thing felt pretty familiar by now. There were just a couple tricky bits.

The first tricky bit was remembering to add parameters (phi functions) to the first basic block for a Wasmloop‘s body , representing its Wasm stack parameters. This is necessary, because control flow joins from two places at the top of the loop body: from where we were when we first entered the loop, and from the bottom of the loop when we finish an iteration and are starting another. In terms of the abstract interpretation, this means you need to pop off the particular SSA values ​​you have on the stack at the start of the loop, construct SSA values ​​for the loop’s parameters, and then push those onto the stack instead. I originally overlooked this, resulting in a fair bit of head scratching and debugging mis-translated IR. Whoops!

Second,Cranelift-wasmwill track reachability during translation, and if some Wasm code is unreachable, we don’t even bother constructing Cranelift IR for it. But that boundary between unreachable and reachable code, and when one transitions to the other, can be a bit subtle. You can be in an unreachable state, fall through the current block into the following block, and become reachable once again. Throw in if s withelses, and (if) s withoutelses, and unconditional branches, and early returns, and it is easy for bugs to sneak in. And in the process of adding multi-value Wasm support, bugs did, in fact, sneak in. This time involving anifthat was initially reachable, and whose consequent arm also ends reachable, but whose alternative arm endsunreachable. Given that, should the block following the consequent and alternative be reachable? Yes, but we were incorrectly computing that it shouldn’t be.

To fix this bug, I refactored howCranelift-wasmcomputes reachablity of code following anif. It now correctly determines that the following block is reachable if the head of theifis reachable and any of the following are true:

  • The consequent or alternative end reachable, in which case they will continue to the following block.
  • The consequent or alternative do an early branch ( potentially a conditional branch) to the following block, and that branch is reachable.
  • There is no alternative, so if the (if) ‘s condition is false, we go directly to the following block.

To be sure that we are handling all these edge cases correctly, I added tests enumerating every combination of reachability of anif‘s arms as well as early branches . Phew!

Finally, this bug first manifested itself in a 39 KiB Wasm file, and figuring out what was going on was made so much easier thanks to tools likewasm-reduce(a tool that is part ofbinaryen) andcreduce(working on the WAT disass embly, rather than the binary Wasm). I forget which one I used this time, but I’ve successfully used both to turn big, complicated Wasm test cases into small, isolated test cases that highlight the bug at hand. These tools are real life savers so it is worth broadcasting their existence just in case anyone doesn’t know about them!

Translating multi-value Wasm into Cranelift IR happened in these pull requests:

Supporting Many Return Values in Cranelift

Cranelift IR (the language) supports returning arbitrarily many values ​​from a function, but Craneliftthe implementationonly supported returning as many values ​​as there are available registers in the calling convention that the function is using. For example, with the System V calling convention, you could return up to three pointer-sized values, and with the Windows fastcall calling convention, you could only return a single pointer-sized value.

So the question was:

  How to return more values ​​than can fit in registers?

This should trigger some deja vu: when compiling to Wasm, how was LLVM returning structures larger than could fit in a single Wasm value? Struct return pointer parameters! This is nothing new, and in fact its use is dictated by certain calling conventions, we just hadn’t implemented support for it in Cranelift yet. So that’s what I set out to do.

When Cranelift is given some initial IR, the IR is generally portable and machine independent. As the IR moves through Cranelift, eventually it reaches a legalization phase where instructions that don’t have a direct mapping to an machine code instruction in the target architecture are replaced with ones that do. For example, on 32 – bit x 86, Cranelift legalizes 64 – bit arithmetic by expanding it into a series of 32 – bit operations. During this process, we also legalize function signatures: passing a value that is larger than can fit in a register may need to be split into multiple parameters, each of which can fit in registers, for example. Signature legalization also assigns locations to formal parameters based on the function’s calling convention: this parameter should be in this register, and that parameter should be at this stack offset, etc.

My plan for implementing arbitrary numbers of return values ​​via struct return pointer parameters was to hook into Cranelift’s legalization phase during signature legalization, legalizingreturninstructions, and legalizingcallinstructions.

When legalizing signatures, we need to determine whether a struct return pointer is required, and if so, update the signature to reflect that.

;; Before legalization. fn () ->i 64, i 64, i 64, i 64 fast  ;; After legalization. fn (v0: i 64 sret [%rdi]) ->i 64 sret [%rax] fast

Here,Fastmeans the signature is using our internal, unstable “fast” calling convention. Thesretis an annotation for a parameter or return value, in this case documenting that it is being used as a struct return pointer. The% rdiand (% rax) are the registers assigned to the parameter and return value by the calling convention.4

After legalization, we’ve added the struct return pointer parameter, but we also removed the old returns, and we also return the struct return pointer parameter as well. Returning the struct return pointer is mandated by the System V ABI’s calling conventions, but we currently do the same thing for our internal, unstable calling convention as well.

After signatures are legalized, we need to legalizecallandreturninstructions as well, so that they match the new, legalized signatures. Let’s turn our attention to the latter first.

Legalizing areturninstruction removes the return values ​​from thereturninstruction itself, and creates a series of precedingstoreinstructions that write the return values ​​through the struct return pointer. Here’s an example that is returning four (i) values:

;; Before legalization. ebb0:     ;; ...     return v0, v1, v2, v3  ;; After legalization. ebb0 (v4: i 64):     ;; ...     store notrap aligned v0, v4     store notrap aligned v1, v4   4     store notrap aligned v2, v4   8     store notrap aligned v3, v4   12     return v4

The newV4value is the struct return pointer parameter. Thenotrapannotation on thestoreinstruction is saying that this store can ‘ t trigger a trap. It is the caller’s responsibility to give us a valid struct return pointer that is pointing to enough space to fit all of our return values. Thealignedannotation is similar, saying that the pointer we are storing through is properly four-byte aligned for an (i) . Again, the responsibility is on the caller to ensure the struct return pointer has at least the maximum alignment required by the return values ​​’types. The 4, 8, and 12are static immediates that specify an offset to be added to the actual (V4) operand to compute the destination address for the store.

Legalizing acallinstruction has comparatively more responsibilities than legalizing areturnInstruction. Yes, it involves adding the struct return pointer argument to thecallinstruction itself, and then loading the values ​​out of the struct return space after the callee returns to us. But it additionally must allocate the space for the struct return in the caller function’s stack frame, and it must ensure that the size and alignment invariants that the callee and itsreturninstructions rely on are upheld.

Let’s take a look at an example of some caller function calling a callee function that returns fouri 32s:

;; Before legalization. function% caller () {     fn0=colocated% callee () ->i 32, i 32, i 32, i 32  ebb0:     v0, v1, v2, v3=call fn0 ()     return }  ;; After legalization. function% caller () {     ss0=sret_slot 16     sig0=(i 64 sret [%rdi]) ->i 64 sret [%rax] fast     fn0=colocated% callee sig0  ebb0:     v4=stack_addr.i 64 ss0     v5=call fn0 (v4)     v6=load.i 32 notrap aligned v5     v0 ->v6     v7=load.i 32 notrap aligned v5   4     v1 ->v7     v8=load.i 32 notrap aligned v5   8     v2 ->v8     v9=load.i 32 notrap aligned v5   12     v3 ->v9     return }

Thess0=sret_slot 16is a sixteen byte stack slot that we created for the struct return space. It is also aligned to sixteen bytes, which is greater than necessary in this case, since we only need four byte alignment for the (i) s. Similar to thestores in the legalizedreturn, theloads in the legalizedcallare also annotated withNotrapandaligned.v0 ->v6establishes that (v0) is another name for (V6) , and we don’t have to eagerly rewrite all the following uses ofv0into uses ofv6(even though there don’t happen to be any in this particular example).

With signature,call, andreturnlegalization that all understand when and how to use struct return pointers, we now have full support for arbitrarily many multi-value returns in Cranelift and Wasmtime. This support was implemented in these pull requests:

Putting It All Together

Finally, let’s put everything together and create a multi-value Wasm binary with the Rust and Wasm toolchain and then run it in Wasmtime!

First, let’s create a new library crate withCargo:

$ cargo new --lib hello-multi-value Created library `hello-multi-value` package $ cd hello-multi-value /

We’re going to usewasm-bindgento return a string from our Wasm function, so lets add it as a dependency . Additionally, we’re going to create a Wasm library, rather than an executable, so specify that this is a “cdylib”:

# Cargo.toml  [lib] crate-type=["cdylib"]  [dependencies] wasm-bindgen="0.2. 54 "

Let’s fill outsrc / lib.rswith our string-returning function:

use wasm_bindgen :: prelude: : *;  # [wasm_bindgen] pub fn hello (name: & str) ->String {     format! ("Hello, {}!", name) }

We can build our Wasm library withCargo Wasi:

$ cargo wasi build - release

This will automatically build a Wasm file for theWASM 32 - wasitarget and then runwasm-bindgen‘s post-processor to add interface types and introduce multi-value. We can verify this with thewasm-objdumptool fromWABT:

$ wasm-objdump -x      target / wasm 32 - wasi / release / hello_multi_value.wasm hello_multi_value.wasm: file format wasm 0x1  Section Details:  Type [14]: ... - type [6] (i 32, i 32) ->(i 32, i 32) ...  Function [151]: ... - func [93] sig=6 ...  Export [5]: - func [93] ->"hello" ...

We can see that thefunction is exported as `` hello '' and that it has the multi-value type `(i 32, i 32) ->(i 32, i 32) `. This shim function is indeed the one introduced by our multi-value transform we added to `wasm-bindgen` to wrap the originalfunction and turn its struct return pointer into multi-value.

Finally, we can load this Wasm library into Wasmtime, which will use Cranelift to just- in-time (JIT) compile it to machine code, and then invoke thehelloexport with the string"multi-value Wasm":

$ wasmtime      target / wasm 32 - wasi / release / hello_multi_value.wasm      --invoke hello "multi-value Wasm" Hello, multi-value Wasm!

It works !!

Conclusion

The Rust and WebAssembly toolchain now supports generating Wasm binaries that make use of the multi-value proposal, and Cranelift and Wasmtime can compile and run multi-value Wasm binaries. This has been – I hope! – an interesting tale of implementing a Wasm feature through the whole vertical ecosystem, start to finish.

Lastly, and definitely not leastly, I’d like to thankDan Gohman, Benjamin Bouvier , Alex Crichton ,Yury Delendik,@ bjorn3, and@ iximeowfor providing reviews and implementation suggestions for different pieces of this journey at various stages. Additionally, thanks again to Alex and Dan, and to (Lin Clark) andTill Schneidereitfor all providing feedback on early drafts of this piece.


Additionally,Thomas Livelyand some other folks are already working on adding multi-value Wasm support directly to LLVM, so that is definitely coming in the future, and it made sense for me to focus my attention elsewhere.

(1) There are some “stack-y” forms that don’t quite directly map to a tree. For example, you can insert stack-neutral, side-effectual instruction sequences in the middle of any part of the post-order encoding of an expression tree. Here is acallthat produces some value, followed by a (drop) ********************************************* that value, inserted into the middle of the post-order encoding of1 2:

;; Instructions ;; Stack ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; i 32 .const 1 ;; [1] call $ foo ;; [1, $result_of_foo] drop ;; [1] i 32 .cor 2 ;; [1, 2] i 32 .add ;; [3]

These stack- y forms can be represented by introducing blocks that don’t also introduce labels for control flow branches. You can think of them as sort of similar to Common Lisp’sprognandprog0forms or Scheme’s(begin ...)(↩)

Fun fact: there is also ongoing work to make Cranelift a viable alternative backend forRUSTC! See thegoals write upand thebjorn3 / rustc_codegen_craneliftrepo for details. (↩)

Originally, Cranelift was designed to use extended basic blocks, rather than regular basic blocks. Both can only be entered at their head, but basic blocks additionally can only exit at their tail, while extended basic blocks can have conditional exits from the block in their middle. The idea is that extended basic blocks more directly match machine code which falls through untaken conditional branches to continue executing the next instruction. However, Cranelift is in the process of switching over to regular basic blocks, and removing support for extended basic blocks. The reasoning is that all its optimization passes end up essentially constructing and keeping track of basic blocks anyways, which added complexity, and the extended basic blocks weren’t ultimately carrying their weight.

(4) Semi-confusingly, the square brackets are just the syntax that Cranelift decided to use to surround parameter locations, and they donotrepresent dereferencing the way they would in Intel-flavored assembly syntax.

    

I like computing, bicycles, hiphop, books, and pen plotters. My pronouns are he / him.

                    

More articles by Nick Fitzgerald…

                  

Brave Browser
(Read More)Payeer

What do you think?

Leave a Reply

Your email address will not be published. Required fields are marked *

GIPHY App Key not set. Please check settings

Half-Life: Alyx, Hacker News

Half-Life: Alyx, Hacker News

Government formation in last lap, NCP-Congress to talk power-sharing with Shiv Sena today – Times of India, The Times of India

Government formation in last lap, NCP-Congress to talk power-sharing with Shiv Sena today – Times of India, The Times of India