in ,

Build your own WebAssembly Compiler, Hacker News

Build your own WebAssembly Compiler, Hacker News

[loop condition]         [nested statements] Have you ever wanted to write your own compiler? … Yes? … of course you have! I’ve always wanted to have a go at writing a compiler, and with the recent release of WebAssembly, I had the perfect excuse to have a go.

My original plan was to

invent my own programming language, create a compiler that targets WebAssembly, and share my experiences at (FullStackNYC) . The first part went to plan, I spent many-an-evening building, tinkering and refining my compiler. Unfortunately the last part of my plan didn’t go quite so well. Long delays, and an eventual cancellation [ { “type”: “printStatement”, “expression”: { “type”: “numberLiteral”, “value”: 23.1 } }] , meant that I was not going to make it to New York after all. 😔😢😭

So, rather than waste all that work – I thought I’d write up my talk as a blog post instead – hence the ‘ min ‘reading time for this article. So sit back, make yourself comfortable, and we’ll begin…

What is WebAssembly? (and why does it exist?)

If you haven’t heard of WebAssembly before, and want a really detailed introduction, I’d thoroughly recommend Lin Clark’s Cartoon Guide [{ type: “printStatement”, expression: { type: “binaryExpression”, left: { type: “binaryExpression”, left: { type: “numberLiteral”, value: 42 }, right: { type: “numberLiteral”, value: 10 }, operator: ” “ }, right: { type: “numberLiteral”, value: 2 }, operator: “/” }}] .

You’ll learn the ‘what’ of WebAssembly throughout this blog post, but I do want to briefly touch on the ‘why’ .

From my perspective, this diagram sums it up quite succinctly:

The top diagram shows a simplified timeline for the execution of some JavaScript code within the browser. From left-to-right, the code (typically delivered as a minified mess!) Is parsed into an AST, initially executed in an interpreter, then progressively optimized / re-optimized until it eventually runs really quite quickly. These days JavaScript is fast – it just takes a while to get up to speed.

The bottom diagram is the WebAssembly equivalent. Code written in a wide variety of languages ​​(Rust, C, C #, etc …) is compiled to WebAssembly that is delivered in a binary format. This is very easily decoded, compiled and executed – giving fast and predictable performance.

So why write your own compiler? WebAssembly has been causing quite a stir over the last year. So much so, that it was voted the fifth ‘most loved’ language in Stack Overflow’s developer insights survey .

An interesting result, considering that for most people WebAssembly is a compilation target, rather than a language they will use directly.

This was part of my motivation for proposing the FullStackNYC talk in the first place. The technical aspects of WebAssembly are really fascinating (and remind me of 8-bit computers from a few decades back), yet most people will never get the chance to dabble with WebAssembly itself – it will just be a black box that they compile to.

Writing a compiler is a really good opportunity to delve into the details of WebAssembly to find it what it is and how it works. And it’s fun too!

One final point, it was never my aim to create a fully-featured programming language, or one that is actually any good. My goal was to create ‘enough’ of a language to allow me to write a program that renders a mandelbrot set. This language is compiled to WebAssembly using my compiler, which is written in TypeScript and runs in the browser.

Here it is in it’s full glory:

[loop condition]

I ended up calling the language [loop condition] chasm and you can

play with it online if you like .

Enough rambling – time for some code! A minimal wasm module

Before tackling the compiler, we’ll start with something simpler, creating a minimal WebAssembly module.

Here is an emitter (the term used for the part of a compiler that outputs instructions for the target system), that creates the smallest valid WebAssembly module:

[loop condition] [loop condition] (const) magicModuleHeader =[0x00, 0x61, 0x73, 0x6d]; const (moduleVersion) (=[0x01, 0x00, 0x00, 0x00]; export (const) (emitter) : Emitter =

()

=>     Uint8Array  . (from)  ([    ...magicModuleHeader,    ...moduleVersion  ]);   
It is comprised of two parts, the 'magic' header, which is the ASCII string

0asm , and a version number. These eight bytes form valid WebAssembly (or wasm) module. More typically these would be delivered to the browser as a . wasm file.

In order to execute the WebAssembly module it needs to be instantiated as follows: [loop condition] [loop condition] (const) wasm =(emitter) ();

const (instance) (=(await) WebAssembly . (instantiate) ( (wasm) )

If you run the above you'll find that (instance) does actually do anything because our wasm module does not contain any instructions!

If you're interested in trying out this code for yourself, it is all on GitHub - with a commit for each step

An add function

Let's make the wasm module do something more useful, by implementing a function that adds a couple of floating point numbers together.

WebAssembly is a binary format, which isn't terribly readable (to humans at least), which is why you'll more typically see it written in WebAssembly Text Format (WAT). Here's a module, presented in WAT format, that defines an exported function named ($ add) that takes two floating point parameters, adds them together and returns them:

(module  (func $ add (param f (param f [loop condition] ) (result f 069    get_local 0    get_local 1    f . add)  (export "add" (func 0)) )

If you just want to experiment with WAT you can use the (wat2wasm) (tool from the WebAssembly Binary Toolkit to compile WAT files into wasm modules.

The above code reveals some interesting details around WebAssembly -

WebAssembly is a low-level language, with a small (approx 82 instruction set, where many of the instructions map quite closely to CPU instructions. This makes it easy to compile wasm modules to CPU-specific machine code.