Recently, I read a blog post in which Ben Karel summarized the reaction to a request John Regehr made about how to teach compilers , and as one might predict, the Internet was in agreement that the answer was “absolutely everything”. Basically, everyone has a different perspective on what the most important thing is, and so the union of everyone’s most important thing is everything.
In fact, from a certain point of view, I don’t disagree. From parsing to typechecking to intermediate representations to dataflow analysis to register allocation to instruction selection to data representations to stack layouts to garbage collectors to ABIs to debuggers, basically everything in compilers is packed to the gills with absolutely gorgeous algorithms and proofs. The sheer density of wonderful stuff in compilers is just out of this world.
However, John specified an introductory course. Alas, this makes “everything” the wrong answer – he is asking for a pedagodyfirst answer which fits coherently into a smallish number of lectures. This means that we have to start with the question of what we want the students to learn, and then build the curriculum around that.
So, what should students have learned after taking a compilers class?
The obvious (but wrong) answer is “how to write a compiler”, because in all likelihood they will forget most of the techniques for implementing compilers soon after the course ends. This is because most of them – unless they catch the compiler bug – will not write very many more compilers. So if we want them to come away from a compilers class with some lasting knowledge, we have to think in terms of deeper programming techniques which (a) arise naturally when writing compilers, but (b) apply in more contexts than just compilers.
There are two obvious candidates for such techniques:
 Computing values least fixed points of monotone functions by bottomup iteration.

In particular, the compiler structure should be very opinionated.
The thing we want to do is to present a variety of options (LR versus LL parsing, graph coloring versus linear scan register allocation, CPS versus SSA for IRs, etc), and then present their strengths and weaknesses, and the engineering and design considerations which will lead you to favor one choice over the other.
But for this course, we won’t do any of that. As an instructor, we will simply choose the algorithm, and in particular choose the one that most benefits from fixed point computations.
The existence of alternatives should be gestured at in lecture, but the studentfacing explanation for why we are not exploring them is for their first compiler we are aiming for good choices but biased much more towards implementation simplicity than gcc or LLVM will. (This will be an easier pitch if there is an advanced compilers course, or if students have a final year project where they can write a fancy compiler.)

We will also need to bite the bullet and deemphasize the aspects of compilation where fixed point computations are less relevant.
Therefore, we will not cover runtime systems and data structure layouts in any great detail. This substantially affects the design of the language to compile – in particular, we should not choose a language that has closures or objects. Furthemore, we will just tell students what stack layout to use, and memory management wil be garbage collection via the Boehm gc.
 Integers int
 Unit type
unit
 Tuples T_1 … T_n

Sum types via datatype declarations:
datatype list a=Nil  Cons of a list a
 Computing values by recursion over inductivelydefined structures.
Both of these are fundamental algorithmic techniques. However, I think that recursion over inductive structure is much more likely to be taught outside of a compilers course, especially considering that modern functional programming (ie, datatypes and pattern matching) is now much more often taught elsewhere in the curriculum than it used to be.
This leaves fixedpoint computations.
So let’s choose the goal of an intro compilers class to be teaching fixed point computations many times over, in many different contexts, with the idea that the general technique of fixed point computation can be learned via generalizing from many specific examples. The fact that they are building a compiler is significant primarily because it means that the different examples can all be seen in a motivating context.
So, this drives some choices for the design of the course.
Our goal is to show how fixed point computations arise in many different contexts, and the stages of the compilation pipeline naturally supply a noncontrived collection of very differentlooking contexts. Consequently, we should design the course in a full fronttoback style, covering all the phases of the compilation pipeline.
However, this style of course has the real weakness that it can result in shallow coverage of each topic. Mitigating this weakness drives the design of the rest of the course.
For the rest of this note, I’ll call the language to compile Introl, because I always liked the Algol naming convention. Introl is a firstorder functional language, basically what you get if you removed nested functions, lambda expressions, references, and exceptions from OCaml. I’ve attached a sketch of this language at the end of the post.
This language has two features which are “hard” – polymorphism and datatypes. The reason for the inclusion of polymorphism is that it makes formulating type inference as a fixed point problem interesting, and the reason datatypes exist is because (a) match statements offer interesting control flow, and (b) they really show off what sparse conditional constrant propagation can do.
So the course will then follow the topdown lexing to code generation approach that so many bemoan, but which (in the context of our goals) is actually totally appropriate.
Lexing and Parsing
For Forexing, I would start with the usual regular expressions and NFAs thing, but then take a bit of a left turn. First, I would show them that state set sizes could explode, and then introduce them to BrüggemanKlein and Derick Wood’s deterministic regular expressions as a way of preventing this explosion.
The reason for this is that the conditions essentially check whether a regular expression is parseable by recursive descent without backtracking – i.e., you calculate NULLABLE, FIRST and (a variant of) FOLLOW sets for the regular expression. This lets you explain what these sets mean in a context without recursion or fixed points, which makes it easy to transition to LL (1) grammars, which are fundamentally just deterministic regular languages plus recursion.
So then the calculation of these sets as a fixed point equation is very easy, and using the deterministic regular languages means that the explanation of what these sets mean can be decoupled from how to compute via a fixed point computation.
Naturally, this means that the grammar of Introl must be LL (1).
Typechecking
Typechecking for this kind of language is pretty routine, but in this case it should be phrased as an abstract interpretation, in the style of Cousot’s Types as Abstract Interpretation .
The interesting thing here is that polymorphism can be presented via what Cousot called “the Herbrand abstraction”. The idea is that the abstract elements are monotypes with unification variables in them, with the partial order that
t₂ ≥ t₁ (if there is a substitution σ such that t₂=σ (t₁) , and the unification algorithm itself as an attempt to calculate the substitution witnessing the join of two terms. I say attempt, since unification can fail. So this is a kind of partial join operation  in the case you cannot join the two terms, there must have been a type error!
In Introl as presented, toplevel functions have a type annotation, and so it will work out that you end up not needing to do a serious fixed point computation to infer types. Indeed, even if you omitted annotations, the fact that unification is calculating a most general unifier means that the fixed point iteration for recursive functions terminates in one iteration. (This should not be too surprising since the DamaMilner algorithm only needs one traversal of the syntax.)
This fact is worth working out, because a great deal of life in static analysis involves trying to find excuses to iterate less. Indeed, this is what motivates the move to SSA  which will be the very next phase of the compiler.
In Introl as presented, toplevel functions have a type annotation, and so it will work out that you end up not needing to do a serious fixed point computation to infer types. Indeed, even if you omitted annotations, the fact that unification is calculating a most general unifier means that the fixed point iteration for recursive functions terminates in one iteration. (This should not be too surprising since the DamaMilner algorithm only needs one traversal of the syntax.)
This fact is worth working out, because a great deal of life in static analysis involves trying to find excuses to iterate less. Indeed, this is what motivates the move to SSA  which will be the very next phase of the compiler.
The choice of IR is always a fraught one in a compiler class. In this course, I would use a static single assignment (SSA) representation. SSA is a good choice because (a) it simplifies implementing all kinds of dataflow optimizations, (b) generating it also needs dataflow analyzes, and (c) it is the IR of choice for serious compilers. (a) and (b) mean we get to do lots of fixed point computations, and (c) means it won’t feel artificial to today’s yoof.
However, I wouldn’t go directly to SSA, since I find φfunctions very difficult to explain directly. IMO, it is worth pulling a trick out of Richard Kelsey’s hat , and exploiting the correspondence between continuationpassing style (CPS), and static single assignment form (SSA).
CPS often comes with all sorts of mysticism and higherorder functions, but in this case, it works out very simply. Basically, we can letnormalize the program, and then transforms the surface language into basic blocks with arguments (or if you prefer, a bunch of functions tail calling each other). This is the version of SSA that the MLton compiler used, and which (if memory serves) the Swift SIL interemediate representation uses as well.
Roughly speaking, each basic block in the program will end up have zero or more formal arguments, and jumps to that block have arguments which fill in the arguments. This ends up making it super easy to explain, because the program just goes into letnormal form, and all tail calls get compiled to jumpswitharguments, and nontail calls use a call IR instruction.
Now, if we do this maximally naively – which we absolutely want to do – we will get nearly pessimal SSA, since each basic block will take as arguments all the variables in its scope. The reason we do this is to make the need for SSA minimization obvious: we just want to shrink each block’s parameter lists so it doesn’t take parameters for variables which either don’t vary or don’t get used.
Doing this will take us to something close to “pruned SSA” form, which would normally be overkill for a first compiler, except that we want to use it to motivate the computation of the dominator tree. Because of the eitheror nature of the shrinking, we can do this with two analyzes, a liveness analysis and a constancy analysis.
Both of these are dataflow analyzes, and we can show how to use the dominator tree to order the work to calculate the calculate the fixed point faster. This will justify doing the fixed point computation to calculate the dominance information we need.
There is a bit of redundancy here, which is deliberate – I think it’s important to have done two analyzes before calculating dominators, because doing one fixed point computation to speed up one fixed point computation may feel artificial. But doing one computation that speeds up two computations is easy to see the benefit of, especially if we phrase the fixed point computation as taking the transfer function plus the dominance information.
I would avoid using one of the fast dominance algorithms, because pedagogically it’s easier to explain the calculation when it is close to the definition.
The other nice thing here is that typechecking got accelerated by a good choice of abstract domain, and flow analysis gets accelerated by a good choice of iteration order.
Once the students have some experience manipulating this representation, I would probably switch to the traditional SSA form. The main benefit of this is just teaching the ability to read the existing literature more.
One thing worth spelling out is that this language (like MLton’s SSA) has a highlevel SSA representation – switches, data constructors, and field selectors and things like that will all be part of the SSA IR. This makes it possible to do optimizations while thinking at the language level, rather than the machine level. And furthermore, since students have a decent SSA representation, we certainly should use it to do all the “easy” SSA optimizations, like copy propagation and loopinvariant code motion.
All of these are unconditional rewrites when in SSA form, so they are all easy to implement, and will get the students comfortable with manipulating the SSA representation. The next step is to implement dead code elimination, which is both a nice optimization, and also requires them to rerun liveness analysis. This will open the door to understanding that compilation passes may have to be done repeatedly.
Once the students have done that, they will be warmed up for the big “hero optimization” of the course, which will be sparse conditional constant propagation . Because we are working with a highlevel IR, sparse conditional constant propagation ought to yield even more impressive results than usual, with lots of tuple creation / pattern matching and cases on constructors disappearing in a really satisfying way.
In particular, good examples ought to arise from erasing the
None branches from code using the option monad for safe division safediv: int → int → option int , if there is a test that the divisors are nonzero dominating the divisions. Lowering and Register Allocation
As I mentioned before, the big optimization was performed on a highlevel SSA form: switch statements and data constructors and selectors are still present. Before we can generate machine code, they will have to be turned into lowerlevel operations. So we can define a “lowlevel” SSA, where the selections and so on are turned into memory operations, and then translate the highlevel IR into the lowlevel IR.
This ought to be done pretty naively, with each highlevel operation turning into its lowlevel instruction sequence in a pretty direct way. Since the course design was to do as many dataflow optimizations as we could, to bring out the generic structure, a good test the development is smooth enough is whether the flow analyzes are sufficiently parameterized that we can change the IR and lattices and still get decent code reuse. Then we can lean on these analyzes to clean up the lowlevel instruction sequences.
If more abstract code is too hard to understand, I would skip most of the lowlevel optimizations. The only mustdo analysis at the lowlevel representation is to rereun liveness analysis, so that we can do register allocation using the SSAform register allocation algorithms of Hack . This (a) lets us avoid graph coloring, (b) while still getting good results, and (c) depends very clearly on the results of a dataflow analysis. It also makes register coalescing (normally an advanced topic) surprisingly easy.
Code generation is then pretty easy – once we’ve done register allocation, we can map the lowlevel IR operations to loads, stores, and jumps in the dirtobvious way; calls and returns doing the obvious stack manipulations; and foreign calls to the Boehm gc to allocate objects. This is not optimal, but most of the techniques I know of in this space don’t use fixed points much, and so are offpiste relative to the course aim.
The surprising (to me, anyway) thing is that the choice of focusing on fixed point computations plus the choice to go SSAonly, lays out a path to a surprisingly credible compiler. Before writing this note I would have thought this was definitely too ambitious a compiler for a first compiler! But now, I think it only may be too ambitious.
Obviously, we benefit a lot from working with a firstorder functional langage: in many ways, is just an alternate notation for SSA. However, it looks different from SSA, and it looks like a highlevel language, which means the students ought to feel like they have accomplished something.
But before I would be willing to try teaching it this way, I would want to program up a compiler or two in this style. This would either confirm that the compiler is tractable, or would show that the compiler is just too big to be teachable. If it is too big, I’d probably change the surface language to only support booleans and integers as types. Since these values all fit inside a word, we could drop the lowering pass altogether.
Another thing I like is that we are able to do a plain old dataflow analysis to parse, and then a dataflow analysis with a clever representation of facts when typechecking, and then dataflow analyzes with a clever iteration scheme when doing optimization / codegen.
However, if this ends up being too cute and confusionprone, another alternative would be to drop type inference by moving to a simply typed language, and then drop SSA in favor of the traditional Kildall / Allen dataflow approach. You could still teach the use of acceleration structures by computing defuse chains and using them for (e.g.) liveness. This would bring out the parallels between parsing and flow analysis.
Relative to an imaginary advanced compilers course, we’re missing highlevel optimizations like inlining, partial redundancy elimination, and fancy loop optimizations, as well as lowlevel things like instruction scheduling. We’re also missing an angle for tackling controlflow analysis for higherorder language featuring objects or closures.
But since we were able to go SSA straight away, the students will be in a good position to learn this on their own.
Type structure
The type language has
Program structure
Top level definitions
All function declarations are top level. Nested functions and anonymous lambdaexpressions are not allowed. Function declarations have a type scheme, which may be polymorphic:
val foo: forall a_1, …, a_n. (T_1, …, T_n) > T def foo (x_1, …, x_n)=e Expressions
Expressions are basically the usual things:
 variables (x
 constants
3
 arithmetic
e_1 e_2
 unit
()
 tuples (e_1, …, e_n)
 data constructors
Foo (e)
 variable bindings
let p=e_1; e_2
 direct function calls
f (e_1, …, e_n) , where f
is a toplevel function name
 match statements match e {p_1 > e_1  … p_n > e_n}
 variables (x
 constructor patterns
Foo (x_1, …, x_n)
 tuple patterns (x_1, …, x_n)
Note the absence of control structures like (while or
for ; we will use tailrecursion for that. Note also that there are no explicit type arguments in calls to polymorphic functions. Patterns
Patterns exist, but nested patterns do not, in order to avoid having to deal with pattern compilation. They are:
(Example)
Here’s a function that reverses a list:
val reverse_acc: forall a. (list a, list a) > list a def reverse_acc (xs, acc)= match xs {  Nil > acc  Cons (x, xs) > reverse_acc (xs, Cons (x, acc)) } val reverse: forall a. list a > list a def reverse (xs)=reverse_acc (xs, Nil)
(Read More)Here's a function to zip two lists together:
val zip: forall a, b. (list a, list b) > list (a b) def zip (as, bs)= match as {  Nil > Nil  Cons (a, as') > match bs {  Nil > Nil  Cons (b, bs') > Cons ((a, b), zip (as', bs')) } }
GIPHY App Key not set. Please check settings