Lecture 5: Binary Operations and Basic Blocks
Today we will extend the compiler to support binary arithmetic operations. This is a surprisingly significant change as it introduces the ambiguity of evaluation order into the language, and so we will introduce a new pass to the compiler that makes the evaluation order explicit in the structure of the term.
1 Growing the language: adding infix operators
Again, we follow our standard recipe:
Its impact on the concrete syntax of the language
Examples using the new enhancements, so we build intuition of them
Its impact on the abstract syntax and semantics of the language
Any new or changed transformations needed to process the new forms
Executable tests to confirm the enhancement works as intended
1.1 Concrete and Abstract Syntax, Examples
We add new forms to our grammar: three binary arithmetic operations, as well as parentheses so that we can disambiguate arithmetic notation.
‹prog› ::= def main ( IDENTIFIER ) : ‹expr›

‹expr› ::= NUMBER
         | ADD1 ( ‹expr› )
         | SUB1 ( ‹expr› )
         | IDENTIFIER
         | LET IDENTIFIER EQ ‹expr› IN ‹expr›
         | ‹expr› + ‹expr›
         | ‹expr› - ‹expr›
         | ‹expr› * ‹expr›
         | ( ‹expr› )
Here the abstract syntax breaks slightly from the concrete syntax in that we don’t have an abstract syntax form for parentheses, since they only serve a syntactic and not a semantic purpose. We add these new operations as primitives, adjusting the primitive constructor to take in a vector of arguments, so that it encapsulates both unary and binary primitives.
enum Prim {
    Add1,
    Sub1,
    Add,
    Sub,
    Mul,
}

enum Expression {
    ...
    Prim { prim: Prim, args: Vec<Expression> },
}
These new expression forms should be familiar from standard arithmetic notation. The parser will take care of operator precedence. I.e., the expressions

(2 - 3) + 4 * 5
(2 - 3) + (4 * 5)

both parse to the same abstract syntax tree:
Prim(Add,
[Prim(Sub, [Number(2), Number(3)]),
Prim(Mul, [Number(4), Number(5)])])
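Using the adjusted Prim constructor, the same tree can be written out directly in Rust. A sketch (assuming the Number variant from the earlier lectures carries an i64):

fn example_tree() -> ast::Expression {
    // (2 - 3) + (4 * 5): the binary primitives carry their operands in the args vector.
    ast::Expression::Prim {
        prim: ast::Prim::Add,
        args: vec![
            ast::Expression::Prim {
                prim: ast::Prim::Sub,
                args: vec![ast::Expression::Number(2), ast::Expression::Number(3)],
            },
            ast::Expression::Prim {
                prim: ast::Prim::Mul,
                args: vec![ast::Expression::Number(4), ast::Expression::Number(5)],
            },
        ],
    }
}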
1.2 Semantics
At first it seems utterly straightforward to extend our interpreter to account for these new forms:
fn interpret(p: &ast::Program, x: i64) -> i64 {
    fn interp_exp(e: &ast::Expression, mut env: HashMap<String, i64>) -> i64 {
        match e {
            ast::Expression::Prim { prim, args } => match prim {
                ast::Prim::Add => {
                    let res1 = interp_exp(&args[0], env.clone());
                    let res2 = interp_exp(&args[1], env);
                    res1 + res2
                }
                ...
            },
            ...
        }
    }
    let env: HashMap<String, i64> = HashMap::unit(p.parameter.clone(), x);
    interp_exp(&p.body, env)
}
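Following our recipe, we also want an executable test for the new forms. A sketch, where parse stands in for whatever parsing entry point the compiler exposes (the name is illustrative):

#[test]
fn adds_subtracts_and_multiplies() {
    // (2 - 3) + 4 * x with x = 5 should give -1 + 20 = 19.
    let prog = parse("def main(x):\n  (2 - 3) + 4 * x");
    assert_eq!(interpret(&prog, 5), 19);
}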
But notice that there is a somewhat arbitrary choice here. Should the clause for interpreting add be
ast::Prim::Add => {
    let res1 = interp_exp(&args[0], env.clone());
    let res2 = interp_exp(&args[1], env);
    res1 + res2
}

or

ast::Prim::Add => {
    let res2 = interp_exp(&args[1], env.clone());
    let res1 = interp_exp(&args[0], env);
    res1 + res2
}
Do we evaluate the expression from left-to-right or right-to-left? It turns out that this decision doesn’t affect the interpreter for our current language, but it will matter with future extensions.
Do Now!
What programming language feature could you add to our language that would make the difference between left-to-right and right-to-left evaluation matter?
There are many different possible answers:
Mutable variables
Writing to stdout or files
Reading from stdin
For instance, consider if we added a primitive print(e)
that
would print out the value of e
and produce the same value. So
print(5)
would print 5
to stdout.
Then how should the following program evaluate?
print(6) * print(7)
Obviously, the value it should produce is 42
, but what should
it print?
Prints "67", this is left-to-right evaluation order
Prints "76", this is right-to-left evaluation order
Print either "67" or "76", meaning the evaluation order is unspecified, or implementation dependent
Which do you prefer? Either of the first two seems very reasonable, with left-to-right seeming more natural since it matches the way we read English. The third option is something probably only a compiler writer would choose, because it makes the program easier to optimize: you can arbitrarily re-order things!
We’ll go with the first choice: left-to-right evaluation order.
Note that doing things left-to-right like this is not quite the same as the PEMDAS rules. For instance the following arithmetic expression evaluates:
(2 - 3) + 4 * 5
==> -1 + (4 * 5)
==> -1 + 20
==> 19
rather than the possible alternative of doing the multiplication first. Following PEMDAS to determine the evaluation order would be very confusing once effects are involved:

(print(2) - 3) + print(4) * 5

Evaluating the multiplication first would print 4 before 2, even though print(2) appears first in the program text.
1.3 Enhancing the transformations: a new intermediate representation (IR)
Exercise
What goes wrong with our current naive transformations? How can we fix them?
We don’t need much in the way of new x86 features to compile our language. We’re already familiar with add and sub, and so we only need to know that the signed multiplication operation is called imul.
Let’s try manually “compiling” some simple binary-operator expressions to assembly:
Original expression | Compiled assembly
[worked examples omitted: a few operations on constants, ending with (2 - 3) + (4 * 5)]
Do Now!
Convince yourself that using a let-bound variable in place of any of these constants will work just as well.
So far, our compiler has only ever had to deal with a single active expression at a time: it moves the result into rax, increments or decrements it, and then potentially moves it somewhere onto the stack, for retrieval and later use. But with our new compound expression forms, that won’t suffice: the execution of (2 - 3) + (4 * 5) above clearly must stash the result of (2 - 3) somewhere, to make room in rax for the subsequent multiplication. We might try to use another register (rcx, maybe?), but clearly this approach won’t scale up, since there are only a handful of registers available. What to do?
1.3.1 Immediate expressions
Do Now!
Why did the first few expressions compile successfully?
Notice that for the first few expressions, all the arguments to the operators were immediately ready:
They required no further computation to be ready.
They were either constants, or variables that could be read off the stack.
Perhaps we can salvage the final program by transforming it somehow, such that all its operations are on immediate values, too.
Do Now!
Try to do this: Find a program that computes the same answer, in the same order of operations, but where every operator is applied only to immediate values.
Note that conceptually, our last program is equivalent to the following:
let first = 2 - 3 in
let second = 4 * 5 in
first + second
This program has decomposed the compound addition expression into the sum of two let-bound variables, each of which is a single operation on immediate values. We can easily compile each individual operation, and we already know how to save results to the stack and restore them for later use, which means we can compile this transformed program to assembly successfully.
Come to think of it, compiling operations when they are applied to immediate values is so easy, wouldn’t it be nice if we did the same thing for unary primitives and if? This way every intermediate result gets a name, which will then be assigned a place on the stack (or better yet, a register) instead of every intermediate result necessarily going through rax.
2 Basic Blocks
We introduce a new compiler pass: translating the code into an intermediate representation (IR). The intermediate representation we will use is called Static Single Assignment (SSA), and is the industry standard, used for example by the LLVM compiler framework.
We’ll only use a fragment of the full SSA IR for now: we will compile our source programs to a single basic block. The version of basic blocks we use now is a sequence of simple "operations" applied to immediate values, ending in a return statement.[1] This has the benefit of being quite straightforward to compile to assembly code: if we have a mapping from variables to memory locations, then each operation can be directly compiled to a short sequence of instructions. This is one of the benefits of our IR: since the IR is "closer" to assembly, we can more easily understand what code will be generated for it, and in particular how to make that code efficient.
Interestingly, even though SSA IR is used for compilation of imperative code, like our source language, the variable bindings in SSA IR are immutable, meaning that a variable cannot be updated once it is defined. This is the origin of the name "Static Single Assignment": every variable is only assigned to at one static program position.[2]
// Variable names in this IR are plain strings.
pub type VarName = String;

pub struct Program {
    pub param: VarName,
    pub entry: BlockBody,
}

pub enum BlockBody {
    Return(Immediate),
    Operation { dest: VarName, op: Operation, next: Box<BlockBody> },
}

pub enum Operation {
    Immediate(Immediate),
    Prim(Prim, Immediate, Immediate),
}

pub enum Prim {
    Add,
    Sub,
    Mul,
}

pub enum Immediate {
    Const(i64),
    Var(VarName),
}
An SSA program consists of a single basic block body, with a parameter name for the argument to the main function. A BlockBody is a sequence of Operations that assign the output of an Operation to a variable, ending with a Return of a specified immediate value. An operation is either one of the primitive arithmetic operations, or an immediate value.
Our SSA IR is a different kind of "programming language" than our source, in that we don’t really ever use a concrete syntax for it, instead only working with the abstract syntax trees. Programmers don’t write SSA programs themselves; the compiler generates and analyzes them. But for convenience of discussion, we will sometimes use a textual format, rendering a Program with 3 operations ending in a return as:
entry(x):
y = add 2 x
z = sub 18 3
w = mul y z
ret w
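For concreteness, that rendering corresponds to the following value of the IR types above (a sketch, assuming VarName is String as in the definitions):

fn example_block() -> ssa::Program {
    use ssa::{BlockBody, Immediate, Operation, Prim, Program};
    Program {
        param: "x".to_string(),
        entry: BlockBody::Operation {
            dest: "y".to_string(),
            op: Operation::Prim(Prim::Add, Immediate::Const(2), Immediate::Var("x".to_string())),
            next: Box::new(BlockBody::Operation {
                dest: "z".to_string(),
                op: Operation::Prim(Prim::Sub, Immediate::Const(18), Immediate::Const(3)),
                next: Box::new(BlockBody::Operation {
                    dest: "w".to_string(),
                    op: Operation::Prim(Prim::Mul, Immediate::Var("y".to_string()), Immediate::Var("z".to_string())),
                    next: Box::new(BlockBody::Return(Immediate::Var("w".to_string()))),
                }),
            }),
        },
    }
}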
2.1 Translating Basic Blocks to Assembly
We can think of our Basic Blocks as a simplified version of our source
language and so we can adapt our method of compiling let
bindings to compile basic blocks. We map each SSA variable to a memory
offset from rsp
and then we compile each operation to a sequence
of instructions that places the result of the operation in the
location of the given variable. For instance if we store y, z, w
to the offsets [rsp - 16]
, [rsp - 24]
and [rsp - 32]
then we can compile the multiply operation w = mul y z
to
mov rax, [rsp - 16]
mov r10, [rsp - 24]
imul rax, r10
mov [rsp - 32], rax
Here we use rax and r10 as "scratch registers" since x86 cannot operate on multiple addressed memory locations in the same instruction.
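As a sketch of how this might look in the compiler (not necessarily the exact backend we will build), here is one way to emit the instructions for a single dest = op step as strings, given a precomputed map from variable names to their stack offsets:

use std::collections::HashMap;

fn op_to_asm(dest: &str, op: &ssa::Operation, slots: &HashMap<String, i64>) -> Vec<String> {
    // Load an immediate into a scratch register: constants directly, variables from their slot.
    let load = |reg: &str, imm: &ssa::Immediate| match imm {
        ssa::Immediate::Const(n) => format!("mov {}, {}", reg, n),
        ssa::Immediate::Var(x) => format!("mov {}, [rsp - {}]", reg, slots[x]),
    };
    let mut code = match op {
        // A bare immediate: just load it into rax.
        ssa::Operation::Immediate(imm) => vec![load("rax", imm)],
        // A primitive: load both arguments into the scratch registers, then operate.
        ssa::Operation::Prim(prim, imm1, imm2) => {
            let instr = match prim {
                ssa::Prim::Add => "add",
                ssa::Prim::Sub => "sub",
                ssa::Prim::Mul => "imul",
            };
            vec![load("rax", imm1), load("r10", imm2), format!("{} rax, r10", instr)]
        }
    };
    // Finally, store rax into the destination's slot.
    code.push(format!("mov [rsp - {}], rax", slots[dest]));
    code
}

With y, z and w at offsets 16, 24 and 32, calling this for w = mul y z produces exactly the four instructions shown above.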
For uniformity, we can compile the entry point to move the input parameter from rdi into an offset from rsp. And lastly, we compile a ret in SSA to move the value into rax and then execute the ret instruction. This strategy would compile our example SSA program
entry(x):
y = add 2 x
z = sub 18 3
w = mul y z
ret w

to the following assembly:
;; entry(x):
mov [rsp - 8], rdi
;; y = add 2 x
mov rax, 2
mov r10, [rsp - 8]
add rax, r10
mov [rsp - 16], rax
;; z = sub 18 3
mov rax, 18
mov r10, 3
sub rax, r10
mov [rsp - 24], rax
;; w = mul y z
mov rax, [rsp - 16]
mov r10, [rsp - 24]
imul rax, r10
mov [rsp - 32], rax
;; ret w
mov rax, [rsp - 32]
ret
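The variable-to-offset mapping itself can be computed with a single walk over the block. A minimal sketch, giving the parameter [rsp - 8] and each destination variable the next 8-byte slot in order of appearance:

use std::collections::HashMap;

fn assign_slots(p: &ssa::Program) -> HashMap<String, i64> {
    let mut slots = HashMap::new();
    slots.insert(p.param.clone(), 8); // the parameter lives at [rsp - 8]
    let mut offset = 16;
    let mut body = &p.entry;
    while let ssa::BlockBody::Operation { dest, next, .. } = body {
        slots.insert(dest.clone(), offset);
        offset += 8;
        body = &**next; // keep walking down the block
    }
    slots
}

On the example block this assigns x, y, z and w the offsets 8, 16, 24 and 32 used above.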
2.2 Translating to a Basic Block
Now how do we go about compiling our source language to an SSA Basic Block? An SSA Basic Block is very similar to our source programming language except that
We’ve removed the redundant add1 and sub1 primitives, translating them to use add and sub with a constant
Programs end with an explicit ret of an immediate, whereas Adder was an expression language
Primitive operations can only be applied to immediates, whereas in Adder they can be arbitrarily complex sub-expressions
A variable binding stores the result of a simple Operation, whereas in Adder, the let binding stores the result of an arbitrarily complicated sub-expression.
Our goal then is to implement a function

fn lower(prog: ast::Program) -> ssa::Program

As a first attempt, we might try to lower each expression with a helper that produces a block body directly:

fn lower_exp(exp: ast::Expression) -> ssa::BlockBody
fn lower_exp(e: &ast::Expression) -> ssa::BlockBody {
    match e {
        ast::Expression::Variable(x) => ssa::BlockBody::Return(ssa::Immediate::Var(x.clone())),
        ast::Expression::Number(n) => ssa::BlockBody::Return(ssa::Immediate::Const(*n)),
        ast::Expression::Prim { prim, args } => match prim {
            ast::Prim::Add => {
                let arg1 = &args[0];
                let arg2 = &args[1];
                ??
            }
            ...
        },
        ...
    }
}
The Variable and Number cases are easy, but what should go in place of the ?? in the Add case? There is no simple way to combine lower_exp(arg1) and lower_exp(arg2) to get one BlockBody that returns their sum: each of them ends by returning its own result. What do we want to do in this case? We want to perform some sequence of operations and store the output of arg1 in a variable, then do the same for arg2, and then add up the results. We can make this compositional by using a trick called continuation-passing style: instead of producing a BlockBody that returns the value of the input expression directly, we take in as arguments

A "destination" variable where we should store the result of the expression
A "next" BlockBody of code that should be run after we have assigned the result to the destination variable
We call this combination of a destination variable and a next BlockBody a continuation, as it tells us how the program should continue after the expression we are compiling.
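Concretely, since the code below destructures the continuation as let (dest, body) = k, a simple representation is a pair (a sketch):

// A continuation: the destination variable plus the code to run after the assignment.
type Continuation = (ssa::VarName, ssa::BlockBody);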
How does continuation-passing style solve our issue? Well, consider the lower_exp function, but now taking a continuation as an argument:
fn lower_exp(e: &ast::Expression, k: Continuation) -> ssa::BlockBody {
    match e {
        ast::Expression::Variable(x) => {
            let (dest, body) = k;
            ssa::BlockBody::Operation {
                dest: dest,
                op: ssa::Operation::Immediate(ssa::Immediate::Var(x.to_string())),
                next: Box::new(body),
            }
        }
        ast::Expression::Number(n) => {
            let (dest, body) = k;
            ssa::BlockBody::Operation {
                dest: dest,
                op: ssa::Operation::Immediate(ssa::Immediate::Const(*n)),
                next: Box::new(body),
            }
        }
        ast::Expression::Prim { prim, args } => match prim {
            ast::Prim::Add => {
                let arg1 = &args[0];
                let arg2 = &args[1];
                // TODO: generate *unique* variable names for these!
                let tmp1 = format!("addArg1");
                let tmp2 = format!("addArg2");
                let (dest, body) = k;
                lower_exp(
                    arg1,
                    (
                        tmp1.clone(),
                        lower_exp(
                            arg2,
                            (
                                tmp2.clone(),
                                ssa::BlockBody::Operation {
                                    op: ssa::Operation::Prim(
                                        ssa::Prim::Add,
                                        ssa::Immediate::Var(tmp1),
                                        ssa::Immediate::Var(tmp2),
                                    ),
                                    dest,
                                    next: Box::new(body),
                                },
                            ),
                        ),
                    ),
                )
            }
            _ => todo!(),
        },
        _ => todo!(),
    }
}
When we compile a variable or a number, we simply place that immediate in the destination variable and then execute the next code of the continuation.
When compiling a complex expression like arg1 + arg2, we want to do the following sequence of things:

Compile arg1, storing its result in a temporary variable tmp1
Then compile arg2, storing its result in a temporary variable tmp2
Then add up tmp1 and tmp2, storing the result in the dest of the provided continuation
Finally, execute the provided body of the continuation
We see that the code above implements this by building up a large continuation to be passed to the recursive call on arg1. In a sense, this continuation-passing style translation runs "backwards": we first build the code that runs last (the final add followed by the continuation's body), wrap it in a continuation for arg2, and then wrap that in the continuation passed to arg1.

Finally, we return to lower, which should provide a continuation for the entry point expression. In this case we provide a continuation that immediately returns its input:
fn lower(p: &ast::Program) -> ssa::Program {
    // TODO: make sure this variable name is unique!
    let dest = format!("result");
    let body = ssa::BlockBody::Return(ssa::Immediate::Var(dest.clone()));
    ssa::Program {
        param: p.parameter.to_string(),
        entry: lower_exp(&p.body, (dest, body)),
    }
}
As written, there is a flaw in the translation. When we compile an Add expression, we use the same temporary variable names "addArg1" and "addArg2". So the result of this translation on the input program
def main(x):
(x + 3) + (4 + x)
is the following block, in which addArg1 and addArg2 are each assigned more than once. Worse, the block computes the wrong answer: the value of x + 3 stored in addArg1 is overwritten by 4 before the final addition.

entry(x):
addArg1 = x
addArg2 = 3
addArg1 = add addArg1 addArg2
addArg1 = 4
addArg2 = x
addArg2 = add addArg1 addArg2
result = add addArg1 addArg2
ret result
What we want instead is for the translation to generate a fresh name for every temporary (and for the parameter and result), so that each variable is assigned exactly once, for example:

entry(x%0):
addArg1%1 = x%0
addArg2%2 = 3
addArg1%3 = add addArg1%1 addArg2%2
addArg1%4 = 4
addArg2%5 = x%0
addArg2%6 = add addArg1%4 addArg2%5
result%7 = add addArg1%3 addArg2%6
ret result%7
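One way to generate names like these is a small counter-based generator that stamps each base name with a fresh number; threading it through the translation is the first exercise below. A sketch:

// Stamps each requested base name with a globally increasing counter, e.g. "addArg1%3".
struct Gensym {
    counter: u64,
}

impl Gensym {
    fn fresh(&mut self, base: &str) -> String {
        let name = format!("{}%{}", base, self.counter);
        self.counter += 1;
        name
    }
}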
Exercise
Modify the continuation-based translation to generate unique variable names
Exercise
Extend this continuation-based translation to the remaining types of expressions
[1] We will later add other "terminating" statements that can end a basic block, but we only need return for the straight-line code we are producing now.
[2] When we extend to full SSA IR, we will see that a variable can take on multiple values dynamically.