Lecture 4: Conditionals

Our previous compiler could increment and decrement numbers, as well as handle let-bound identifiers. This is completely straight-line code; there are no decisions to make that would affect code execution. We need to support conditionals to incorporate such choices. Also, we’d like to be able to support compound expressions like binary, infix operators (or eventually, function calls), and to do that we’ll need some more careful management of data.

Let’s start with conditionals, and move on to compound expressions second.

1 Growing the language: adding conditionals

Reminder: Every time we enhance our source language, we need to consider several things:

Its impact on the concrete syntax of the language
Examples using the new enhancements, so we build intuition for them
Its impact on the abstract syntax and semantics of the language
Any new or changed transformations needed to process the new forms
Executable tests to confirm the enhancement works as intended

1.1 The new concrete syntax

‹expr›: ... | if ‹expr› : ‹expr› else: ‹expr›

1.2 Examples and semantics

Currently our language includes only integers as its values. We’ll therefore define conditionals to match C’s behavior: if the condition evaluates to a nonzero value, the then-branch will execute, and if the condition evaluates to zero, the else-branch will execute. It is never the case that both branches should execute.

Concrete Syntax		Answer
`if 5: 6 else: 7`		`6`
`if 0: 6 else: 7`		`7`
`if sub1(1): 6 else: 7`		`7`

Unlike C, though, if-expressions are indeed expressions: they evaluate to a value, which means they can be composed freely with the other expression forms in our language.

Do Now!
Construct larger examples, combining if-expressions with each other or with let-bindings, and show their evaluation.

1.3 The new abstract syntax

enum Exp {
  ...
  If { cond: Box<Exp>, thn: Box<Exp>, els: Box<Exp> }
}

Do Now!
Extend your interpreter from the prior lecture to include conditionals. As with last lecture, suppose we added a print expression to the language — what care must be taken to get the correct semantics?

There’s something a bit unsatisfying about interpreting if in our language by using if in Rust: it feels like a coincidence that our semantics and Rust’s semantics agree, and it doesn’t convey much understanding of how conditionals like if actually work...
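For concreteness, the kind of interpreter in question might look like the following minimal sketch. It assumes a cut-down Exp with only numbers, add1, sub1 and if (no let-bindings, no annotations), and the function name interpret is hypothetical.

```rust
// Simplified AST for illustration only: no let-bindings, no annotations.
#[derive(Debug, Clone)]
enum Exp {
    Num(i64),
    Add1(Box<Exp>),
    Sub1(Box<Exp>),
    If { cond: Box<Exp>, thn: Box<Exp>, els: Box<Exp> },
}

// Hypothetical interpreter: our `if` is implemented directly by Rust's `if`.
fn interpret(e: &Exp) -> i64 {
    match e {
        Exp::Num(n) => *n,
        Exp::Add1(e) => interpret(e) + 1,
        Exp::Sub1(e) => interpret(e) - 1,
        Exp::If { cond, thn, els } => {
            // Nonzero selects the then-branch, zero the else-branch.
            // Only the selected branch is ever evaluated.
            if interpret(cond) != 0 {
                interpret(thn)
            } else {
                interpret(els)
            }
        }
    }
}
```

Notice that all of the interesting work is delegated to Rust's own if, which is exactly why this tells us so little about how conditionals are implemented.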

1.4 Enhancing the transformations: Jumping around

1.4.1 Comparisons and jumps

To compile conditionals, we need to add new assembly instructions that allow us to change the default control flow of our program: rather than proceeding sequentially from one instruction to the next, we need jumps to immediately go to an instruction of our choosing. The simplest such form is just jmp SOME_LABEL, which unconditionally jumps to the named label in our program. We’ve seen only one label so far, namely our_code_starts_here, but we can freely add more labels to our program to indicate targets of jumps. More interesting are conditional jumps, which only jump based on some test; otherwise, they simply fall through to the next instruction.

To trigger a conditional jump, we need to have some sort of comparison. The instruction cmp arg1, arg2 compares its two arguments, and sets various flags whose values are used by the conditional jump instructions:

Instruction		Jump if ...
`je` LABEL		... the two compared values are equal
`jne` LABEL		... the two compared values are not equal
`jl` LABEL		... the first value is less than the second
`jle` LABEL		... the first value is less than or equal to the second
`jg` LABEL		... the first value is greater than the second
`jge` LABEL		... the first value is greater than or equal to the second
`jb` LABEL		... the first value is less than the second, when treated as unsigned
`jbe` LABEL		... the first value is less than or equal to the second, when treated as unsigned

Some conditional jumps are triggered by arithmetic operations, instead:

Instruction		Jump if ...
`jz` LABEL		... the last arithmetic result is zero
`jnz` LABEL		... the last arithmetic result is non-zero
`jo` LABEL		... the last arithmetic result overflowed
`jno` LABEL		... the last arithmetic result did not overflow
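As a rough mental model of the comparison-based jumps, we can simulate the flags in Rust. This is a simplified sketch of the four x86 status flags that cmp updates, not a description of how the hardware is built; the Flags struct and the functions named after the mnemonics are made up for illustration and belong to no library. Each function returns true when the corresponding jump would be taken.

```rust
// A simplified model of the four x86 status flags that `cmp` updates.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Flags {
    zf: bool, // zero flag: the difference was zero
    sf: bool, // sign flag: the (wrapped) difference was negative
    of: bool, // overflow flag: the signed subtraction overflowed
    cf: bool, // carry flag: the unsigned subtraction needed a borrow
}

// Model of `cmp a, b`: compute a - b, discard the result, keep the flags.
fn cmp(a: i64, b: i64) -> Flags {
    let (diff, overflowed) = a.overflowing_sub(b);
    Flags {
        zf: diff == 0,
        sf: diff < 0,
        of: overflowed,
        cf: (a as u64) < (b as u64),
    }
}

// Equality only needs the zero flag (jz/jnz consult the same flag).
fn je(f: Flags) -> bool { f.zf }
fn jne(f: Flags) -> bool { !f.zf }

// Signed comparisons combine the sign and overflow flags.
fn jl(f: Flags) -> bool { f.sf != f.of }
fn jle(f: Flags) -> bool { f.zf || f.sf != f.of }
fn jg(f: Flags) -> bool { !f.zf && f.sf == f.of }
fn jge(f: Flags) -> bool { f.sf == f.of }

// Unsigned comparisons use the carry flag instead.
fn jb(f: Flags) -> bool { f.cf }
fn jbe(f: Flags) -> bool { f.cf || f.zf }
```

For example, under this model cmp(-1, 1) makes jl fire but not jb, because -1 reinterpreted as an unsigned 64-bit number is the largest possible value.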

Do Now!
Consider the examples of if-expressions above. Translate them manually to assembly.

Let’s examine the last example above: if sub1(1): 6 else: 7. Which of the following could be valid translations of this expression?

  mov RAX, 1
  sub RAX, 1
  cmp RAX, 0
  je if_false
if_true:
  mov RAX, 6
  jmp done
if_false:
  mov RAX, 7
done:

  mov RAX, 1
  sub RAX, 1
  cmp RAX, 0
  je if_false
if_true:
  mov RAX, 6

if_false:
  mov RAX, 7
done:

  mov RAX, 1
  sub RAX, 1
  cmp RAX, 0
  jne if_true
if_true:
  mov RAX, 6
  jmp done
if_false:
  mov RAX, 7
done:

  mov RAX, 1
  sub RAX, 1
  cmp RAX, 0
  jne if_true
if_false:
  mov RAX, 7
  jmp done
if_true:
  mov RAX, 6
done:

The first two follow the structure of the original expression most closely, but the second has a fatal flaw: once the then-branch finishes executing, control falls through into the else-branch when it shouldn’t. The third version flips the condition and the target of the jump, but tracing carefully through it reveals there is no way for control to reach the else-branch. Likewise, tracing carefully through the first and last versions reveals they could both be valid translations of the original expression.

Working through these examples should give a reasonable intuition for how to compile if-expressions more generally: we compile the condition and check whether its value is zero. If it is, we jump to the else-branch; otherwise, we fall through to the then-branch. Both branches are then compiled as normal. The then-branch, however, needs an unconditional jump to the instruction just after the end of the else-branch, so that execution dodges the unwanted branch.

Do Now!
Work through the initial examples, and the examples you created earlier. Does this strategy work for all of them?

Let’s try this strategy on a few examples. For clarity, we repeat the previous example below, so that the formatting is more apparent.

Original expression: `if sub1(1): 6 else: 7`
Compiled assembly:

  mov RAX, 1
  sub RAX, 1
  cmp RAX, 0
  je if_false
if_true:
  mov RAX, 6
  jmp done
if_false:
  mov RAX, 7
done:

Original expression: `if 10: 2 else: sub1(0)`
Compiled assembly:

  mov RAX, 10
  cmp RAX, 0
  je if_false
if_true:
  mov RAX, 2
  jmp done
if_false:
  mov RAX, 0
  sub RAX, 1
done:

Original expression: `let x = if 10: 2 else: 0 in if x: 55 else: 999`
Compiled assembly:

  mov RAX, 10
  cmp RAX, 0
  je if_false
if_true:
  mov RAX, 2
  jmp done
if_false:
  mov RAX, 0
done:
  mov [RSP-8], RAX
  mov RAX, [RSP-8]
  cmp RAX, 0
  je if_false
if_true:
  mov RAX, 55
  jmp done
if_false:
  mov RAX, 999
done:

The last example is broken: the various labels used in the two if-expressions are duplicated, which leads to illegal assembly:

$ nasm -f elf64 -o output/test1.o output/test1.s
output/test1.s:20: error: symbol `if_true' redefined
output/test1.s:23: error: symbol `if_false' redefined
output/test1.s:25: error: symbol `done' redefined

We need to generate unique labels for each expression.

1.4.2 Approach 1: Counter

One easy approach would be to thread a counter through our code generator, implemented as a &mut u32, and increment it each time we need a new label name. However, this clutters the compiler pass: we would have to maintain the counter state correctly alongside the code generation logic itself. Additionally, a counter like this makes testing more brittle: the generated names depend on exactly when we increment the counter, so a small change to the compiler could break our tests even though nothing has changed semantically.
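To see what this looks like in practice, here is a minimal sketch of the counter approach. It assumes a cut-down Exp with only numbers and conditionals, and emits assembly as plain strings rather than Instr values; the name compile_counter is hypothetical.

```rust
// Simplified AST for illustration: numbers and conditionals only.
enum Exp {
    Num(i64),
    If { cond: Box<Exp>, thn: Box<Exp>, els: Box<Exp> },
}

// Hypothetical code generator that emits assembly text and threads a counter
// through every recursive call to obtain fresh label suffixes.
fn compile_counter(e: &Exp, counter: &mut u32) -> Vec<String> {
    match e {
        Exp::Num(n) => vec![format!("mov RAX, {}", n)],
        Exp::If { cond, thn, els } => {
            // Claim a fresh number before compiling the sub-expressions.
            let id = *counter;
            *counter += 1;
            let else_lab = format!("if_false#{}", id);
            let done_lab = format!("done#{}", id);

            let mut is = compile_counter(cond, counter);
            is.push("cmp RAX, 0".to_string());
            is.push(format!("je {}", else_lab));
            is.extend(compile_counter(thn, counter));
            is.push(format!("jmp {}", done_lab));
            is.push(format!("{}:", else_lab));
            is.extend(compile_counter(els, counter));
            is.push(format!("{}:", done_lab));
            is
        }
    }
}
```

Every recursive call has to pass the counter along, even in cases that never create a label. Moving the increment to after the condition has been compiled would renumber the labels of nested if-expressions, so any test that compares the exact output would fail despite the program behaving identically.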

1.4.3 Approach 2: Tagging

In the Adder assignment, the definition of Exp is slightly more complicated than that presented above: it is parameterized by an arbitrary type, allowing us to stash any data we wanted at the nodes of our AST:

enum Exp<Ann> {
    Num(i64, Ann),
    Prim1(Prim1, Box<Exp<Ann>>, Ann),
    Var(String, Ann),
    Let { bindings: Vec<(String, Exp<Ann>)>,
          body: Box<Exp<Ann>>,
          ann: Ann
        },
    If { cond: Box<Exp<Ann>>, thn: Box<Exp<Ann>>, els: Box<Exp<Ann>>, ann: Ann },
}

The Adder compiler uses this flexibility to tag every expression with its source location information, giving Exp<Span>, so that we can give precisely-located error messages. But this parameter is more flexible than that: we might consider walking the expression and giving every node a unique identifier:

type Tag = u64;

fn tag<Ann>(e: &Exp<Ann>) -> Exp<Tag> {
   tag_help(e, &mut 0)
}
fn tag_help<Ann>(e: &Exp<Ann>, counter: &mut Tag) -> Exp<Tag> {
    let cur_tag = *counter;
    *counter += 1;
    match e {
        Exp::Prim1(op, e, _) => Exp::Prim1(*op, Box::new(tag_help(e, counter)), cur_tag),
        ...
    }
}

By doing this we separate the task of generating names from our other compilation tasks. It also makes other compiler passes easier to test as their dependence on generated names is now determined by the annotations on the input.
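To make the elided cases concrete, here is a complete version of the tagging pass for a cut-down AST. This is a sketch under assumptions: the Exp below has only numbers, a single Sub1 form and If, whereas the real definition also has Var, Let and a general Prim1.

```rust
type Tag = u64;

// Simplified annotated AST for illustration (the real Exp has more forms).
#[derive(Debug, Clone, PartialEq)]
enum Exp<Ann> {
    Num(i64, Ann),
    Sub1(Box<Exp<Ann>>, Ann),
    If { cond: Box<Exp<Ann>>, thn: Box<Exp<Ann>>, els: Box<Exp<Ann>>, ann: Ann },
}

fn tag<Ann>(e: &Exp<Ann>) -> Exp<Tag> {
    tag_help(e, &mut 0)
}

fn tag_help<Ann>(e: &Exp<Ann>, counter: &mut Tag) -> Exp<Tag> {
    // Each node takes the next number before its children are visited,
    // so tags are assigned in pre-order.
    let cur_tag = *counter;
    *counter += 1;
    match e {
        Exp::Num(n, _) => Exp::Num(*n, cur_tag),
        Exp::Sub1(e, _) => Exp::Sub1(Box::new(tag_help(e, counter)), cur_tag),
        Exp::If { cond, thn, els, .. } => {
            // Tag the children one at a time so the numbering runs left to right.
            let cond = Box::new(tag_help(cond, counter));
            let thn = Box::new(tag_help(thn, counter));
            let els = Box::new(tag_help(els, counter));
            Exp::If { cond, thn, els, ann: cur_tag }
        }
    }
}
```

With this numbering, tagging if sub1(1): 6 else: 7 gives the if-expression tag 0, the sub1 tag 1, the literal 1 tag 2, and the two branches tags 3 and 4. The old annotations are simply discarded, which is why tag works for any input annotation type.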

1.4.4 Putting it together: compiling if-expressions

If we use our decorated Exp<Tag> definition and our tag function above, then compiling if-expressions becomes:

fn compile_with_env<'exp>(e: &'exp Exp<Tag>, mut env: Vec<(&'exp str, i32)>) -> Result<Vec<Instr>, CompileErr> {
    match e {
        Exp::If { cond, thn, els, ann } => {
            let else_lab = format!("if_false#{}", ann);
            let done_lab = format!("done#{}", ann);

            let mut is = compile_with_env(cond, env.clone())?;
            is.push(Instr::Cmp(BinArgs::ToReg(Reg::Rax, Arg32::Imm(0))));
            is.push(Instr::Je(else_lab.clone()));
            is.extend(compile_with_env(thn, env.clone())?);
            is.push(Instr::Jmp(done_lab.clone()));
            is.push(Instr::Label(else_lab.clone()));
            is.extend(compile_with_env(els, env)?);
            is.push(Instr::Label(done_lab));
            Ok(is)
        }
    ...
    }
}
pub fn compile_to_string(e: &Exp<Span>) -> Result<String, CompileErr> {
    let tagged = tag(e);
    let is = compile_with_env(&tagged, Vec::new())?;
... // insert the section .text etc
}

1.5 Testing

As always, we must test our enhancements. Properly testing if-expressions is slightly tricky right now: we need to confirm that

We always generate valid assembly
If-expressions compose properly with each other, and with other expressions in the language.
The generated assembly only ever executes one of the two branches of an if-expression

Testing the first property amounts to testing the tag function, to confirm that it never generates duplicate ids in a given expression. Testing the next one can be done by writing a suite of programs in this language and confirming that they produce the correct answers. Testing the last requirement is hardest: we don’t yet have a way to signal errors in our programs (for example, the compiled equivalent of panic!("This branch shouldn't run!")). For now, the best we can do is manually inspect the generated output and confirm that it is correct-by-construction, but this won’t suffice forever.
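One way to check the first property is to collect every tag in a tagged expression and look for repeats. The sketch below assumes a cut-down Exp with only numbers, Sub1 and If, and the helper names collect_tags and tags_are_unique are hypothetical. In a real test we would apply tags_are_unique to the output of tag on a variety of expressions.

```rust
use std::collections::HashSet;

type Tag = u64;

// Simplified annotated AST for illustration (the real Exp has more forms).
enum Exp<Ann> {
    Num(i64, Ann),
    Sub1(Box<Exp<Ann>>, Ann),
    If { cond: Box<Exp<Ann>>, thn: Box<Exp<Ann>>, els: Box<Exp<Ann>>, ann: Ann },
}

// Collect every tag in the tree, in pre-order.
fn collect_tags(e: &Exp<Tag>, out: &mut Vec<Tag>) {
    match e {
        Exp::Num(_, t) => out.push(*t),
        Exp::Sub1(e, t) => {
            out.push(*t);
            collect_tags(e, out);
        }
        Exp::If { cond, thn, els, ann } => {
            out.push(*ann);
            collect_tags(cond, out);
            collect_tags(thn, out);
            collect_tags(els, out);
        }
    }
}

// True when no tag appears twice in the expression.
fn tags_are_unique(e: &Exp<Tag>) -> bool {
    let mut tags = Vec::new();
    collect_tags(e, &mut tags);
    let distinct: HashSet<Tag> = tags.iter().copied().collect();
    distinct.len() == tags.len()
}
```

Because the label names are derived from the tags, unique tags rule out the redefined-symbol errors we saw earlier.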

Exercise
Add a new Prim1 operator to the language, that you can recognize and deliberately compile into invalid assembly that crashes the compiled program. Use this side-effect to confirm that the compilation of if-expressions only ever executes one branch of the expression. Hint: using the sys_exit(int) syscall is probably helpful.