Assignment 3: Boa: Adding new operators

8.2

Assignment 3: Boa: Adding new operators

Due: Thursday 09/30 at 9:00pm

git clone

In this compiler, you’ll enhance your existing compiler with Binary Operators and Arithmetic. Click here for scary snake picture!

1 The Boa Language

1.1 Concrete Syntax

The concrete syntax of Boa is:1The parser is given in the parser.lalrpop and looks a bit more complicated to deal with binary operation precedence and the interaction of if/let with binary operations. If you are interested you can try reading it and see that it is somewhat similar to the grammars we have been using outside of some obscure sigils used to track source location information.

‹expr›: | let ‹bindings› in ‹expr› | if ‹expr› : ‹expr› else: ‹expr› | ‹binop-expr› ‹binop-expr›: | NUMBER | IDENTIFIER | add1 ( ‹expr› ) | sub1 ( ‹expr› ) | ‹expr› + ‹expr› | ‹expr› - ‹expr› | ‹expr› * ‹expr› | ( ‹expr› ) ‹bindings›: | IDENTIFIER = ‹expr› | IDENTIFIER = ‹expr› , ‹bindings›

As in Adder, a Let can have one or more bindings.

1.2 Abstract Syntax

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Exp<Ann> {
    Num(i64, Ann),
    Var(String, Ann),
    Prim1(Prim1, Box<Exp<Ann>>, Ann),
    Prim2(Prim2, Box<Exp<Ann>>, Box<Exp<Ann>>, Ann),
    Let { bindings: Vec<(String, Exp<Ann>)>, // new binding declarations
          body: Box<Exp<Ann>>,  // the expression in which the new variables are bound
          ann: Ann
    },
    If { cond: Box<Exp<Ann>>,
         thn: Box<Exp<Ann>>,
         els: Box<Exp<Ann>>,
         ann: Ann
    },
}

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum Prim1 {
    Add1,
    Sub1,
}

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum Prim2 {
    Add,
    Sub,
    Mul,
}

1.3 Semantics

In addition to the semantics of Adder, we now have infix binary operators (addition, subtraction and multiplication), that are evaluated leftmost-innermost first (i.e., the standard left-to-right order that obeys parentheses), and conditional expressions. An Exp::If expression evaluates its condition, then evaluates its then-branch if the condition is non-zero, and evaluates its else-branch if the condition was zero.

To compile these expressions, we need a few more assembly instructions:

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Instr {
    Mov(MovArgs),

    Add(BinArgs),
    Sub(BinArgs),
    IMul(BinArgs),
    Cmp(BinArgs),

    Label(String),

    Jmp(String),
    Je(String),
    Jne(String),
    Jl(String),
    Jle(String),
    Jg(String),
    Jge(String),
}

Additionally, I have added another "work register" r15 to the registers:

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum Reg {
    Rax,
    Rsp,
    R15,
}

You will likely find this extra register useful for implementing binary operations on constants, since you need to put 64-bit constants into a register before using add/sub/imul.

The sub and imul instructions are analogous to add, that take two arguments, apply their respective operations, and place their results in RAX. Be sure to use imul (signed multiplication) rather than mul (unsigned). Labels let us name the first of a sequence of instructions, akin to how we label start_here: to begin our code2Technically, labels are not instructions, so Instr is a bit mis-named.. The cmp instruction compares its two arguments, and sets some bits of the RFLAGS to tell if the arguments were equal, less than, greater than, etc. Rather than directly manipulating this register, we test the value of these flags with the jump instructions: jne will jump control to the named label if the flags mean NOT-EQUAL, and je will jump control to the named label when the flags mean EQUAL, etc.. Finally, jmp will unconditionally jump to the named label.

2 Starter code for this assignment

You’ve been given a starter codebase that has several pieces of infrastructure, mostly the same as before:

The types for the AST and the sequential expressions, as well as some helper functions, are in syntax.rs. You don’t need to edit this file.
Code relevant to lexing and parsing are in lexer.rs, parser.rs and parser.lalrpop, which you do not need to edit. parser.rs was generated from the file parser.lalrpop using a parser generator. You shouldn’t have to call any of these functions yourself, but note that since we changed the parser implementation, I had to tweak our definition of Spans. The parser will return Span1 (one-dimensional spans) whereas the Span type we were using before is now called Span2 (two-dimensional spans). This shouldn’t affect your code except that when you raise an error you will use a Span1 which will then be converted to a Span2 when displaying error information to the user.
A main program (main.rs) with support code (runner.rs and interp.rs) that you can use to compile, run and run the reference interpreter. You don’t need to edit any of these.

All of your edits — which will be to write the compiler for Boa, and test it — will happen in compile.rs, asm.rs and tests/examples.rs.

3 Implementing a Compiler for Boa

Again, the primary task of writing the Boa compiler is simple to state: take an instance of the Exp datatype and turn it into a list of assembly instructions. But since we now have more complicated expressions, we need to worry about simplifying the program, first.

3.1 New assembly instructions

Extend your implementation of instr_to_string and associated helper functions in asm.rs to handle the new assembly instructions.

3.2 Checking for scoping problems

In Adder, we asked you to confirm that no Let expression bound two identifiers with the same name, and to confirm that no expression referred to an unbound identifier. You likely interspersed that code with the compiler itself. While this was fine for Adder, let’s refactor this code into something more maintainable.

Define the function check_scope(e: &Exp<Span1>) -> Result<(), CompileErr<Span1>> that abstracts out these two checks. This function returns either Ok(()), indicating that the program has no scoping issues, or CompileErr<Span1> indicating a scoping issue with associated source location information

3.3 Converting to Sequential Form

Sequential Form asserts that throughout a program, any operator expression contains arguments that are immediate: that is, are numbers or identifiers, and therefore don’t perform any computation of their own. Additionally, we can think of the decision in an If expression to be an operation, so we also need to ensure that the condition is immediate. The following function is a reliable predicate for checking this property:

Design (and test!) a function sequentialize(e: &Exp<u32>) -> SeqExp<()> that takes a tagged expression and produces a new expression that is in Sequential Form. You can assume that all of the u32 annotations in the input are unique. Because this program will include some new expressions, we’re willing to discard any decorations we had on the expression, which explains why the input could be decorated with any type of information, but the output will just be decorated with unit values.

When you need to generate fresh names (i.e., unique names that aren’t used anywhere in the expression), a useful strategy is to generate names of the form format!("#{}_{}", reason, tag), where reason is "prim1", "prim2_1", "prim2_2", "if", etc., and tag is the annotation on the expression. So if you need to generate a variable and your input expression is Exp::Prim1(op, e, 7), you can use the name "#prim1_7". This is only a suggestion, you may use whatever strategy you like in your compiler.

Additional examples of sequentialize are given below.

3.4 Compilation

Adapt your compile_to_instrs function from Adder to compile the sequential expression forms in Boa. This means refactoring the similar cases as well as adding new support for If and Prim2. You may want to restrict its signature to only accept tagged expressions that are in A-Normal Form. Remember that the invariant for compile_to_instrs is that it always leaves its answer in rax; this invariant will definitely help you!

The starter code includes an extended compile_to_string function to invoke your functions and appropriately tag the ASTs with unique identifiers for you to use at the right moments.

4 Recommendations

Here’s an order in which you could consider tackling the implementation:

Write some tests for the input and output of sequentialize for nested EPrim1 expressions to understand what the ANF transformation looks like on those examples.
Work through both the sequentialize implementation and the ELet case of compile_expr. Write tests as you go.
Finish the If case for compile_to_instrs (with immediate conditions) so you can run simple programs with if. You will also need to fill in the printing of the relevant assembly instructions, so that they can be output...
Write some tests for the input and output of performing sequentialize on if-expressions, again to get a feel for what the transformation looks like.
Work through both the sequentialize implementation and the Prim2 case of compile_to_instrs. Write tests as you go.

5 Testing the Compiler

As with Adder, we will have integration tests in tests/examples.rs with example files in the examples/ directory. You can (and should!) re-use your examples from the adder homework, though you will need to re-write them to use the new syntax3This will be the only time I switch the syntax on you.. Over the course of the semester you should accumulate a large amount of example programs that will help greatly when we start adding more complex features and optimizations.

Additionally, you should unit test your sequentialize function in the provided submodule of compile.rs.

6 Running main

Running your own programs is the same as with Adder, except you’ll give them the .boa file extension.

7 List of Deliverables

your compile.rs and asm.rs
the other src/*.rs files in the starter code
any additional modules you saw fit to write
the Cargo.toml
integration tests (tests/examples.rs)
any test input programs (examples/*.boa files)

8 Grading Standards

For this assignment, you will be graded on

Whether your code implements the specification (functional correctness),
the comprehensiveness of your test coverage

9 Cross-platform Issues

The instructions from the adder assignment for compiling the code should work as before. If you are using a new Mac with an Apple Silicon chip, you will need to make a couple of adjustments.

First, install x86-64 cross-compilation in rust by running rustup target install x86_64-apple-darwin
Then when you link your with stub.rs, tell the compiler to produce x86_64 machine code rustc -L . --target x86_64-apple-darwin -o stub.exe stub.rs

I have adjusted the main.rs script to automatically pass this flag so as long as you have installed the cross-compilation support the runner should now work for you. Let me know on Piazza if you have any issues.

10 Submission

Wait! Please read the assignment again and verify that you have not forgotten anything!

Please submit your homework on gradescope by the above deadline.

11 Additional sequential form examples

To address some recurring questions, here are some additional examples of sequentialization.

Given the boa program

let x = add1(2) in x

The most straightforward sequentialization algorithm will produce

let x = (let tmp = 2 in add1(tmp)) in x

where tmp can be whatever variable name you generate that is guaranteed to be different from all others.

However, notice that the original program was already in sequential form, so the temporary variable tmp is not truly necessary. So another valid sequentialization would be to return the program unchanged:

let x = add1(2) in x

Your sequentialization function can produce either one, just make sure your tests align with the strategy that you choose.

Here are some more examples.

let x1 = 1, x2 = 2 in
x1 + x2

can be sequentialized to

let x1 = 1 in
let x2 = 2 in
let tmp0 = x1 in
let tmp1 = x2 in
tmp0 + tmp1

or, without generating unnecessary temporaries:

let x1 = 1 in
let x2 = 2 in
x1 + x2

Next,

(2 * 9) + (18 - 3)

can be sequentialized to

let tmp0 = (let tmp1 = 2 in let tmp2 = 9 in tmp1 * tmp2) in
let tmp3 = (let tmp4 = 18 in let tmp5 = 3 in tmp4 - tmp5) in
tmp0 + tmp3

or, without generating unnecessary temporaries:

let tmp0 = 2 * 9 in
let tmp3 = 18 - 3 in
tmp0 + tmp3

Finally, an example with if:

3 + (if 5: add1(6) else: 7)

can be sequentialized to

let tmp0 = 3 in
let tmp1 = (let tmp2 = 5 in
            if tmp2:
              let tmp3 = 6 in add1(tmp3)
            else:
              7) in
tmp0 + tmp1

or without unnecessary temporaries:

let tmp1 = (if 5:
              add1(6)
            else:
              7) in
3 + tmp1

1The parser is given in the parser.lalrpop and looks a bit more complicated to deal with binary operation precedence and the interaction of if/let with binary operations. If you are interested you can try reading it and see that it is somewhat similar to the grammars we have been using outside of some obscure sigils used to track source location information.

2Technically, labels are not instructions, so Instr is a bit mis-named.

3This will be the only time I switch the syntax on you.

1	The Boa Language
2	Starter code for this assignment
3	Implementing a Compiler for Boa
4	Recommendations
5	Testing the Compiler
6	Running main
7	List of Deliverables
8	Grading Standards
9	Cross-platform Issues
10	Submission
11	Additional sequential form examples