Assignment 2: Boa: Adding new operators
Due: Friday 09/29 at 5:00pm
git clone
In this compiler, you’ll enhance your existing compiler with Binary Operators and Arithmetic. Click here for scary snake picture!
1 The Boa Language
1.1 Concrete Syntax
The concrete syntax of Boa is:
‹expr› let ‹bindings› in ‹expr› if ‹expr› : ‹expr› else: ‹expr› ‹binop-expr› ‹binop-expr› NUMBER IDENTIFIER add1 ( ‹expr› ) sub1 ( ‹expr› ) ‹expr› + ‹expr› ‹expr› - ‹expr› ‹expr› * ‹expr› ( ‹expr› ) ‹bindings› IDENTIFIER = ‹expr› IDENTIFIER = ‹expr› , ‹bindings›
As in Adder, a Let can have one or more bindings.
1.2 Abstract Syntax
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Exp<Ann> {
Num(i64, Ann),
Var(String, Ann),
Prim(Prim, Vec<Exp<Ann>>, Ann),
Let { bindings: Vec<(String, Exp<Ann>)>, // new binding declarations
body: Box<Exp<Ann>>, // the expression in which the new variables are bound
ann: Ann
},
If { cond: Box<Exp<Ann>>,
thn: Box<Exp<Ann>>,
els: Box<Exp<Ann>>,
ann: Ann
},
}
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum Prim {
Add1,
Sub1,
Add,
Sub,
Mul,
}1.3 Semantics
In addition to the semantics of Adder, we now have infix binary operators
(addition, subtraction and multiplication), that are evaluated
leftmost-innermost first (i.e., the standard left-to-right order that obeys
parentheses), and conditional expressions. An Exp::If expression evaluates its
condition, then evaluates its then-branch if the condition is non-zero, and
evaluates its else-branch if the condition was zero.
To compile these expressions, we need a few more assembly instructions:
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Instr {
Mov(MovArgs),
Add(BinArgs),
Sub(BinArgs),
IMul(BinArgs),
Cmp(BinArgs),
Label(String),
Jmp(String),
Je(String),
Jne(String),
Jl(String),
Jle(String),
Jg(String),
Jge(String),
}Additionally, we have added access to all x64 general purpose
registers, as you may want to use a "scratch register" to implement
certain operations. But be warned: certain registers, r12-r15,
rbx, rsp and rbp are callee-save registers in
the System V AMD64 calling convention that Rust uses to call into your
code, so they should be restored to their original values if you
overwrite them. If you want to use a scratch register, we suggest you
use any of the other registers such as r8 which are caller-save,
meaning the callee is free to overwrite their values.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum Reg {
Rax,
Rbx,
Rdx,
Rcx,
Rsp,
Rbp,
Rsi,
Rdi,
R8,
R9,
R10,
R11,
R12,
R13,
R14,
R15,
}add/sub/imul.The sub and imul instructions are analogous to add,
that take two arguments, apply their respective operations, and place
their results in RAX. Be sure to use imul (signed
multiplication) rather than mul (unsigned). Labels let us name
the first of a sequence of instructions, akin to how we label
start_here: to begin our code1Technically, labels are not
instructions, so Instr is a bit mis-named.. The cmp
instruction compares its two arguments, and sets some bits of the
RFLAGS to tell if the arguments were equal, less than, greater
than, etc. Rather than directly manipulating this register, we test
the value of these flags with the jump instructions: jne will
jump control to the named label if the flags mean NOT-EQUAL,
and je will jump control to the named label when the flags mean
EQUAL, etc.. Finally, jmp will unconditionally jump to
the named label.
2 Starter code for this assignment
You’ve been given a starter codebase that has several pieces of infrastructure, mostly the same as before. I recommend you take a look at the following:
The extended types for the AST and the sequential expressions, as well as some helper functions, are in
syntax.rs.asm.rshas been extended to include the new assembly features.
All of your edits —compile.rs and
tests/examples.rs.
3 Implementing a Compiler for Boa
Again, the primary task of writing the Boa compiler is simple to state: take an
instance of the Exp datatype and turn it into a list of assembly
instructions. But since we now have more complicated expressions, we need to
worry about simplifying the program, first.
3.1 Checking for scoping problems
Extend the function
check_prog<Span>(e: &Exp<Span>) -> Result<(), CompileErr<Span>>to support the new cases of expressions.
3.2 Generating Unique Names
To correctly implement sequentialization and conditional expressions
you will need to generate fresh names in your compiler. You can do
this by threading through a mutable counter in the corresponding
compiler pass, or you can separate this out in a function with a
signature like tag_exp<T>(e: &Exp<T>) -> Exp<u32> which will
annotate every sub-expression with a unique identifier. You may have
to define a similar function for sequential expressions as well.
3.3 Converting to Sequential Form
Sequential Form asserts that throughout a program, any operator
expression contains arguments that are immediate: that is, are numbers
or identifiers, and therefore don’t perform any computation of their own.
Additionally, we can think of the decision in an If expression to be an
operation, so we also need to ensure that the condition is immediate.
Design a function
sequentialize(e: &Exp<u32>) -> SeqExp<()>that takes an expression tagged with unique numbers and produces a new expression that is in Sequential Form. If you implementedtag_expabove, you can assume that all of theu32annotations in the input are unique. Because this program will include some new expressions, we’re willing to discard any decorations we had on the expression, which explains why the input could be decorated with any type of information, but the output will just be decorated with unit values.
When you need to generate fresh names (i.e., unique names that
aren’t used anywhere in the expression), a useful strategy is to
generate names of the form format!("#{}_{}", reason, tag),
where reason is "prim_arg_1", "prim_arg_2", "if", etc., and
tag is the annotation on the expression. So if you need to generate a
variable and your input expression is Exp::Prim(op, es, 7), you
can use the name "#prim_7". This is only a suggestion, you may
use whatever strategy you like in your compiler.
Additional examples of sequentialize are given below.
3.4 Compilation
{
Adapt your
compile_to_instrsfunction from Adder to compile the sequential expression forms in Boa. This means refactoring the similar cases as well as adding new support forIfandPrim. Remember that a simple invariant is for the code outputted bycompile_to_instrsto always leave its answer inrax; this invariant is not the most efficient way to compile but will make it easier to get correct code
}
The starter code includes an extended compile_to_string that
you may want to use as a wrapper around your compile_to_instrs
function to include the boilerplate.
4 Recommendations
Here’s an order in which you could consider tackling the implementation:
Write some tests for the input and output of
sequentializefor nestedPrimexpressions to understand what the sequentialization transformation looks like on those examples.If you choose to, implement
tag_exp(and possiblytag_seqexp) as described above.Work through both the
sequentializeimplementation and theLetcase ofcompile_to_instrs. Write tests as you go.Finish the
Ifcase forcompile_to_instrs(with immediate conditions) so you can run simple programs with if.Write some tests for the input and output of performing
sequentializeon if-expressions, again to get a feel for what the transformation looks like.Work through both the
sequentializeimplementation and thePrimcase ofcompile_to_instrs. Write tests as you go.
5 Testing the Compiler
As with Adder, we will have integration tests in
tests/examples.rs with example files in the examples/
directory. You can (and should!) re-use your examples from the adder
homework. Over the course of the semester you should accumulate a large
amount of example programs that will help greatly when we start adding
more complex features and optimizations.
Additionally, you may find it beneficial to unit test your
sequentialize function in the provided submodule of
compile.rs.
6 Running main
Running your own programs is the same as with Adder, except by
convention we’ll use the .boa file extension.
7 List of Deliverables
your
compile.rsandasm.rsthe other src/*.rs files in the starter code
any additional modules you saw fit to write
the Cargo.toml
integration tests (
tests/examples.rs)any test input programs (
examples/*.boafiles)
8 Grading Standards
For this assignment, you will be graded on whether your code implements the specification (functional correctness).
For auto-grading, we include tests from past assignments as regression tests, as well as new ones for the additional features that have been added.
This assignment also features a few hidden tests that count for approximately 10% of the final score and will not be shown until after the late due date. In general the hidden tests test out combinations of features in larger programs, so be sure to test your compiler thoroughly!
9 Submission
Wait! Please read the assignment again and verify that you have not forgotten anything!
Please submit your homework on gradescope by the above deadline.
10 Additional sequential form examples
To address some recurring questions, here are some additional examples of sequentialization.
Given the boa program
let x = add1(2) in xThe most straightforward sequentialization algorithm will produce
let x = (let tmp = 2 in add1(tmp)) in xtmp can be whatever variable name you generate that is
guaranteed to be different from all others.However, notice that the original program was already in sequential
form, so the temporary variable tmp is not truly necessary. So
another valid sequentialization would be to return the program
unchanged:
let x = add1(2) in xYour sequentialization function can produce either one, just make sure your tests align with the strategy that you choose.
Here are some more examples.
let x1 = 1, x2 = 2 in
x1 + x2let x1 = 1 in
let x2 = 2 in
let tmp0 = x1 in
let tmp1 = x2 in
tmp0 + tmp1let x1 = 1 in
let x2 = 2 in
x1 + x2Next,
(2 * 9) + (18 - 3)let tmp0 = (let tmp1 = 2 in let tmp2 = 9 in tmp1 * tmp2) in
let tmp3 = (let tmp4 = 18 in let tmp5 = 3 in tmp4 - tmp5) in
tmp0 + tmp3let tmp0 = 2 * 9 in
let tmp3 = 18 - 3 in
tmp0 + tmp3Finally, an example with if:
3 + (if 5: add1(6) else: 7)let tmp0 = 3 in
let tmp1 = (let tmp2 = 5 in
if tmp2:
let tmp3 = 6 in add1(tmp3)
else:
7) in
tmp0 + tmp1let tmp1 = (if 5:
add1(6)
else:
7) in
3 + tmp11Technically, labels are not
instructions, so Instr is a bit mis-named.