Assignment 1: Adder: A starter language

8.7

Assignment 1: Adder: A starter language

Due: Fri 09/23 at 5:00pm

git clone

In this assignment you’ll implement a compiler for a small language called Adder (because it primarily adds things). Click for scary snake picture!

1 The Adder Language

In each of the next several assignments, we’ll introduce a language that we’ll implement. We’ll start small, and build up features incrementally. We’re starting with Adder, which has just a few features — defining variables, and primitive operations on numbers.

There are a few pieces that go into defining a language for us to compile.

A description of the concrete syntax — the text the programmer writes
A description of the abstract syntax — how to express what the programmer wrote in a data structure our compiler uses.
The semantics — or description of the behavior — of the abstract syntax, so our compiler knows what the code it generates should do.

1.1 Concrete Syntax

The concrete syntax of Adder is:

‹expr›: | NUMBER | IDENTIFIER | add1 ( ‹expr› ) | sub1 ( ‹expr› ) | let ‹bindings› in ‹expr› | ( ‹expr› ) ‹bindings›: | IDENTIFIER = ‹expr› | IDENTIFIER = ‹expr› , ‹bindings›

The main difference from the language we discussed in class is that a let expression can have one or many bindings. An IDENTIFIER is any non-sequence of letters and digits (starting with a letter). Any highlighted text is a literal token, meaning the programmer must type exactly those characters or keywords.

1.2 Abstract Syntax

The abstract syntax of Adder is a Rust datatype, and corresponds nearly one-to-one with the concrete syntax. We parameterize the AST by a type of annotations.

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Exp<Ann> {
    Num(i64, Ann),
    Var(String, Ann),
    Prim1(Prim1, Box<Exp<Ann>>, Ann),
    Let { bindings: Vec<(String, Exp<Ann>)>, // new binding declarations
          body: Box<Exp<Ann>>,  // the expression in which the new variables are bound
          ann: Ann
        }
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Prim1 {
    Add1,
    Sub1,
}

For now, the only annotations we’ll use are Spans, which are regions of the source code (defined in span.rs). These are used to give nicer error messages.

1.3 Semantics

An Adder program always evaluates to a single integer. Nums evaluate to themselves (so a program just consisting of Num(5) should evaluate to the integer 5). Primitive expressions perform addition or subtraction by one on their argument. Let bindings should evaluate all the binding expressions to values one by one, and after each, store a mapping from the given name to the corresponding value in both (a) the rest of the bindings, and (b) the body of the let expression. Identifiers evaluate to whatever their current stored value is. There are several examples further down to make this concrete.

The compiler should return an error if

There is a binding list containing two or more bindings with the same name
An identifier is unbound (there is no surrounding let binding for it)

These errors are encoded in the CompileError type in compile.rs.

Here are some examples of Adder programs (ignoring the Ann written in the style of Debug):

Concrete Syntax	Abstract Syntax	Answer
`5`	`Num(5)`	`5`
`sub1(add1(sub1(5)))`	`Prim1(Sub1, Prim1(Add1, Prim1(Sub1, Num(5))))`	`4`
`let x = 5 in add1(x)`	`Let{ bindings: [("x", Num(5))], body: Prim1(Add1, Var("x")) }`	`6`
`let x = 5, y = sub1(x) in sub1(y)`	`Let{ bindings: [("x", Num(5)), ("y", Prim1(Sub1, Var("x")))], body: Prim1(Sub1, Var("y"))}`	`3`

Note that shadowing of variables is allowed, but all variable names in a single binding list must be unique. So

let x = 3 in
let x = 4 in
x

is a valid program, but

let x = 3, x = 4 in
x

should produce a compile-time error.

2 Starter code for this assignment

You’ve been given a starter codebase that has several pieces of infrastructure:

A file syntax.rs that defines our abstract syntax trees.
A file asm.rs that defines types registers, instructions and their arguments. Read over these types carefully to see how we encode the invariants of x86-64 using Rust’s type system.
A file compile.rs that defines the compiler to a sequence of instructions. You will be editing this file.
lib.rs which declares the modules in the package.
A file parser.lalrpop that defines a grammer describing the concrete syntax of the language and parser.rs, an automatically generated file that parses according to that grammar.
A main program (main.rs) that uses the support code in span.rs, runner.rs and interp.rs to implement interactively running the compiler, linking with the runtime and using the reference interpreter.
A file runtime/stub.rs that links with your compilation output to form executables.
A file tests/examples.rs with some helper code for writing integration tests (see below).
A file examples/parse_error.adder to demonstrate where to put example files for integration tests.

You will complete the compiler by editing compile.rs, but you are free to add more modules if you like (you will need to list them in lib.rs if you do). Remember, you can make any helper functions you want, but do not change the type of any function marked pub or your code will fail to type check against the autograder.

You can use test/examples.rs to write integration tests (see below).

3 Implementing a Compiler for Adder

3.1 Writing the Compiler

The primary task of writing the Adder compiler is simple to state: take an instance of the Exp datatype and turn it into a list of assembly instructions. The provided compiler skeleton is set up to do just this, broken up over a couple of functions in compile.rs.

First:

pub fn check_prog<Span>(e: &Exp<Span>) -> Result<(), CompileErr<Span>>
where Span: Clone

is a function for checking that the parsed program is well-formed. If it is well formed, the function returns Ok(()) and if it is ill-formed, it returns Err(e) where e is an informative error message. Note that this is a generic function, generic in the type Span of source locations, with the side-condition that Span implements the Clone trait. This just means that you don’t need to care about the details of how Spans are implemented, except that you can clone them if you ever need to use one in a CompileErr<Span>.

pub fn compile_to_instrs<Ann>(e: &Exp<Ann>) -> Vec<Instr>

is the main code-generation function. This takes a Exp value (abstract syntax, with annotations) and turns it into a list of assembly instructions, represented by the Instr type. Use only the provided instruction types for this assignment; we will be gradually expanding this set of instructions as the semester progresses. We suggest you read asm.rs to get familiar with the x64 assembly instructions you have available to you.

For this assignment, do not simply run an interpreter on the code and output a two line-assembly program that directly returns the final result. You will receive 0 points on the assignment if you do so. Once we add interactive features to the language (reading from stdin/command-line arguments) this won’t be possible anyway.

3.2 Testing the Compiler

The code-generation portion of the compiler pipeline is not as amenable to unit testing since we do not want to over-constrain what specific instructions we use for a given input. Instead, this week we will be introducing integration tests, which test the entire compiler pipeline.

Cargo looks for integration tests in the tests/ directory. Included in the startup code are two macros, mk_test and mk_fail_test for writing end-to-end compiler tests. To use them, add a file to the examples/ directory and then call mk_test with a name for the test, the name of the file in the example directory you are testing, and the expected number for it to output. For failing tests, you can use mk_fail_test to include a string that you expect to occur in the error message.

3.3 Running main and compiling to binary

main.rs provides some convenient infrastructure for running your compiler and interpreter from a terminal. To see the expected arguments simply run cargo run with no options. Note that to distinguish the flags for your program from Cargo’s flags you need to insert -- before your flags. For example, to run the interpreter on a file example.adder you can use

$ cargo run -- --interp example.adder

Alternatively, you can compile main.rs to a binary using cargo build --release. Then the binary will be located at ./target/release/snake.

4 Cross-platform Issues

On any platform you need to have nasm and ar installed. Ask on Piazza if you have any issues installing these. If you have installed these correctly and are using Mac or Linux the runner should correctly use the right flags for your platform when creating the final binary.

Here are the explicit commands the runner uses to build an executable from an assembly code file compiled_code.s and the rust stub stub.rs. First build an object file using nasm, passing the appropriate format flag for your platform.

$ nasm -felf64 compiled_code.s -o compiled_code.o   # Linux
$ nasm -fmacho64 compiled_code.s -o compiled_code.o # Mac

Then build a static library file to link with the rust file

$ ar r libcompiled_code.a compiled_code.o # Linux & Mac

Finally, compile the rust file, indicating where to look for the static library.

$ rustc stub.rs -L . -o stub.exe                              # Linux
$ rustc stub.rs -L . --target x86_64-apple-darwin -o stub.exe # Mac

This should make an executable that you can run.

$ ./stub.exe

We recommend you try this out with the sample file

section .text
global start_here
start_here:
  mov RAX, 483
  ret

To make sure you have installed nasm and ar correctly.

5 Rust Tips

To read more about pattern matching on structs like MemRef and enums with named arguments like Exp see this chapter of the Rust book.

Sometimes you need to convert between different numeric types in Rust, typically using the Into and TryInto traits. These traits can be a bit tedious to use in my opinion because they often need type annotations in weird places. To make things simple for you, We’ve included a function usize_to_i32 in compile.rs that tries to convert a usize to an i32, panicking if it does not fit. We used this in the reference solution so you may find it useful.

Finally, in checking binding-lists for duplicates you might want to use a data structure to keep track of the names. You can use a vector, like we use in the interpreter, or you may want to try out the HashSet module in the standard library.

6 List of Deliverables

Include (at least) the following files in your zip submission (you only need to change src/compile.rs).

Cargo.toml
src/compile.rs
src/lib.rs
src/asm.rs
src/interp.rs
src/parser.rs
src/runner.rs
src/syntax.rs
src/span.rs
any additional modules you saw fit to write
your compiler integration tests (tests/examples.rs)
The test input programs (examples/*.adder files) you wrote

We’ve included a script mk-submission.sh in the starter code repo that should make the correct zip file for you.

7 Grading Standards

For this assignment, you will be graded on whether your code implements the specification (functional correctness). We encourage you to test but your test coverage is not graded.

8 Submission

Wait! Please read the assignment again and verify that you have not forgotten anything!

Please submit your homework on gradescope by the above deadline.

1	The Adder Language
2	Starter code for this assignment
3	Implementing a Compiler for Adder
4	Cross-platform Issues
5	Rust Tips
6	List of Deliverables
7	Grading Standards
8	Submission