Assignment 1: Straightline Code

8.15

Assignment 1: Straightline Code🔗

Due: Fri 01/31 at 11:59pm

git clone

In this assignment you’ll implement a compiler for version 1.0 of the Snake language (codename "Adder"). The adder source language is an expression language supporting let-binding and binary arithmetic operations and we will compile it into a single block of straightline assembly code.

The "middle end" of the compiler will be covered in detail on January 22nd, but the frontend can be completed after the January 15 lecture, and much of the backend as well.

1 The Source Language🔗

There are a few pieces that go into defining a language for us to compile.

A description of the concrete syntax — the text the programmer writes
A description of the abstract syntax — how to express what the programmer wrote in a data structure our compiler uses.
The semantics — or description of the behavior — of the abstract syntax, so our compiler knows what the code it generates should do.

1.1 Concrete Syntax🔗

‹prog›: def main ( IDENTIFIER ) : ‹expr› ‹expr›: | let IDENTIFIER = ‹expr› in ‹expr› | ‹binop-expr› ‹binop-expr›: | NUMBER | IDENTIFIER | ‹expr› + ‹expr› | ‹expr› - ‹expr› | ‹expr› * ‹expr› | ( ‹expr› )

The main differences between this version of adder and the versions introduced in lecture are that let expressions can have one or many bindings. An IDENTIFIER is any non-sequence of letters and digits (starting with a letter). Any highlighted text is a literal token, meaning the programmer must type exactly those characters or keywords.

1.2 Abstract Syntax🔗

The abstract syntax of Adder is a Rust datatype, and corresponds closely with the concrete syntax. You don’t have to worry about implementing a parser, we have provided one for you that will parse the concrete syntax into an abstract syntax tree.

The Adder program is parsed into the Prog struct which contains the input parameter name param and the main Expr called main. An Expr is either a number, a variable, a primitive operation or a sequence of let-bindings. Each of these constructors contains a loc: SrcLoc that corresponds to the source location information for that sub-expression. This information is only used to provide better error messages, you don’t need to inspect the internal details of SrcLoc type.

1.3 Semantics🔗

An Adder program takes in a single 64-bit integer as input and evaluates to a single 64-bit integer. Nums evaluate to themselves (so a program just consisting of Num(5) should evaluate to the integer 5). Primitive expressions perform addition or subtraction by one on their argument. Let bindings should evaluate all the binding expressions to values one by one, and after each, store a mapping from the given name to the corresponding value in both (a) the rest of the bindings, and (b) the body of the let expression. Identifiers evaluate to whatever their current stored value is. There are several examples further down to make this concrete.

The compiler should return an error if

There is a binding list containing two or more bindings with the same name
An identifier is unbound (there is no surrounding let binding for it)

These errors are encoded in the CompileError type in compile.rs.

Note that shadowing of variables is allowed, but all variable names in a single binding list must be unique. So

let x = 3 in
let x = 4 in
x

is a valid program, but

let x = 3, x = 4 in
x

should produce a compile-time error.

Here are some examples of Adder expressions and how they parse into Rust values (ignoring the SrcLoc information):

Concrete Syntax	Abstract Syntax	Result
`5`	`Num(5)`	`5`
`sub1(add1(sub1(5)))`	`Prim{ prim: Sub1, args: [ Prim{ prim: Add1, args: [ Prim{ prim: Sub1, args: [ Num(5)]}]}]}`	`4`
`let x = 5 in add1(x)`	`Let{ bindings: [Binding { var: "x" , expr: Num(5)} ], body: Prim{ prim: Add1, args: [Var("x")]} }`	`6`
`let x = 5, y = sub1(x) in sub1(y)`	`Let{ bindings: [Binding{ var: "x", expr: Num(5) }, Binding{ var: "y", expr: Prim{ prim: Sub1, args{ [ Var("y") ]}}}], body: Prim{ prim: Sub1, args: [ Var("y") ]}}`	`3`
`4 + 7 * 3`	`Prim{ prim: Add, args: [ Var("x"), Prim{ prim: Mul, args: [ Num(7), Var("y")]}]}`	`25`

Your program will not just be a single expression, but a main function that takes an input, for instance the program

def main(x):
  x * x

will be parsed into the abstract syntax

Prog { param: ("x") , main: Prim{ prim: Mul, args: [ Var("x"), Var("x") ]} }

and its semantics is a function that outputs the square of the input.

2 Compilation and Starter Code🔗

The starter code provides much of the architecture of the compiler already:

mk_submission.sh, a script that creates a zip file with the correct files to be submitted to the autograder.
Cargo.toml the package description file.
runtime/stub.rs a Rust wrapper that calls our compiled assembly code function with an argument provided from the command line
src/main.rs, src/compile.rs and runner.rs are wrappers providing a command line interface to the compiler pipeline and interpreters. You should look at src/compile.rs to see how the compiler pipeline functions.
src/ast.rs, src/ssa.rs and src/asm.rs have type definitions for the abstract syntax trees, intermediate representation and assembly code, respectively
src/interp.rs and src/pretty.rs implement interpreters and pretty printers for abstract syntax trees and the intermediate representation
src/identifiers.rs utilities for working with identifiers.
src/parser.rs, generated code for lexing and parsing source programs into abstract syntax trees, which is generated from src/parser.lalrpop using the Lalrpop parser generator.
examples/ a directory for putting example programs
tests/examples.rs a file with some testing macros to help you test your compiler against the examples in examples/

You will complete the implementation of the compilation pipeline in three files: front_end.rs, middleend.rs and back_end.rs.

2.1 Front End🔗

After the program is parsed into an abstract syntax tree, the final stage of the front end is to ensure that the program satisfies the scoping rules for the language: all variables are declared before they are used and that multiple-let forms do not declare the same variable more than once. These errors are defined in CompileError. Additionally, to avoid having to deal with shadowing later in the compiler, we want to resolve the source-level identifiers into unique identifiers for use in the remainder of the compiler. This functionality is implemented in the resolve_prog method of the Resolver struct, a simple wrapper object that is used as a source of unique identifiers. Your task is to fill in the details of resolve_prog sub-procedure for this method. You will need to choose an appropriate notion of auxiliary environment to correctly implement this functionality. As in class, you may find it useful to use the HashMap from the im package, a package for immutable data structures that support sharing. The documentation for this data structure is here, most relevant are the new, get and insert methods.

2.2 Middle End🔗

As discussed in lecture on January 22nd, we next want to make the execution order of our program explicit by compiling it into our intermediate representation, a BlockBody of straight-line code, defined in src/ssa.rs. A BlockBody is either Returns and immediate value or it performs an Operation, binds that to a dest variable and then continues with a next BlockBody.

This lowering is implemented as a method lower_prog around the Lowerer struct. Your task is to implement the core sub-procedure for this lowering process, lower_expr_kont. The parameter k is the Continuation of the code you are lowering, that is, it is a pair of a destination variable where the output of the current expression should be stored and a BlockBody that will execute after the current expression. The initial continuation is provided in lower_prog to just return the output, but you will build this up recursively in order to produce your flattened code from a tree.

2.3 Back End🔗

To implement the final code generation phase of the pipeline, we need a way to map the variables in our intermediate representation into concrete memory locations. As discussed in lecture, we will store our local variables on the stack, as a memory locations relative to the stack pointer rsp. Concretely, in backend.rs, you will implement the emit_prog function, which produces assembly instructions from the intermediate representation. We have already provided the outline of the emit function, which sets up the correct .text section and global declarations to link with runtime/stub.rs. In doing so, you will need to design and maintain some kind of environment for mapping the intermediate representation variables to stack offsets.

2.4 Command-Line Interface🔗

To build the command-line interface run cargo build --release, after which the executable will be available at target/release/snake. Run ./target/release/snake --help for an overview of the command-line interface. To help you implement your compiler gradually, the command-line interface that supports partial execution of the compiler pipeline. That is, you can pass the --target tgt or -t tgt option to specify that you want the compiler to produce an ast, a resolved ast, your ssa intermediate representation, your assembly code, or the linked executable. You can also pass the --execute n or -x n option to say to run your code with the input argument n. By default this will compile your program to an executable and then run it, but if you select an (resolved) ast or ssa this will run the provided interpreter instead.

2.5 Testing🔗

The file test/examples.rs contains utilities for testing the output of your compiler against example programs you include in the examples/ directory. We have provided a few examples to show how you can use this to test your end-to-end compiler pipeline, as well as the front end and middle ends. You can execute the command cargo test to run your tests.

3 Producing executables manually, installation issues🔗

On any platform you need to have nasm and ar installed. Ask on Piazza if you have any issues installing these. If you have installed these correctly and are using Mac or Linux the runner should correctly use the right flags for your platform when creating the final binary.

Here are the explicit commands the starter code uses to build an executable from an assembly code file compiled_code.s and the rust stub stub.rs. First build an object file using nasm, passing the appropriate format flag for your platform.

$ nasm -felf64 compiled_code.s -o compiled_code.o   # Linux
$ nasm -fmacho64 compiled_code.s -o compiled_code.o # Mac

Then build a static library file to link with the rust file

$ ar r libcompiled_code.a compiled_code.o # Linux & Mac

Finally, compile the rust file, indicating where to look for the static library.

$ rustc stub.rs -L . -o stub.exe                              # Linux
$ rustc stub.rs -L . --target x86_64-apple-darwin -o stub.exe # Mac

This should make an executable that you can run.

$ ./stub.exe

We recommend you try this out with the sample file

section .text
global start_here
start_here:
  mov rax, 483
  ret

To make sure you have installed nasm and ar correctly.

4 Rust Tips🔗

To read more about pattern matching on Rust enums see this chapter of the Rust book.

In checking binding-lists for duplicates you might want to use a data structure to keep track of the names. For this you may want to try out the HashSet module in the standard library or the im library.

5 List of Deliverables🔗

For this assignment you will submit your updated versions of three files:

src/frontend.rs
src/middle_end.rs
src/backend.rs

These are the only files that you can change in your submission, your code will be linked with reference versions of the other files.

We’ve included a script mk_submission.sh in the starter code repo that should make zip file with just those three files and your testing files that you should upload to gradescope. The testing files are included for us to get some idea of how students are testing their code, they are not graded.

6 Grading Standards🔗

For this assignment, you will be autograded on whether your code implements the specification (functional correctness). We encourage you to test but your test coverage is not graded.

7 Submission🔗

Wait! Please read the assignment again and verify that you have not forgotten anything!

Please submit your homework on gradescope by the above deadline.

1	The Source Language
2	Compilation and Starter Code
3	Producing executables manually, installation issues
4	Rust Tips
5	List of Deliverables
6	Grading Standards
7	Submission