Assignment 1: Straightline Code
Due: Fri 01/31 at 11:59pm
git clone
In this assignment you’ll implement a compiler for version 1.0 of the Snake language (codename "Adder"). The adder source language is an expression language supporting let-binding and binary arithmetic operations and we will compile it into a single block of straightline assembly code.
The "middle end" of the compiler will be covered in detail on January 22nd, but the frontend can be completed after the January 15 lecture, and much of the backend as well.
1 The Source Language
There are a few pieces that go into defining a language for us to compile.
A description of the concrete syntax —
the text the programmer writes A description of the abstract syntax —
how to express what the programmer wrote in a data structure our compiler uses. The semantics — or description of the behavior — of the abstract syntax, so our compiler knows what the code it generates should do.
1.1 Concrete Syntax
‹prog› def main ( IDENTIFIER ) : ‹expr› ‹expr› let IDENTIFIER = ‹expr› in ‹expr› ‹binop-expr› ‹binop-expr› NUMBER IDENTIFIER ‹expr› + ‹expr› ‹expr› - ‹expr› ‹expr› * ‹expr› ( ‹expr› )
The main differences between this version of adder and the versions
introduced in lecture are that let
expressions can have one
or many bindings. An IDENTIFIER is any non-sequence of letters and digits
(starting with a letter). Any highlighted
text is a literal token, meaning the programmer must type exactly
those characters or keywords.
1.2 Abstract Syntax
The abstract syntax of Adder is a Rust datatype, and corresponds closely with the concrete syntax. You don’t have to worry about implementing a parser, we have provided one for you that will parse the concrete syntax into an abstract syntax tree.
The Adder program is parsed into the Prog
struct which contains
the input parameter name param
and the main Expr
called
main
. An Expr
is either a number, a variable, a
primitive operation or a sequence of let-bindings. Each of these
constructors contains a loc: SrcLoc
that corresponds to the
source location information for that sub-expression. This information
is only used to provide better error messages, you don’t need to
inspect the internal details of SrcLoc
type.
1.3 Semantics
An Adder program takes in a single 64-bit integer as input and evaluates to a single 64-bit integer. Num
s
evaluate to themselves (so a program just consisting of Num(5)
should evaluate to the integer 5
). Primitive expressions
perform addition or subtraction by one on their argument. Let
bindings should evaluate all the binding expressions to values one by
one, and after each, store a mapping from the given name to the
corresponding value in both (a) the rest of the bindings, and (b) the
body of the let expression. Identifiers evaluate to whatever their
current stored value is. There are several examples further down to
make this concrete.
The compiler should return an error if
There is a binding list containing two or more bindings with the same name
An identifier is unbound (there is no surrounding let binding for it)
These errors are encoded in the CompileError
type in
compile.rs
.
Note that shadowing of variables is allowed, but all variable names in a single binding list must be unique. So
let x = 3 in
let x = 4 in
x
is a valid program, but
let x = 3, x = 4 in
x
should produce a compile-time error.
Here are some examples of Adder expressions and how they parse into Rust values (ignoring the SrcLoc
information):
Concrete Syntax |
| Abstract Syntax |
| Result |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Your program will not just be a single expression, but a main function that takes an input, for instance the program
def main(x):
x * x
Prog { param: ("x") , main: Prim{ prim: Mul, args: [ Var("x"), Var("x") ]} }
and its semantics is a function that outputs the square of the input.
2 Compilation and Starter Code
The starter code provides much of the architecture of the compiler already:
mk_submission.sh
, a script that creates a zip file with the correct files to be submitted to the autograder.Cargo.toml
the package description file.runtime/stub.rs
a Rust wrapper that calls our compiled assembly code function with an argument provided from the command linesrc/main.rs
,src/compile.rs
andrunner.rs
are wrappers providing a command line interface to the compiler pipeline and interpreters. You should look atsrc/compile.rs
to see how the compiler pipeline functions.src/ast.rs
,src/ssa.rs
andsrc/asm.rs
have type definitions for the abstract syntax trees, intermediate representation and assembly code, respectivelysrc/interp.rs
andsrc/pretty.rs
implement interpreters and pretty printers for abstract syntax trees and the intermediate representationsrc/identifiers.rs
utilities for working with identifiers.src/parser.rs
, generated code for lexing and parsing source programs into abstract syntax trees, which is generated fromsrc/parser.lalrpop
using the Lalrpop parser generator.examples/
a directory for putting example programstests/examples.rs
a file with some testing macros to help you test your compiler against the examples inexamples/
You will complete the implementation of the compilation pipeline in three files: front_end.rs
, middleend.rs
and back_end.rs
.
2.1 Front End
After the program is parsed into an abstract syntax tree, the final
stage of the front end is to ensure that the program satisfies the
scoping rules for the language: all variables are declared before they
are used and that multiple-let forms do not declare the same variable
more than once. These errors are defined in CompileError
.
Additionally, to avoid having to deal with shadowing later in the
compiler, we want to resolve the source-level identifiers into unique
identifiers for use in the remainder of the compiler.
This functionality is implemented in the resolve_prog
method of the Resolver
struct, a simple wrapper object that is used as a source of unique identifiers.
Your task is to fill in the details of resolve_prog
sub-procedure for this method. You will need to choose an appropriate notion of auxiliary environment to correctly implement this functionality. As in class, you may find it useful to use the HashMap
from the im
package, a package for immutable data structures that support sharing. The documentation for this data structure is here, most relevant are the new
, get
and insert
methods.
2.2 Middle End
As discussed in lecture on January 22nd, we next want to make the
execution order of our program explicit by compiling it into our
intermediate representation, a BlockBody
of straight-line code,
defined in src/ssa.rs
. A BlockBody
is either
Returns
and immediate value or it performs an Operation
,
binds that to a dest
variable and then continues with a
next
BlockBody
.
This lowering is implemented as a method lower_prog
around the
Lowerer
struct. Your task is to implement the core
sub-procedure for this lowering process, lower_expr_kont
. The
parameter k
is the Continuation
of the code you are
lowering, that is, it is a pair of a destination variable where the
output of the current expression should be stored and a
BlockBody
that will execute after the current expression. The
initial continuation is provided in lower_prog
to just return
the output, but you will build this up recursively in order to produce
your flattened code from a tree.
2.3 Back End
To implement the final code generation phase of the pipeline, we need a way to map the variables in our intermediate representation into concrete memory locations. As discussed in lecture, we will store our local variables on the stack, as a memory locations relative to the stack pointer rsp
. Concretely, in backend.rs
, you will implement the emit_prog
function, which produces assembly instructions from the intermediate representation. We have already provided the outline of the emit
function, which sets up the correct .text
section and global
declarations to link with runtime/stub.rs
. In doing so, you will need to design and maintain some kind of environment for mapping the intermediate representation variables to stack offsets.
2.4 Command-Line Interface
To build the command-line interface run cargo build --release
,
after which the executable will be available at
target/release/snake
. Run ./target/release/snake --help
for an overview of the command-line interface. To help you implement
your compiler gradually, the command-line interface that supports
partial execution of the compiler pipeline. That is, you can pass the
--target tgt
or -t tgt
option to specify that you want
the compiler to produce an ast, a resolved ast, your ssa intermediate
representation, your assembly code, or the linked executable. You can
also pass the --execute n
or -x n
option to say to run
your code with the input argument n
. By default this will
compile your program to an executable and then run it, but if you
select an (resolved) ast or ssa this will run the provided interpreter
instead.
2.5 Testing
The file test/examples.rs
contains utilities for testing the
output of your compiler against example programs you include in the
examples/
directory. We have provided a few examples to show
how you can use this to test your end-to-end compiler pipeline, as
well as the front end and middle ends.
You can execute the command cargo test
to run your tests.
3 Producing executables manually, installation issues
On any platform you need to have nasm
and ar
installed. Ask on Piazza if you have any issues installing these. If
you have installed these correctly and are using Mac or Linux the
runner should correctly use the right flags for your platform when
creating the final binary.
Here are the explicit commands the starter code uses to build an executable
from an assembly code file compiled_code.s
and the rust stub
stub.rs
. First build an object file using nasm
,
passing the appropriate format flag for your platform.
$ nasm -felf64 compiled_code.s -o compiled_code.o # Linux
$ nasm -fmacho64 compiled_code.s -o compiled_code.o # Mac
Then build a static library file to link with the rust file
$ ar r libcompiled_code.a compiled_code.o # Linux & Mac
Finally, compile the rust file, indicating where to look for the static library.
$ rustc stub.rs -L . -o stub.exe # Linux
$ rustc stub.rs -L . --target x86_64-apple-darwin -o stub.exe # Mac
This should make an executable that you can run.
$ ./stub.exe
We recommend you try this out with the sample file
section .text
global start_here
start_here:
mov rax, 483
ret
nasm
and ar
correctly.4 Rust Tips
To read more about pattern matching on Rust enums see this chapter of the Rust book.
In checking binding-lists for duplicates you might want to use a data
structure to keep track of the names. For this you may want to try out
the
HashSet
module in the standard library or the im
library.
5 List of Deliverables
For this assignment you will submit your updated versions of three files:
src/frontend.rs
src/middle_end.rs
src/backend.rs
These are the only files that you can change in your submission, your code will be linked with reference versions of the other files.
We’ve included a script mk_submission.sh
in the starter code
repo that should make zip file with just those three files and your
testing files that you should upload to gradescope. The testing files
are included for us to get some idea of how students are testing their
code, they are not graded.
6 Grading Standards
For this assignment, you will be autograded on whether your code implements the specification (functional correctness). We encourage you to test but your test coverage is not graded.
7 Submission
Wait! Please read the assignment again and verify that you have not forgotten anything!
Please submit your homework on gradescope by the above deadline.