Assignment 1: Adder: A starter language
Due: Fri 09/23 at 5:00pm
git clone
In this assignment you’ll implement a compiler for a small language called Adder (because it primarily adds things). Click for scary snake picture!
1 The Adder Language
In each of the next several assignments, we’ll introduce a language that we’ll
implement. We’ll start small, and build up features incrementally. We’re
starting with Adder, which has just a few features —
There are a few pieces that go into defining a language for us to compile.
A description of the concrete syntax —
the text the programmer writes A description of the abstract syntax —
how to express what the programmer wrote in a data structure our compiler uses. The semantics — or description of the behavior — of the abstract syntax, so our compiler knows what the code it generates should do.
1.1 Concrete Syntax
The concrete syntax of Adder is:
‹expr› NUMBER IDENTIFIER add1 ( ‹expr› ) sub1 ( ‹expr› ) let ‹bindings› in ‹expr› ( ‹expr› ) ‹bindings› IDENTIFIER = ‹expr› IDENTIFIER = ‹expr› , ‹bindings›
The main difference from the language we discussed in class is that a
let
expression can have one or many bindings. An
IDENTIFIER is any non-sequence
of letters and digits (starting with a letter). Any highlighted text is a literal token, meaning the
programmer must type exactly those characters or keywords.
1.2 Abstract Syntax
The abstract syntax of Adder is a Rust datatype, and corresponds nearly one-to-one with the concrete syntax. We parameterize the AST by a type of annotations.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Exp<Ann> {
Num(i64, Ann),
Var(String, Ann),
Prim1(Prim1, Box<Exp<Ann>>, Ann),
Let { bindings: Vec<(String, Exp<Ann>)>, // new binding declarations
body: Box<Exp<Ann>>, // the expression in which the new variables are bound
ann: Ann
}
}
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Prim1 {
Add1,
Sub1,
}
For now, the only annotations we’ll use are Span
s, which are
regions of the source code (defined in span.rs
). These are used
to give nicer error messages.
1.3 Semantics
An Adder program always evaluates to a single integer. Num
s
evaluate to themselves (so a program just consisting of Num(5)
should evaluate to the integer 5
). Primitive expressions
perform addition or subtraction by one on their argument. Let
bindings should evaluate all the binding expressions to values one by
one, and after each, store a mapping from the given name to the
corresponding value in both (a) the rest of the bindings, and (b) the
body of the let expression. Identifiers evaluate to whatever their
current stored value is. There are several examples further down to
make this concrete.
The compiler should return an error if
There is a binding list containing two or more bindings with the same name
An identifier is unbound (there is no surrounding let binding for it)
These errors are encoded in the CompileError
type in
compile.rs
.
Here are some examples of Adder programs (ignoring the Ann
written in the style of Debug):
Concrete Syntax |
| Abstract Syntax |
| Answer |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Note that shadowing of variables is allowed, but all variable names in a single binding list must be unique. So
let x = 3 in
let x = 4 in
x
is a valid program, but
let x = 3, x = 4 in
x
should produce a compile-time error.
2 Starter code for this assignment
You’ve been given a starter codebase that has several pieces of infrastructure:
A file
syntax.rs
that defines our abstract syntax trees.A file
asm.rs
that defines types registers, instructions and their arguments. Read over these types carefully to see how we encode the invariants of x86-64 using Rust’s type system.A file
compile.rs
that defines the compiler to a sequence of instructions. You will be editing this file.lib.rs
which declares the modules in the package.A file
parser.lalrpop
that defines a grammer describing the concrete syntax of the language andparser.rs
, an automatically generated file that parses according to that grammar.A main program (
main.rs
) that uses the support code inspan.rs
,runner.rs
andinterp.rs
to implement interactively running the compiler, linking with the runtime and using the reference interpreter.A file
runtime/stub.rs
that links with your compilation output to form executables.A file
tests/examples.rs
with some helper code for writing integration tests (see below).A file
examples/parse_error.adder
to demonstrate where to put example files for integration tests.
You will complete the compiler by editing compile.rs
, but you
are free to add more modules if you like (you will need to list them
in lib.rs
if you do). Remember, you can make any helper
functions you want, but do not change the type of any
function marked pub
or your code will fail to type check
against the autograder.
You can use test/examples.rs
to write integration tests (see below).
3 Implementing a Compiler for Adder
3.1 Writing the Compiler
The primary task of writing the Adder compiler is simple to state: take an
instance of the Exp
datatype and turn it into a list of assembly
instructions. The provided compiler skeleton is set up to do just this,
broken up over a couple of functions in compile.rs
.
First:
pub fn check_prog<Span>(e: &Exp<Span>) -> Result<(), CompileErr<Span>>
where Span: Clone
is a function for checking that the parsed program is well-formed. If
it is well formed, the function returns Ok(())
and if it is
ill-formed, it returns Err(e)
where e
is an informative
error message. Note that this is a generic function, generic in
the type Span
of source locations, with the side-condition that
Span
implements the Clone
trait. This just means that
you don’t need to care about the details of how Span
s are
implemented, except that you can clone
them if you ever need to
use one in a CompileErr<Span>
.
Next:
pub fn compile_to_instrs<Ann>(e: &Exp<Ann>) -> Vec<Instr>
is the main code-generation function. This takes a Exp
value
(abstract syntax, with annotations) and turns it into a list of
assembly instructions, represented by the Instr
type. Use only
the provided instruction types for this assignment; we will be
gradually expanding this set of instructions as the semester
progresses. We suggest you read asm.rs
to get familiar with the
x64 assembly instructions you have available to you.
For this assignment, do not simply run an interpreter on the code and output a two line-assembly program that directly returns the final result. You will receive 0 points on the assignment if you do so. Once we add interactive features to the language (reading from stdin/command-line arguments) this won’t be possible anyway.
3.2 Testing the Compiler
The code-generation portion of the compiler pipeline is not as amenable to unit testing since we do not want to over-constrain what specific instructions we use for a given input. Instead, this week we will be introducing integration tests, which test the entire compiler pipeline.
Cargo looks for integration tests in the tests/
directory. Included in the startup code are two macros, mk_test
and mk_fail_test
for writing end-to-end compiler tests. To use
them, add a file to the examples/
directory and then call
mk_test
with a name for the test, the name of the file in the
example directory you are testing, and the expected number for it to
output. For failing tests, you can use mk_fail_test
to include
a string that you expect to occur in the error message.
3.3 Running main and compiling to binary
main.rs
provides some convenient infrastructure for running
your compiler and interpreter from a terminal. To see the expected
arguments simply run cargo run
with no options. Note that to
distinguish the flags for your program from Cargo’s flags you need to
insert --
before your flags. For example, to run the
interpreter on a file example.adder
you can use
$ cargo run -- --interp example.adder
Alternatively, you can compile main.rs
to a binary using
cargo build --release
. Then the binary will be located at
./target/release/snake
.
4 Cross-platform Issues
On any platform you need to have nasm
and ar
installed. Ask on Piazza if you have any issues installing these. If
you have installed these correctly and are using Mac or Linux the
runner should correctly use the right flags for your platform when
creating the final binary.
Here are the explicit commands the runner uses to build an executable
from an assembly code file compiled_code.s
and the rust stub
stub.rs
. First build an object file using nasm
,
passing the appropriate format flag for your platform.
$ nasm -felf64 compiled_code.s -o compiled_code.o # Linux
$ nasm -fmacho64 compiled_code.s -o compiled_code.o # Mac
Then build a static library file to link with the rust file
$ ar r libcompiled_code.a compiled_code.o # Linux & Mac
Finally, compile the rust file, indicating where to look for the static library.
$ rustc stub.rs -L . -o stub.exe # Linux
$ rustc stub.rs -L . --target x86_64-apple-darwin -o stub.exe # Mac
This should make an executable that you can run.
$ ./stub.exe
We recommend you try this out with the sample file
section .text
global start_here
start_here:
mov RAX, 483
ret
nasm
and ar
correctly.5 Rust Tips
To read more about pattern matching on structs like MemRef
and
enums with named arguments like Exp
see
this
chapter of the Rust book.
Sometimes you need to convert between different numeric types in Rust,
typically using the Into
and TryInto
traits. These
traits can be a bit tedious to use in my opinion because they often
need type annotations in weird places. To make things simple for you,
We’ve included a function usize_to_i32
in compile.rs
that tries to convert a usize
to an i32
, panicking if it
does not fit. We used this in the reference solution so you may find it
useful.
Finally, in checking binding-lists for duplicates you might want to use a data structure to keep track of the names. You can use a vector, like we use in the interpreter, or you may want to try out the HashSet module in the standard library.
6 List of Deliverables
Include (at least) the following files in your zip submission (you
only need to change src/compile.rs
).
Cargo.toml
src/compile.rs
src/lib.rs
src/asm.rs
src/interp.rs
src/parser.rs
src/runner.rs
src/syntax.rs
src/span.rs
any additional modules you saw fit to write
your compiler integration tests (
tests/examples.rs
)The test input programs (
examples/*.adder
files) you wrote
We’ve included a script mk-submission.sh
in the starter code
repo that should make the correct zip file for you.
7 Grading Standards
For this assignment, you will be graded on whether your code implements the specification (functional correctness). We encourage you to test but your test coverage is not graded.
8 Submission
Wait! Please read the assignment again and verify that you have not forgotten anything!
Please submit your homework on gradescope by the above deadline.