Assignment 2: Adder: A starter language
Due: Tue 09/21 at 9:00pm
git clone
In this assignment you’ll implement a compiler for a small language called Adder (because it primarily adds things). Click for scary snake picture!
1 The Adder Language
In each of the next several assignments, we’ll introduce a language that we’ll
implement. We’ll start small, and build up features incrementally. We’re
starting with Adder, which has just a few features —
There are a few pieces that go into defining a language for us to compile.
A description of the concrete syntax —
the text the programmer writes A description of the abstract syntax —
how to express what the programmer wrote in a data structure our compiler uses. The semantics — or description of the behavior — of the abstract syntax, so our compiler knows what the code it generates should do.
1.1 Concrete Syntax
For this assignment we will be using an s-expression based syntax so you can see how your s-expression parser from last time fits into the compiler pipeline. Next week we resume using a more python/ruby-like syntax.
The concrete syntax of Adder is (remember EPSILON means the empty string):
‹expr› NUMBER IDENTIFIER ( let ( ‹bindings› ) ‹expr› ) ( add1 ‹expr› ) ( sub1 ‹expr› ) ‹bindings› EPSILON ( IDENTIFIER ‹expr› ) ‹bindings›
The main difference from the language we discussed in class (besides
the S-expression style) is that a let
expression can have
zero, one or many bindings. An IDENTIFIER is any non-sequence of letters and digits
(starting with a letter). Any highlighted
text is a literal token, meaning the programmer must type exactly
those characters or keywords.
1.2 Abstract Syntax
The abstract syntax of Adder is a Rust datatype, and corresponds nearly one-to-one with the concrete syntax. We parameterize the AST by a type of annotations.
pub enum Exp<Ann> {
Num(i64, Ann),
Var(String, Ann),
Prim1(Prim1, Box<Exp<Ann>>, Ann),
Let { bindings: Vec<(String, Exp<Ann>)>, // new binding declarations
body: Box<Exp<Ann>>, // the expression in which the new variables are bound
ann: Ann
}
}
pub enum Prim1 {
Add1,
Sub1,
}
1.3 Semantics
An Adder program always evaluates to a single integer. Num
s
evaluate to themselves (so a program just consisting of Num(5)
should evaluate to the integer 5
). Primitive expressions
perform addition or subtraction by one on their argument. Let Let
bindings should evaluate all the binding expressions to values one by
one, and after each, store a mapping from the given name to the
corresponding value in both (a) the rest of the bindings, and (b) the
body of the let expression. Identifiers evaluate to whatever their
current stored value is. There are several examples further down to
make this concrete.
The compiler should return an error if
There is a binding list containing two or more bindings with the same name
An identifier is unbound (there is no surrounding let binding for it)
These errors are encoded in the CompileError
type in
compile.rs
.
Here are some examples of Adder programs (ignoring the Ann
written in the style of Debug):
Concrete Syntax |
| Abstract Syntax |
| Answer |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Note that as discussed in class, shadowing of variables is allowed, but all variable names in a single binding list must be unique. So
(let ((x 3))
(let ((x 4)) x))
is a valid program, but
(let ((x 3)
(x 4))
x)
should produce a compile-time error.
2 Starter code for this assignment
You’ve been given a starter codebase that has several pieces of infrastructure:
A file
syntax.rs
that defines tokens, s-expressions and abstract syntax trees and the associated tokenizer/parsers. This includes an s-expression parser that shows one approach to the previous homework.A file
asm.rs
that defines types registers, instructions and their arguments. Read over these types carefully to see how we encode the invariants of x86-64 using Rust’s type system.A file
compile.rs
that defines the compiler to a sequence of instructions.lib.rs
which declares the modules in the package. You don’t need to edit this.A main program (
main.rs
) that uses the support code inrunner.rs
andinterp.rs
to implement interactively running the compiler, linking with the runtime and using the reference interpreter. You don’t need to edit either of these files.A file
runtime/stub.rs
that links with your compilation output to form executables.A file
tests/examples.rs
in which you will write integration tests (see below).A file
examples/parse_error.adder
to demonstrate where to put example files.
Your edits —test/examples.rs
, compile.rs
and asm.rs
.
3 Implementing a Compiler for Adder
3.1 Writing the Compiler
The primary task of writing the Adder compiler is simple to state: take an
instance of the Expr
datatype and turn it into a list of assembly
instructions. The provided compiler skeleton is set up to do just this,
broken up over a few functions.
The first is in compile.rs
:
compile_to_instrs(&Expr<Span>) -> Result<Vec<Instr>, CompileErr>
which takes a expr
value (abstract syntax, tagged with source
location information) and turns it into a list of assembly
instructions, represented by the instruction
type or returns
one of the possible compilation errors defined in the file. Use only
the provided instruction types for this assignment; we will be
gradually expanding this set of instructions as the semester
progresses. This function has an associated helper that takes some
extra arguments to track the variable environment and stack offset.
The other component you need to implement is in asm.rs
:
instrs_to_string(is: &[Instr]) -> String
which renders individual instances of the instruction datatype into a
string representation of the instruction (this is done for you for
mov
). Additionally there are several helper functions for
printing registers, memory references etc that you will need to fill
in. This part is straightforward, but forces you to understand
the syntax of the assembly code you are generating. Most of the
compiler concepts happen in the first step, where we are generating
assembly instructions from abstract syntax. Do use
this
assembly guide —eax, esp
,
but otherwise the syntax is mostly unchanged.
3.2 Testing the Compiler
The code-generation portion of pipeline is not as amenable to unit testing since we do not want to over-constrain what specific instructions we use for a given input. Instead, this week we will be introducing integration tests, which test the entire compiler pipeline.
Cargo looks for integration tests in the tests/
directory. Included in the startup code are two macros, mk_test
and mk_fail_test
for writing end-to-end compiler tests. To use
them, add a file to the examples/
directory and then call
mk_test
with a name for the test, the name of the file in the
example directory you are testing, and the expected number for it to
output. For failing tests, you can use mk_fail_test
to include
a string that you expect to occur in the error message.
You may still unit test your code, especially if you have tricky helper functions you define, but only integration test coverage will be graded this time. In future assignments, when we have multiple analyses and transformations, we will resume unit testing.
3.3 Running main and compiling to binary
main.rs
provides some convenient infrastructure for running
your compiler and interpreter from a terminal. To see the expected
arguments simply run cargo run
with no options. Note that to
distinguish the flags for your program from Cargo’s flags you need to
insert --
before your flags. For example, to run the
interpreter on a file example.adder
you can use
$ cargo run -- --interp example.adder
Alternatively, you can compile main.rs
to a binary using
cargo build --release
. Then the binary will be located at
./target/release/snake
.
4 Cross-platform Issues
I have tried to make the building and linking as cross-platform as
possible. I have tested that the programs work on Linux and Mac OS X,
and I think they should work on Windows as well. On any platform you
need to have nasm
and ar
installed. Ask on Piazza if
you have any issues installing these.
Here are the explicit commands the runner uses to build an executable
from an assembly code file compiled_code.s
and the rust stub
stub.rs
. First build an object file using nasm
,
passing the appropriate format flag for your platform.
$ nasm -felf64 compiled_code.s -o compiled_code.o # Linux
$ nasm -fmacho64 compiled_code.s -o compiled_code.o # Mac
$ nasm -fwin64 compiled_code.s -o compiled_code.o # Windows
Then build a static library file to link with the rust file
$ ar r libcompiled_code.a compiled_code.o # Linux & Mac
$ ar r compiled_code.lib compiled_code.o # Windows
Finally, compile the rust file, indicating where to look for the static library.
$ rustc stub.rs -L . -o stub.exe
This should make an executable that you can run.
$ ./stub.exe
I recommend you try this out with the sample file
section .text
global start_here
start_here:
mov RAX, 483
ret
start_here
on Mac anymore as I found an undocumented workaround to that issue.5 Rust Tips
To read more about pattern matching on structs like MemRef
and
enums with named arguments like Exp
see
this
chapter of the Rust book.
Sometimes you need to convert between different numeric types in Rust,
typically using the Into
and TryInto
traits. These
traits can be a bit tedious to use in my opinion because they often
need type annotations in weird places. To make things simple for you,
I’ve included a function usize_to_i32
in compile.rs
that tries to convert a usize
to an i32
, panicking if it
does not fit. I used this in the reference solution so you may find it
useful.
Finally, in checking binding-lists for duplicates you might want to use a data structure to keep track of the names. You can use a vector, like we use in the interpreter, or you may want to try out the HashSet module in the standard library.
6 List of Deliverables
Include (at least) the following files in your zip submission:
Cargo.toml
src/compile.rs
src/asm.rs
src/syntax.rs
src/interp.rs
src/runner.rs
src/lib.rs
any additional modules you saw fit to write
your compiler integration tests (
tests/examples.rs
)The test input programs (
examples/*.adder
files) you wrote
7 Grading Standards
For this assignment, you will be graded on
Whether your code implements the specification (functional correctness),
the clarity and cleanliness of your code, and
the comprehensiveness of your test coverage
8 Submission
Wait! Please read the assignment again and verify that you have not forgotten anything!
Please submit your homework on gradescope by the above deadline.