Assignment 4: Diamondback: Defining functions
Due: Sun 10/29 at 11:59pm
git clone
In this assignment, you’ll implement a compiler for a small language with local functions declarations and function calls, that can communicate with functions written in Rust. You’ll also add some static well-formedness checks to the compiler.
As for the project acronym, so far our greatest minds have come up with
Designing an
Intel Architecture
MOstly dynamic
Nested-expression
(Diamondback supports recursion)
Boolean-tagged
Arithmetic-supporting
Compiler.
...K?
Click here for scary snake picture!
1 The Diamondback Language
As usual, we have concrete and abstract syntaxes, along with a specification of semantics.
1.1 Concrete Syntax
The major addition to Diamondback are local function
declarations. We have a new binding form, where we can
def
a series of mutually recursive functions in any
expression.
‹expr› let ‹bindings› in ‹expr› if ‹expr› : ‹expr› else: ‹expr› ‹decls› in ‹expr› ‹binop-expr› ‹binop-expr› IDENTIFIER NUMBER true false ! ‹binop-expr› ‹prim1› ( ‹expr› ) ‹expr› ‹prim2› ‹expr› IDENTIFIER ( ‹exprs› ) IDENTIFIER ( ) ( ‹expr› ) ‹prim1› add1sub1 printisboolisnum ‹prim2› +-* <><=>= == &&|| ‹decls› ‹decls› and ‹decl› ‹decl› ‹decl› def IDENTIFIER ( ‹ids› ) : ‹expr› def IDENTIFIER ( ) : ‹expr› ‹ids› IDENTIFIER IDENTIFIER , ‹ids› ‹exprs› ‹expr› ‹expr› , ‹exprs› ‹bindings› IDENTIFIER = ‹expr› IDENTIFIER = ‹expr› , ‹bindings›
The other addition is function applications, which are written
IDENTIFIER(‹exprs›)
, for
example f(1, 2, 3)
. This is the syntax for a call to a function.
1.2 Abstract Syntax
In this assignment, we add four new cases to our Exp
AST. Two of them correspond directly to concrete syntax features:
one for mutually recursive sequences of local function definitions
one for function calls
The other two are never produced by the parser and instead will be produced by your own intermediate passes like lambda lifting:
one for tail calls to local function definitions
one for (tail and non-tail) calls to global function definitions
Since we will also need to use function definitions in our
sequentialized code, we make a type FunDecl
that is
generic in the underlying type of expressions.
pub struct FunDecl<E, Ann> {
pub name: String,
pub parameters: Vec<String>,
pub body: E,
pub ann: Ann,
}
pub enum Exp<Ann> {
... // previous cases
FunDefs {
decls: Vec<FunDecl<Exp<Ann>, Ann>>,
body: Box<Exp<Ann>>,
ann: Ann,
},
Call(String, Vec<Exp<Ann>>, Ann),
InternalTailCall(String, Vec<Exp<Ann>>, Ann),
ExternalCall {
fun_name: String,
args: Vec<Exp<Ann>>,
is_tail: bool,
ann: Ann,
},
}
You will implement a lambda lifting pass that will separate the input program into two pieces:
a sequence of global function definitions
and a single main expression to be run.
Additionally your pass will change Exp::Call
expressions into
Exp::InternalTailCall
or Exp::ExternalCall
as
appropriate.
So our sequential form has calls, in which
arguments are required to be immediate, but no local function
definitions. Instead, we have a new type of sequential form programs
SeqProg
that packages the functions and main expression into a
struct.
pub enum SeqExp<Ann> {
... // same as before
FunDefs {
decls: Vec<FunDecl<SeqExp<Ann>, Ann>>,
body: Box<SeqExp<Ann>>,
ann: Ann,
},
InternalTailCall(String, Vec<ImmExp>, Ann),
ExternalCall {
fun_name: String,
args: Vec<ImmExp<Ann>>,
is_tail: bool,
ann: Ann,
}
}
pub struct SeqProg<Ann> {
pub funs: Vec<FunDecl<SeqExp<Ann>, Ann>>,
pub main: SeqExp<Ann>,
pub ann: Ann,
}
1.3 Semantics
There are several distinguishing features of Diamondback. The first is function applications. A function application should give the answer we’d get if we followed the rules for substituting argument values for parameter names. So, for example:
def f(x, y):
x + y
in
f(1, 2)
should produce 3.
Logical operators &&
and ||
do not need to
short-circuit; they should have the same behavior as you gave them in Assignment 3.
There are a number of new errors that can occur now that we have function declarations and calls. Your implementation should catch all of these cases statically; that is, at compile time before the program runs:
A function application with the wrong number of arguments should signal an
FunctionCalledWrongArity
errorIf you apply a function
f
but there is a local variable namedf
, you should signal aValueUsedAsFunction
A function application of a non-existent function
f
that is not a locally defined variable should signal anUndefinedFunction
error.An identifier
x
used in a value position with no corresponding let declaration and no function declaration should report anUnboundVariable
errorAn identifier
x
used in a value position with no corresponding let declaration but where there is a function declaration definingx
should report aFunctionUsedAsValue
errorA let binding with duplicate names should report a
DuplicateBinding
errorA function declaration with duplicate names in the argument list should report a
DuplicateArgName
errorIf there are mutually recursive function definitions with the same name and arity, report a
DuplicateFunName
error. You can support overloading if you wish, but it probably won’t be allowed in future assignmentsIf a numeric constant is too large, report an
Overflow
error
Again, these errors should stop the program from compiling, in your
check_prog
function, not happen at runtime.
1.4 Middle-End
Once the code passes the check_prog
function, we need to do a
few transformations before we’re ready for code generation.
First, rename all functions and variables with unique names, to simplify later passes
Then, determine which functions can stay as local function definitions and which need to be lifted to global definitions.
Then, lift the functions identified in the previous analysis to the top-level, and translate
Exp::Call
nodes toExp::InternalTailCall
orExp::ExternalCall
nodesFinally, sequentialize the main expression and the bodies of the global function definitions
To help you get started, we’ve given you some stubs and code for
plugging these passes together. As usual, you do not have to follow
this exact code, you are free to change the types of any function you
use internally, just don’t change the types of anything that is
pub
.
1.4.1 Which functions to lambda lift?
Not all functions need to be lifted to the top level. For instance in the following program, every call is a tail call:
let x = 5
def loop(i,acc):
if i == 0:
acc
else:
loop(i - 1, acc + x)
in
loop(x,0)
Therefore for this program, lambda lifting will produce no global
function definitions, and leave the definition of loop
to be local. All that needs to be done is to make explicit that the
calls are all implemented as internal tail calls:
let x = 5
def loop(i,acc):
if i == 0:
acc
else:
internal_tail_call(loop; [i - 1, acc + x])
in
internal_tail_call(loop; [x,0])
On the other hand, in the following function, g
is
called twice, once in a non-tail position and once in tail position:
let x = 5 in
def g(y):
y * x
in
let a = g(7) + 4 in
g(a)
Since g
is called at least once in a non-tail position we lift it
to a global function definition. Your lambda lifting function should
lift this to a top level function definition where the local variable
x
is "captured" by g
and added as an extra
argument.
def g(x,y):
y * x
in
And the main expression becomes
let a = external_call{ fun_name: g, args: [x, 7], is_tail: false } + 4 in
external_call { fun_name: g, args: [x, a], is_tail: true }
But note that in addition to capturing variables, function definitions can also capture other function definitions:
let x = 5 in
def f(z):
z * x
in
def g(y):
f(y + 1)
in
g(7) + 4
There are several valid ways to lambda lift this function. The easiest
to implement correctly is to lift both f
and
g
to globals:
def f(x,z):
z * x
and
def g(x,y):
f(y + 1)
in
This method generalizes to a sub-obtimal but simple strategy: lift any function that is ever called in a non-tail position, and lift any local function definition that is in scope of another that is lifted. A refinement of this would be to only lift functions that are live in functions that are lifted, but this would require a more sophisticated analysis.
A cleverer compilation would copy the definition of f
into a local definition in the body of g
:
def g(x,y):
def f(z):
z * x
in
f(y + 1)
in
If you try to implement this cleverer transformation, be careful that it doesn’t lead to an exponential blowup in the number of local function definitions in the final program.
1.5 Backend
The file asm.rs
gives you all the assembly instructions you should
need, though you’re free to add your own new instructions if you want.
There are a few new pieces to the implementation:
For calling primitives like print
, obviously, you’ll
need to continue to follow the
System
V AMD ABI.
There are three different types of internal function calls, which are implemented slightly differently:
Local function calls: Implemented as described in Lecture 8: Local Function Definitions and Tail Calls. In particular note that when implementing a local tail call
f(a,b,c)
, the placement of the argumentx
will depend on how many variables are in scope wheref
is defined. See the end of those notes for an example.Global tail calls: Similar to local function calls, except the first argument will always be at
RSP - 8
Global non-tail calls: As described in Lecture 9: Global Function Definitions and Non-tail Calls.
Stack alignment is trickier this time because we are calling our own functions as well and so incorrect alignment might cancel itself out to make some tests pass. You might consider adding dynamic alignment checks to help track down mis-alignment bugs.
2 Running main
Running your own programs is the same as with Cobra, except you’ll
give them the .diamond
file extension.
3 List of Deliverables
your
compile.rs
andasm.rs
the other src/*.rs files in the starter code
any additional modules you saw fit to write
your
runtime/stub.rs
the Cargo.toml
integration tests (
tests/examples.rs
)your test input programs (
examples/*.cobra
files)
Please ensure the your code builds properly. The autograder will give you a 0 on that portion of the grade if it cannot compile your code.
4 Grading Standards
For this assignment, you will be graded on
Whether your code implements the specification (functional correctness),
The autograder will again include tests that are hidden until after the deadline that are worth 10% of your score.
5 Submission
Wait! Please read the assignment again and verify that you have not forgotten anything!
Please submit your homework to gradescope by the above deadline.