Assignment 5: Diamondback: Defining functions
Due: Tue 10/26 at 8:59pm
git clone
In this assignment, you’ll implement a compiler for a small language with functions declarations and function calls, that can communicate with functions written in Rust. You’ll also add some static well-formedness checks to the compiler.
As for the project acronym, so far our greatest minds have come up with
Designing an
Intel Architecture
MOstly dynamic
Nested-expression
(Diamondback supports recursion)
Boolean-tagged
Arithmetic-supporting
Compiler.
...K?
Click here for scary snake picture!
1 The Diamondback Language
As usual, we have concrete and abstract syntaxes, along with a specification of semantics.
1.1 Concrete Syntax
The major addition to Diamondback are function declarations. Our programs are now a sequence of zero or more function declarations, followed by a single main expression.
‹program› ‹decls› ‹expr› ‹expr› ‹binop-expr› IDENTIFIER NUMBER true false ! ‹binop-expr› ‹prim1› ( ‹expr› ) ‹expr› ‹prim2› ‹expr› IDENTIFIER ( ‹exprs› ) IDENTIFIER ( ) ( ‹expr› ) ‹prim1› add1sub1 printprintStack isboolisnum ‹prim2› +-* <><=>= == &&|| ‹decls› ‹decl› ‹decl› ‹decls› ‹decl› def IDENTIFIER ( ‹ids› ) : ‹expr› end def IDENTIFIER ( ) : ‹expr› end ‹ids› IDENTIFIER IDENTIFIER , ‹ids› ‹expr› let ‹bindings› in ‹expr› if ‹expr› : ‹expr› else: ‹expr› ‹binop-expr› ‹exprs› ‹expr› ‹expr› , ‹exprs› ‹bindings› IDENTIFIER = ‹expr› IDENTIFIER = ‹expr› , ‹bindings›
The other addition is function applications, which are written
IDENTIFIER(‹exprs›)
, for
example f(1, 2, 3)
. This is the syntax for a call to a function.
1.2 Abstract Syntax
In this assignment, we add two new AST types: Prog
for
representing programs and FunDecl
for representing function
declarations:
struct Prog<E, Ann> {
funs: Vec<FunDecl<E, Ann>>,
main: E,
ann: Ann,
}
struct FunDecl<E, Ann> {
name: String,
parameters: Vec<String>,
body: E,
ann: Ann,
}
These types are parameterized by a type of annotations Ann
as
well as a type of expressions E
. This way we can abstract over
programs containing arbitrary expressions versus and programs
containing only sequential expressions. We define "Surface language"
programs and function declarations versus "sequential" programs and
declarations as type aliases:
type SurfProg<Ann> = Prog<Exp<Ann>, Ann>;
type SurfFunDecl<Ann> = FunDecl<Exp<Ann>, Ann>;
type SeqProg<Ann> = Prog<SeqExp<Ann>, Ann>;
type SeqFunDecl<Ann> = FunDecl<SeqExp<Ann>, Ann>;
Additionally we add forms for function calls to expressions and sequential expressions.
1.3 Semantics
There are several distinguishing features of Diamondback. The first is function applications. A function application should give the answer we’d get if we followed the rules for substituting argument values for parameter names. So, for example:
def f(x, y):
x + y
f(1, 2)
should produce 3.
Logical operators &&
and ||
do not need to
short-circuit; they should have the same behavior as you gave them in Assignment 4.
Since our Diamondback functions only call other Diamondback functions, rather
than arbitrary functions in C, you can use a simpler calling convention than the
full x64 calling convention: you can simply push all the arguments onto the
stack and pop them all back off again. For primitives like
print
, obviously, you’ll need to follow the proper
x64
stack layout. (If you’d like to use the x64 calling convention everywhere,
you may do so, but it is more complicated for functions of more than 6 arguments. Test thoroughly.)
There are a number of new errors that can occur now that we have function declarations and calls. Your implementation should catch all of these cases statically; that is, at compile time before the program runs:
A function application with the wrong number of arguments should signal an
FunctionCalledWrongArity
errorIf you apply a function
f
but there is a local variable namedf
, you should signal aValueUsedAsFunction
A function application of a non-existent function
f
that is not a locally defined variable should signal anUndefinedFunction
error.An identifier
x
used in a value position with no corresponding let declaration and no function declaration should report anUnboundVariable
errorAn identifier
x
used in a value position with no corresponding let declaration but where there is a function declaration definingx
should report aFunctionUsedAsValue
errorA let binding with duplicate names should report a
DuplicateBinding
errorA function declaration with duplicate names in the argument list should report a
DuplicateArgName
errorIf there are multiple function definitions with the same name and arity, report a
DuplicateFunName
error. You can support overloading if you wish, but it might not be allowed in future assignmentsIf a numeric constant is too large, report an
Overflow
error
Again, these errors should stop the program from compiling, in your
check_prog
function, not happen at runtime.
1.4 Implementation
The file asm.rs
gives you all the assembly instructions you should
need, though you’re free to add your own new instructions if you want.
There are a few new pieces to the implementation:
A new
check_prog
function which should be an extension of your previouscheck
function to account for the additional errors above.The
Prog
structure, which contains both function declarations and the expression forstart_here
. You will probably want to generate an assembly structure that looks like:;; extern and global stuff section .text ... fun_decl1: ;; code for fun_decl1, including stack management fun_decl2: ;; code for fun_decl2, including stack management ... start_here: ;; main entrypoint, as before, with stack management ;; errors, as before internal_error_non_number: ...
(Hint: while the
Prog
construct is not quite uniform, since the body is treated separately from the function definitions, a relatively simple observation makes generating the output assembly of the entire program very uniform...)
1.5 Tail calls
Refine your compilation of function applications to support tail-calls. There are three levels of complexity here:
required Support arbitrary tail-calls where the caller and callee have the same number of arguments
optional, for bonus points Support arbtirary tail-calls where the callee has fewer arguments than the caller.
optional, for bonus points Support arbitrary tail-calls, where the callee can have more arguments than the caller.
To support the last one, you’ll need to be creative with the calling
convention. The only truly essential feature of a stack frame is knowing where
the saved RSP
, RBP
and return addresses must be, and all other
variables’ locations can be calculated from knowing those. In particular,
you’ll need to change the ordering of values in your stack frame, where
RBP
points between parameters and locals. Instead, both parameters and
locals will need to be above RBP
: change your compilation
of function calls —call
instruction (why?), so you’ll need to come up with something clever to save
the desired return address. Come see me if you try this, for a hint.
To detect that you’ve implemented tail calls correctly, you may want to attempt the following extra credit.
1.6 Extra credit: Printing stack traces
Implement a new prim1
operator, printStack
, that takes a
single argument and returns it, just like print
does. What makes
printStack
special is its output: it will print out a stack trace
of all the call frames from wherever it’s called, down to wherever
our_code_starts_here
began. For example, given the following program:
def fact(acc, n):
if n < 1: printStack(acc)
else: fact(acc * n, n - 1) + 0
# NOTE: This + 0 is to prevent tail-recursion in this example
end
fact(1, 3)
RSP: 0x00007ffc352b1bb0 ==> 0
RBP: 0x00007ffc352b1bd0 ==> 70360600251912
(difference: -4)
Requested return val: 0x000000000000000c ==> 6
Num args: 2
0x00007ffc352b1bb0: 000000000000000000 ==> old rbp
0x00007ffc352b1bb8: 000000000000000000 ==> 0
0x00007ffc352b1bc0: 000000000000000000 ==> 0
0x00007ffc352b1bc8: 0xffffffffffffffff ==> true
RBP 0x00007ffc352b1bd0: 0x00007ffc352b1c10 ==> old rbp
0x00007ffc352b1bd8: 0x0000000000401077 ==> saved ret
0x00007ffc352b1be0: 0x000000000000000c ==> 6
0x00007ffc352b1be8: 000000000000000000 ==> 0
0x00007ffc352b1bf0: 000000000000000000 ==> 0
0x00007ffc352b1bf8: 000000000000000000 ==> 0
0x00007ffc352b1c00: 0x000000000000000c ==> 6
0x00007ffc352b1c08: 0x7fffffffffffffff ==> false
RBP 0x00007ffc352b1c10: 0x00007ffc352b1c50 ==> old rbp
0x00007ffc352b1c18: 0x0000000000401077 ==> saved ret
0x00007ffc352b1c20: 0x000000000000000c ==> 6
0x00007ffc352b1c28: 0x0000000000000002 ==> 1
0x00007ffc352b1c30: 000000000000000000 ==> 0
0x00007ffc352b1c38: 0x0000000000000002 ==> 1
0x00007ffc352b1c40: 0x000000000000000c ==> 6
0x00007ffc352b1c48: 0x7fffffffffffffff ==> false
RBP 0x00007ffc352b1c50: 0x00007ffc352b1c90 ==> old rbp
0x00007ffc352b1c58: 0x0000000000401077 ==> saved ret
0x00007ffc352b1c60: 0x0000000000000006 ==> 3
0x00007ffc352b1c68: 0x0000000000000004 ==> 2
0x00007ffc352b1c70: 000000000000000000 ==> 0
0x00007ffc352b1c78: 0x0000000000000004 ==> 2
0x00007ffc352b1c80: 0x0000000000000006 ==> 3
0x00007ffc352b1c88: 0x7fffffffffffffff ==> false
RBP 0x00007ffc352b1c90: 0x00007ffc352b1cb0 ==> old rbp
0x00007ffc352b1c98: 0x00000000004010c1 ==> saved ret
0x00007ffc352b1ca0: 0x0000000000000002 ==> 1
0x00007ffc352b1ca8: 0x0000000000000006 ==> 3
BOT 0x00007ffc352b1cb0: 0x00007ffc352b1cf0 ==> old rbp
0x00007ffc352b1cb8: 0x0000000000400f37 ==> saved ret
6
The output shows:
The current values of
RSP
andRBP
at the moment of the theprintStack
call. The difference between them should be the size of the current stack frame.The requested return value is the argument given to
printStack
.Num args
shows the number of arguments passed to the current function – i.e., the two arguments given tofact
.The next bunch of segments show individual stack frames. The three columns show the memory address, the value at that address, and then the result of
print
ing that address as a Diamondback value, or identifying it as the savedRBP
and return address. Each line labelledRBP
indicates the beginning of a new stack frame: the value at each of those lines contains the address of the nextRBP
line.Looking at these stack frames, from bottom to top, you can see: the boolean result of
n < 1
, local valuesacc * n
andn - 1
, a slot for the return value of thefact(...)
call (which is still zero because the call hasn’t finished yet), then the argumentsn
andacc
for the next call.The final segment, marked
BOT
, is the stack frame of ourmain
expression.The last line of output is the final output of the entire program, and comes from the
println
inmain
.
Hint: To identify where to stop printing the stack, I used a
trick: declare a global mutable variable in Rust, static mut
BOTTOM: u64 = 0;
and extend start_here
to take an argument
fn start_here(bottom: *mut 64) -> SnakeVal
and pass a pointer
to BOTTOM: let output = unsafe { start_here(&mut BOTTOM) };
.
Then at the beginning of start_here
, initialize that value with
the current value of RSP
, and then you’re guaranteed to know
where your stack ends, dynamically.
Note: if you’ve properly implemented tail calls, then eliminating the
+ 0
from the program above, and rerunning it, should produce a
much shorter stack trace...
Note: If you implement this extra credit, provide a brief, written
explanation (in a comment in stub.rs
, near whatever new functions you
define) of why this must be implemented as a Prim1
and not as a
Call
expression.
1.7 Testing
Include integration test examples in examples/test.rs
. These
include a new testing macro mk_any_test!
that takes 2
arguments: the test name and the input file name and tests that
compiling and running the example works without error, but doesn’t
check for a particular output. This should be useful in testing your
printStack
function.
2 Running main
Running your own programs is the same as with Cobra, except you’ll
give them the .diamond
file extension. Examples from previous
assignments will still work since they will simply be interpreted as
programs with 0 function declarations.
3 List of Deliverables
your
compile.rs
andasm.rs
the other src/*.rs files in the starter code
any additional modules you saw fit to write
your
runtime/stub.rs
the Cargo.toml
integration tests (
tests/examples.rs
)your test input programs (
examples/*.cobra
files)
Again, please ensure the makefile builds your code properly. The black-box tests will give you an automatic 0 if they cannot compile your code!
Please ensure the your code builds properly. The autograder will give you a 0 on that portion of the graade if it cannot compile your code.
4 Grading Standards
For this assignment, you will be graded on
Whether your code implements the specification (functional correctness),
the clarity and cleanliness of your code, and
the comprehensiveness of your test coverage
5 Submission
Wait! Please read the assignment again and verify that you have not forgotten anything!
Please submit your homework to gradescope by the above deadline.