8.2

## Assignment 5: Diamondback: Defining functions

#### Due: Tue 10/26 at 8:59pm

git clone

In this assignment, you’ll implement a compiler for a small language with functions declarations and function calls, that can communicate with functions written in Rust. You’ll also add some static well-formedness checks to the compiler.

As for the project acronym, so far our greatest minds have come up with

• Designing an

• Intel Architecture

• MOstly dynamic

• Nested-expression

• (Diamondback supports recursion)

• Boolean-tagged

• Arithmetic-supporting

• Compiler.

• ...K?

### 1The Diamondback Language

As usual, we have concrete and abstract syntaxes, along with a specification of semantics.

#### 1.1Concrete Syntax

The major addition to Diamondback are function declarations. Our programs are now a sequence of zero or more function declarations, followed by a single main expression.

‹program›: | ‹decls› ‹expr› | ‹expr› ‹binop-expr›: | IDENTIFIER | NUMBER | true | false | ! ‹binop-expr› | ‹prim1› ( ‹expr› ) | ‹expr› ‹prim2› ‹expr› | IDENTIFIER ( ‹exprs› ) | IDENTIFIER ( ) | ( ‹expr› ) ‹prim1›: | add1 | sub1 | print | printStack | isbool | isnum ‹prim2›: | + | - | * | < | > | <= | >= | == | && | || ‹decls›: | ‹decl› | ‹decl› ‹decls› ‹decl›: | def IDENTIFIER ( ‹ids› ) : ‹expr› end | def IDENTIFIER ( ) : ‹expr› end ‹ids›: | IDENTIFIER | IDENTIFIER , ‹ids› ‹expr›: | let ‹bindings› in ‹expr› | if ‹expr› : ‹expr› else: ‹expr› | ‹binop-expr› ‹exprs›: | ‹expr› | ‹expr› , ‹exprs› ‹bindings›: | IDENTIFIER = ‹expr› | IDENTIFIER = ‹expr› , ‹bindings›

The other addition is function applications, which are written IDENTIFIER(‹exprs›), for example f(1, 2, 3). This is the syntax for a call to a function.

#### 1.2Abstract Syntax

In this assignment, we add two new AST types: Prog for representing programs and FunDecl for representing function declarations:

struct Prog<E, Ann> {
funs: Vec<FunDecl<E, Ann>>,
main: E,
ann: Ann,
}

struct FunDecl<E, Ann> {
name: String,
parameters: Vec<String>,
body: E,
ann: Ann,
}

These types are parameterized by a type of annotations Ann as well as a type of expressions E. This way we can abstract over programs containing arbitrary expressions versus and programs containing only sequential expressions. We define "Surface language" programs and function declarations versus "sequential" programs and declarations as type aliases:

type SurfProg<Ann> = Prog<Exp<Ann>, Ann>;
type SurfFunDecl<Ann> = FunDecl<Exp<Ann>, Ann>;

type SeqProg<Ann> = Prog<SeqExp<Ann>, Ann>;
type SeqFunDecl<Ann> = FunDecl<SeqExp<Ann>, Ann>;

Additionally we add forms for function calls to expressions and sequential expressions.

#### 1.3Semantics

There are several distinguishing features of Diamondback. The first is function applications. A function application should give the answer we’d get if we followed the rules for substituting argument values for parameter names. So, for example:

def f(x, y):
x + y

f(1, 2)

should produce 3.

Logical operators && and || do not need to short-circuit; they should have the same behavior as you gave them in Assignment 4.

Since our Diamondback functions only call other Diamondback functions, rather than arbitrary functions in C, you can use a simpler calling convention than the full x64 calling convention: you can simply push all the arguments onto the stack and pop them all back off again. For primitives like print, obviously, you’ll need to follow the proper x64 stack layout. (If you’d like to use the x64 calling convention everywhere, you may do so, but it is more complicated for functions of more than 6 arguments. Test thoroughly.)

There are a number of new errors that can occur now that we have function declarations and calls. Your implementation should catch all of these cases statically; that is, at compile time before the program runs:

• A function application with the wrong number of arguments should signal an FunctionCalledWrongArity error

• If you apply a function f but there is a local variable named f, you should signal a ValueUsedAsFunction

• A function application of a non-existent function f that is not a locally defined variable should signal an UndefinedFunction error.

• An identifier x used in a value position with no corresponding let declaration and no function declaration should report an UnboundVariable error

• An identifier x used in a value position with no corresponding let declaration but where there is a function declaration defining x should report a FunctionUsedAsValue error

• A let binding with duplicate names should report a DuplicateBinding error

• A function declaration with duplicate names in the argument list should report a DuplicateArgName error

• If there are multiple function definitions with the same name and arity, report a DuplicateFunName error. You can support overloading if you wish, but it might not be allowed in future assignments

• If a numeric constant is too large, report an Overflow error

Again, these errors should stop the program from compiling, in your check_prog function, not happen at runtime.

#### 1.4Implementation

The file asm.rs gives you all the assembly instructions you should need, though you’re free to add your own new instructions if you want.

There are a few new pieces to the implementation:

• A new check_prog function which should be an extension of your previous check function to account for the additional errors above.

• The Prog structure, which contains both function declarations and the expression for start_here. You will probably want to generate an assembly structure that looks like:

  ;; extern and global stuff
section .text
...
fun_decl1:
;; code for fun_decl1, including stack management
fun_decl2:
;; code for fun_decl2, including stack management
...
start_here:
;; main entrypoint, as before, with stack management

;; errors, as before
internal_error_non_number:
...

(Hint: while the Prog construct is not quite uniform, since the body is treated separately from the function definitions, a relatively simple observation makes generating the output assembly of the entire program very uniform...)

#### 1.5Tail calls

Refine your compilation of function applications to support tail-calls. There are three levels of complexity here:

• required Support arbitrary tail-calls where the caller and callee have the same number of arguments

• optional, for bonus points Support arbtirary tail-calls where the callee has fewer arguments than the caller.

• optional, for bonus points Support arbitrary tail-calls, where the callee can have more arguments than the caller.

To support the last one, you’ll need to be creative with the calling convention. The only truly essential feature of a stack frame is knowing where the saved RSP, RBP and return addresses must be, and all other variables’ locations can be calculated from knowing those. In particular, you’ll need to change the ordering of values in your stack frame, where RBP points between parameters and locals. Instead, both parameters and locals will need to be above RBP: change your compilation of function calls — and anything else that is affected! — to support this new convention. This change will prevent you from using the call instruction (why?), so you’ll need to come up with something clever to save the desired return address. Come see me if you try this, for a hint.

To detect that you’ve implemented tail calls correctly, you may want to attempt the following extra credit.

#### 1.6Extra credit: Printing stack traces

Implement a new prim1 operator, printStack, that takes a single argument and returns it, just like print does. What makes printStack special is its output: it will print out a stack trace of all the call frames from wherever it’s called, down to wherever our_code_starts_here began. For example, given the following program:

def fact(acc, n):
if n < 1: printStack(acc)
else: fact(acc * n, n - 1) + 0
# NOTE: This + 0 is to prevent tail-recursion in this example
end

fact(1, 3)
Here is a potential stack trace:

RSP: 0x00007ffc352b1bb0	==>  0
RBP: 0x00007ffc352b1bd0	==>  70360600251912
(difference: -4)
Requested return val: 0x000000000000000c	==> 6
Num args: 2
0x00007ffc352b1bb0: 000000000000000000	==>  old rbp
0x00007ffc352b1bb8: 000000000000000000	==>  0
0x00007ffc352b1bc0: 000000000000000000	==>  0
0x00007ffc352b1bc8: 0xffffffffffffffff	==>  true
RBP 0x00007ffc352b1bd0: 0x00007ffc352b1c10	==>  old rbp
0x00007ffc352b1bd8: 0x0000000000401077	==>  saved ret
0x00007ffc352b1be0: 0x000000000000000c	==>  6
0x00007ffc352b1be8: 000000000000000000	==>  0
0x00007ffc352b1bf0: 000000000000000000	==>  0
0x00007ffc352b1bf8: 000000000000000000	==>  0
0x00007ffc352b1c00: 0x000000000000000c	==>  6
0x00007ffc352b1c08: 0x7fffffffffffffff	==>  false
RBP 0x00007ffc352b1c10: 0x00007ffc352b1c50	==>  old rbp
0x00007ffc352b1c18: 0x0000000000401077	==>  saved ret
0x00007ffc352b1c20: 0x000000000000000c	==>  6
0x00007ffc352b1c28: 0x0000000000000002	==>  1
0x00007ffc352b1c30: 000000000000000000	==>  0
0x00007ffc352b1c38: 0x0000000000000002	==>  1
0x00007ffc352b1c40: 0x000000000000000c	==>  6
0x00007ffc352b1c48: 0x7fffffffffffffff	==>  false
RBP 0x00007ffc352b1c50: 0x00007ffc352b1c90	==>  old rbp
0x00007ffc352b1c58: 0x0000000000401077	==>  saved ret
0x00007ffc352b1c60: 0x0000000000000006	==>  3
0x00007ffc352b1c68: 0x0000000000000004	==>  2
0x00007ffc352b1c70: 000000000000000000	==>  0
0x00007ffc352b1c78: 0x0000000000000004	==>  2
0x00007ffc352b1c80: 0x0000000000000006	==>  3
0x00007ffc352b1c88: 0x7fffffffffffffff	==>  false
RBP 0x00007ffc352b1c90: 0x00007ffc352b1cb0	==>  old rbp
0x00007ffc352b1c98: 0x00000000004010c1	==>  saved ret
0x00007ffc352b1ca0: 0x0000000000000002	==>  1
0x00007ffc352b1ca8: 0x0000000000000006	==>  3
BOT 0x00007ffc352b1cb0: 0x00007ffc352b1cf0	==>  old rbp
0x00007ffc352b1cb8: 0x0000000000400f37	==>  saved ret
6

The output shows:

• The current values of RSP and RBP at the moment of the the printStack call. The difference between them should be the size of the current stack frame.

• The requested return value is the argument given to printStack.

• Num args shows the number of arguments passed to the current function – i.e., the two arguments given to fact.

• The next bunch of segments show individual stack frames. The three columns show the memory address, the value at that address, and then the result of printing that address as a Diamondback value, or identifying it as the saved RBP and return address. Each line labelled RBP indicates the beginning of a new stack frame: the value at each of those lines contains the address of the next RBP line.

Looking at these stack frames, from bottom to top, you can see: the boolean result of n < 1, local values acc * n and n - 1, a slot for the return value of the fact(...) call (which is still zero because the call hasn’t finished yet), then the arguments n and acc for the next call.

• The final segment, marked BOT, is the stack frame of our main expression.

• The last line of output is the final output of the entire program, and comes from the println in main.

Hint: To identify where to stop printing the stack, I used a trick: declare a global mutable variable in Rust, static mut BOTTOM: u64 = 0; and extend start_here to take an argument fn start_here(bottom: *mut 64) -> SnakeVal and pass a pointer to BOTTOM: let output = unsafe { start_here(&mut BOTTOM) }; .

Then at the beginning of start_here, initialize that value with the current value of RSP, and then you’re guaranteed to know where your stack ends, dynamically.

Note: if you’ve properly implemented tail calls, then eliminating the + 0 from the program above, and rerunning it, should produce a much shorter stack trace...

Note: If you implement this extra credit, provide a brief, written explanation (in a comment in stub.rs, near whatever new functions you define) of why this must be implemented as a Prim1 and not as a Call expression.

#### 1.7Testing

Include integration test examples in examples/test.rs. These include a new testing macro mk_any_test! that takes 2 arguments: the test name and the input file name and tests that compiling and running the example works without error, but doesn’t check for a particular output. This should be useful in testing your printStack function.

### 2Running main

Running your own programs is the same as with Cobra, except you’ll give them the .diamond file extension. Examples from previous assignments will still work since they will simply be interpreted as programs with 0 function declarations.

### 3List of Deliverables

• your compile.rs and asm.rs

• the other src/*.rs files in the starter code

• any additional modules you saw fit to write

• your runtime/stub.rs

• the Cargo.toml

• integration tests (tests/examples.rs)

• your test input programs (examples/*.cobra files)

Again, please ensure the makefile builds your code properly. The black-box tests will give you an automatic 0 if they cannot compile your code!

For this assignment, you will be graded on

• Whether your code implements the specification (functional correctness),

• the clarity and cleanliness of your code, and

• the comprehensiveness of your test coverage

### 5Submission

Wait! Please read the assignment again and verify that you have not forgotten anything!