8.0

## Assignment 4: Cobra: Multiple types of values

#### Due: Tue 10/12 at 9:00pm

git clone

In this compiler, you’ll deal with COded Binary RepresntAtions of values. Click here for scary snake picture!

### 1The Cobra Language

#### 1.1Concrete Syntax

The concrete syntax of Cobra is very similar to Boa, with a few new additions.

‹expr›: ... | true | false | ‹expr› < ‹expr› | ‹expr› > ‹expr› | ‹expr› <= ‹expr› | ‹expr› >= ‹expr› | ‹expr› == ‹expr› | ‹expr› && ‹expr› | ‹expr› || ‹expr› | ! ‹expr› | isbool ( ‹expr› ) | isnum ( ‹expr› ) | print ( ‹expr› )

#### 1.2Abstract Syntax

The abstract syntax is very similar to Boa, also:

pub enum Prim1 {
...
Not,
Print,
IsBool,
IsNum,
}
pub enum Prim2 {
...
And,
Or,
Lt,
Gt,
Le,
Ge,
Eq,
}

pub enum Exp<Ann> {
...
Bool(bool, Ann),
}

#### 1.3Semantics

The semantics of booleans are straightforward. The semantics of Exp::If changes slightly: its condition must evaluate to a boolean, and it branches on the truth or falsehood of that value, rather than whether it’s nonzero.

With the addition of two types to the language, there are two main changes that ripple through the implementation:

• The representation of values

• The possibility of errors

There is one other major addition, which is the print primitive, discussed more below.

The representation of values requires a definition. We’ll use the following representations for the Cobra runtime:

• true will be represented as the constant 0xFFFFFFFFFFFFFFFF.

• false will be represented as the constant 0x7FFFFFFFFFFFFFFF.

• numbers will be represented with a zero in the rightmost bit, by shifting them left 1 bit. So, for example, 2 is represented as 0x0000000000000004.

You should raise errors at runtime in the following cases:

• -, +, *, add1 and sub1 should raise an error (by printing it out) with the substring "arithmetic expected a number" if the operation’s argument(s) are not numbers. You can print more information than this if you like, but the message must have at least this substring.

• <, <=, > and >= should raise an error with the substring "comparison expected a number" if the arguments are not both numbers.

• +, -, *, add1 and sub1 should raise an error with the substring "overflow" if the result overflows, and falls outside the range representable in 63 bits. The jo instruction will be useful: it jumps if the last instruction overflowed.

• if should raise an error with the substring "if expected a boolean" if the conditional value is not a boolean.

• and and or should raise an error with the substring "logic expected a boolean" if the arguments are not both booleans. Implementation of short-circuiting behavior is optional.

• not should raise an error with the substring "logic expected a boolean" if its argument is not a boolean.

These error messages should be printed on standard error. The other operators never fail, and so never produce errors.

The == should work polymorphically, regardless of the types of its inputs. If we ask if a boolean is equal to a number, the result should be false, not a runtime error. A value is only == to itself.

We add two new primitive operations, isbool and isnum. These two operations have an effective type of Any -> Bool, and will return true if the argument they receive is indeed a boolean or number, respectively.

The last required primitive operation is print, which prints its single argument to the command line (with a newline), and then returns it. The print_snake_val function in stub.rs explicitly returns its argument; you will need to retrieve and use that value in your compiled output.

### 2Examples

1. The expression

let x = 1 in
let y = print(x + 1) in
print(y + 2)

will output

2
4
4

The first 2 comes from the first print expression. The first 4 comes from the second print expression. The final line prints the answer of the program as usual, so there’s an “extra” 4.

2. The expression

if 54: true else: false

prints (on standard error) something like:

Error: if expected a boolean, got 54

### 3Implementation strategies

#### 3.1Rendering errors

To display error messages on standard error, you’ll need to use a call something like:

eprintln!("Error: if expected a boolean");

I have included a function snake_error() in stub.rs. You will need to add inputs of your design to this function to get your desired error messages. You can follow what we did in class or design it however you like.

#### 3.2Memory Layout and Calling Rust Functions

In order to set up the stack properly to call Rust functions, like print_snake_val and your error functions, it’s necessary to make a few changes to what we had in Boa.

• Allocating stack space ahead of time: At the start of our generated code (which we now recognize is a function body participating in a C runtime stack), we need to make sure we make enough stack space for all variables we create, and reserve that space ahead of time. To do this, we move RSP to point to a location that is N words away (so N * 8 bytes for us), where N is the greatest number of variables we need at once. This is actually tricky to compute to be fully optimal (teaser for later in the semester: by "tricky" I mean NP-hard), but it’s easy to get an OK heuristic — we can compute the maximum depth of nested definitions.

You need to add instructions to compiled output in order to make sure the correct space is allocated on the stack by subtracting the right amount from RSP. You may choose to implement this via a simple increment of RSP, or you may choose to explicitly push an appropriate number of zeros onto the stack, to clear out any uninitialized values. (The former is more efficient; the latter might be helpful for debugging.)

To figure out how much space, you need to implement the space_needed function. There are many ways to implement this, but you need to make sure that you have enough space to fit all of your local variables and that you keep the stack aligned.

When you make a function call, you must ensure that before the call instruction the stack pointer rsp is 16-byte aligned, i.e., divisible by 16. As discussed in class, you can achieve this by allocating one extra slot on the stack if you would otherwise be misaligned. This interacts with how many callee-save registers you save in your preamble: if you save an odd number of callee-save registers, you need to allocate an even number of stack slots for local variables, and vice-versa if you save an even number of callee-save registers, you need to allocate an odd number of stack slots.

• Using the Base Pointer: In addition, this means that all variable references need to happen from RBP rather than from RSP. This is because RSP can change while we are pushing arguments onto the stack for other function calls, so RBP is the place we should trust for consistent offsets.

• Participating in the stack: As a callee using the C calling convention, our code has some responsibilities. First, we need to store the callee-save registers that we use in the code upon entry. This includes at least the old base pointer rbp, but will likely also include r15 if you use it and any of rbx, r12, r13, r14 if you use them as well. Next, we update the base pointer to hold the current top of the stack (which includes the return pointer into main, for example). This is why the typical first lines of most C functions are:

push RBP
;; push other callee-save registers
mov RBP, RSP

Similarly, when we’re done with the function, we need to restore the stack pointer to its old location, and put the old base pointer value back. This is why the last lines before a ret in a C function are often:

mov RSP, RBP
;; pop other callee-save registers
pop RBP

If you write any functions in stub.rs that you need to be available in your assembly, you need to declare them in the assembly via:

;; In your generated assembly
extern <exported name of function>

#### 3.3New Assembly Constructs

• Sized: You may run into errors that report that the size of an operation is ambiguous. This could happen if you write, for example:

cmp [RBP-8], 0

because the assembler doesn’t know if the program should compare a four-byte zero, a one-byte zero, or something in between into memory starting at [RBP-8]. To solve this, you can supply a size:

cmp QWORD [RBP-8], 0

The QWORD keyword tells the assembler to use the "quad word" size for the memory reference. In our generated assembly code we have always worked at the full 64-bit word level so it is fine to just change all memory references to use the QWORD keyword. If you do not want every memory reference to use the keyword feel free to change types in asm.rs to support more selective use.

• Unsigned vs Signed: Assembly code supports 64-bit unsigned and signed constants and certain values like the representation of true, false and bit patterns cannot be represented as i64 values. So the Arg64/Arg32/Reg32 types have replaced the Imm with two cases for Signed and Unsigned constants. I recommend displaying Unsigned constants in hexadecimal since you will probably only be using them for bit-patterns. To format a number n in hexadecimal, you can use format!("0x{:016x}", n) for 64-bit numbers and format!("0x{:08x}", n) for 32-bit.

• push, pop: These two instructions manage values on the stack. push adds a value at the current location of rsp, and decrements rsp to point past the added value (remember, the stack grows toward numerically-smaller addresses, even though we draw it as growing upward in diagrams). pop increments rsp and moves the value at the location rsp was pointing to into the provided arg.

• A call does two things:

• Pushes the next code location onto the stack (just like a push), which becomes the return pointer

Note that call does not affect RBP, which the program must maintain on its own.

• shr, shl, sar: Bit shifting operations. sar (shift-arithmetic-right) preserves the sign bit of the value being shifted, while shr (shift-right) simply fills in a zero bit.

• and, or, xor: Bitwise logical operations.

• test: Performs a bitwise-and of the two arguments, and records whether the result was zero, signed, or overflowed. This can be used in tandem with jumps like jo or jz.

• jo, jno: Jump to the provided label if the last arithmetic operation did/did not overflow.

• Comment: these allow you to add a comment on a line by itself, or in-line after an instruction. These might be helpful for readability of your generated output, while you’re debugging.

#### 3.4Debugging

If you are having trouble producing correct code, I highly recommend

• Inspecting your generated assembly code

• Print out intermediate expressions using println!("{:?}", e)

• Finding minimal examples that produce incorrect behavior

#### 3.5Testing

Only integration tests, i.e., tests/examples.rs will be graded for this assignment.

### 4Recommended TODO List

Here’s an order in which you could consider tackling the implementation:

1. Fix the Num case and implement the Bool case of the compiler to correctly encode the values.

2. Extend check (renamed from check_scope) to use the Overflow error to indicate integer constants that are too big to fit into 63-bit signed integers.

3. Implement the Prim2 operations without checking runtime type tags. Test your arithmetic and logical operations with programs where the tag checks would succeed.

4. Similarly, implement the Prim1 operations besides print without checking runtime type tags.

5. Add the push rbp ... mov rbp, rsp preamble and corresponding epilogue to your compiled code.

6. Update your variable generation code to use locations releative to rbp rather than rsp. Re-run your existing tests to make sure they are working!.

7. Implement space_needed and correctly increment the stack pointer upon entry into your function.

8. Figure out how to test for errors and call the snake_error function when an error is detected. Test as you go. Be aware that if the function call segfaults, it may be because you need to refine your space_needed function.

9. Complete the if case and test as you go.

10. Implement print. Be aware that if the call doesn’t work, it may be because of space_needed again. Test as you go; be aware that you should test interesting sequences of print expressions and let-bindings to make sure your stack integrity is good before and after calls.

### 5Running main

Running your own programs is the same as with Boa, except you’ll give them the .cobra file extension.

### 6List of Deliverables

• your compile.rs and asm.rs

• the other src/*.rs files in the starter code

• any additional modules you saw fit to write

• your runtime/stub.rs

• the Cargo.toml

• integration tests (tests/examples.rs)

• your test input programs (examples/*.cobra files)

For this assignment, you will be graded on

• Whether your code implements the specification (functional correctness),

• the clarity and cleanliness of your code, and

• the comprehensiveness of your test coverage

### 8Submission

Wait! Please read the assignment again and verify that you have not forgotten anything!