8.7

## Assignment 3: Cobra: Multiple types of values

#### Due: Fri 10/14 at 5:00pm

git clone

In this compiler, you’ll deal with COded Binary RepresntAtions of values. Click here for scary snake picture!

### 1The Cobra Language

#### 1.1Concrete Syntax

The concrete syntax of Cobra is very similar to Boa, with a few new additions.

‹expr›: ... | true | false | ‹expr› < ‹expr› | ‹expr› > ‹expr› | ‹expr› <= ‹expr› | ‹expr› >= ‹expr› | ‹expr› == ‹expr› | ‹expr› && ‹expr› | ‹expr› || ‹expr› | ! ‹expr› | isbool ( ‹expr› ) | isnum ( ‹expr› ) | print ( ‹expr› )

#### 1.2Abstract Syntax

The abstract syntax is very similar to Boa, also:

pub enum Prim1 {
...
Not,
Print,
IsBool,
IsNum,
}
pub enum Prim2 {
...
And,
Or,
Lt,
Gt,
Le,
Ge,
Eq,
}

pub enum Exp<Ann> {
...
Bool(bool, Ann),
}

#### 1.3Semantics

The semantics of booleans are straightforward. The semantics of Exp::If changes slightly: its condition must evaluate to a boolean, and it branches on the truth or falsehood of that value, rather than whether it’s nonzero.

With the addition of two types to the language, there are two main changes that ripple through the implementation:

• The representation of values

• The possibility of errors

There is one other major addition, which is the print primitive, discussed more below.

The representation of values requires a definition. We’ll use the following representations for the Cobra runtime:

• true will be represented as the constant 0xFFFFFFFFFFFFFFFF.

• false will be represented as the constant 0x7FFFFFFFFFFFFFFF.

• numbers will be represented with a zero in the rightmost bit, by shifting them left 1 bit. So, for example, 2 is represented as 0x0000000000000004.

You should raise errors at runtime in the following cases:

• -, +, *, add1 and sub1 should raise an error (by printing it out) with the substring "arithmetic expected a number" if the operation’s argument(s) are not numbers. You can print more information than this if you like, but the message must have at least this substring.

• <, <=, > and >= should raise an error with the substring "comparison expected a number" if the arguments are not both numbers.

• +, -, *, add1 and sub1 should raise an error with the substring "overflow" if the result overflows, and falls outside the range representable in 63 bits. The jo instruction will be useful: it jumps if the last instruction overflowed.

• if should raise an error with the substring "if expected a boolean" if the conditional value is not a boolean.

• and and or should raise an error with the substring "logic expected a boolean" if the arguments are not both booleans. Implementation of short-circuiting behavior is optional.

• not should raise an error with the substring "logic expected a boolean" if its argument is not a boolean.

These error messages should be printed on standard error. The other operators never fail, and so never produce errors.

The == should work polymorphically, regardless of the types of its inputs. If we ask if a boolean is equal to a number, the result should be false, not a runtime error. A value is only == to itself.

We add two new primitive operations, isbool and isnum. These two operations have an effective type of Any -> Bool, and will return true if the argument they receive is indeed a boolean or number, respectively.

The last required primitive operation is print, which prints its single argument to the command line (with a newline), and then returns it. The print_snake_val function in stub.rs explicitly returns its argument; you will need to retrieve and use that value in your compiled output.

### 2Examples

1. The expression

let x = 1 in
let y = print(x + 1) in
print(y + 2)

will output

2
4
4

The first 2 comes from the first print expression. The first 4 comes from the second print expression. The final line prints the answer of the program as usual, so there’s an “extra” 4.

2. The expression

if 54: true else: false

prints (on standard error) something like:

Error: if expected a boolean, got 54

### 3Implementation strategies

#### 3.1Rendering errors

To display error messages on standard error, you’ll need to use a call something like:

eprintln!("Error: if expected a boolean");

I have included a function snake_error() in stub.rs. You will need to add inputs of your design to this function to get your desired error messages. You can follow what we did in class or design it however you like.

#### 3.2Memory Layout and Calling Rust Functions

In order to set up the stack properly to call Rust functions, like print_snake_val and your error functions, it’s necessary to make a few changes to what we had in Boa.

• When calling a Rust function such as print_snake_val, you need to ensure that you decrement RSP enough to move it past all of your local variables before the call and then reverse this by incrementing it after the call returns.

To figure out how much space, you need to implement the space_needed function. There are many ways to implement this, but you need to make sure that you have enough space to fit all of your local variables and that you keep the stack aligned.

When you make a function call, you must ensure that before the call instruction the stack pointer rsp is 16-byte aligned, i.e., divisible by 16. As discussed in Lecture 7, you can achieve this by allocating one extra slot on the stack if you would otherwise be misaligned.

• Participating in the stack: As a callee using the System V AMD64 calling convention, our code has some responsibilities. First, we need to store any callee-save registers that we use in the code upon entry. So if you are using any of the callee-save registers as scratch registers, you should push them upon entry into start_here and pop them back off before the final return.

If you write any functions in stub.rs that you need to be available in your assembly, you need to declare them in the assembly via:

;; In your generated assembly
extern "sysv64" <exported name of function>

#### 3.3New Assembly Constructs

• Sized: You may run into errors that report that the size of an operation is ambiguous. This could happen if you write, for example:

cmp [RSP - 8], 0

because the assembler doesn’t know if the program should compare a four-byte zero, a one-byte zero, or something in between into memory starting at [RSP - 8]. To solve this, you can supply a size:

cmp QWORD [RSP - 8], 0

The QWORD keyword tells the assembler to use the "quad word" size for the memory reference. In our generated assembly code we have always worked at the full 64-bit word level so it is fine to just change all memory references to use the QWORD keyword. If you do not want every memory reference to use the keyword feel free to change types in asm.rs to support more selective use.

• Unsigned vs Signed: Assembly code supports 64-bit unsigned and signed constants and certain values like the representation of true, false and bit patterns cannot be represented as i64 values. So the Arg64/Arg32/Reg32 types have replaced the Imm with two cases for Signed and Unsigned constants. I recommend displaying Unsigned constants in hexadecimal since you will probably only be using them for bit-patterns. To format a number n in hexadecimal, you can use format!("0x{:016x}", n) for 64-bit numbers and format!("0x{:08x}", n) for 32-bit.

• push, pop: These two instructions manage values on the stack. push adds a value at the current location of rsp, and decrements rsp to point past the added value (remember, the stack grows toward numerically-smaller addresses, even though we draw it as growing upward in diagrams). pop increments rsp and moves the value at the location rsp was pointing to into the provided arg.

• A call does two things:

• Pushes the next code location onto the stack (just like a push), which becomes the return pointer

• Performs an unconditional jump to the provided label

• shr, shl, sar: Bit shifting operations. sar (shift-arithmetic-right) preserves the sign bit of the value being shifted, while shr (shift-right) simply fills in a zero bit.

• and, or, xor: Bitwise logical operations.

• test: Performs a bitwise-and of the two arguments, and records whether the result was zero, signed, or overflowed. This can be used in tandem with jumps like jo or jz.

• jo, jno: Jump to the provided label if the last arithmetic operation did/did not overflow.

• Comment: these allow you to add a comment on a line by itself, or in-line after an instruction. These might be helpful for readability of your generated output, while you’re debugging.

#### 3.4Debugging

If you are having trouble producing correct code, I highly recommend

• Inspecting your generated assembly code

• Print out intermediate expressions using println!("{:?}", e)

• Finding minimal examples that produce incorrect behavior

### 4Recommended TODO List

Here’s an order in which you could consider tackling the implementation:

1. Fix the Num case and implement the Bool case of the compiler to correctly encode the values.

2. Extend check (renamed from check_scope) to use the Overflow error to indicate integer constants that are too big to fit into 63-bit signed integers.

3. Implement the Prim2 operations without checking runtime type tags. Test your arithmetic and logical operations with programs where the tag checks would succeed.

4. Similarly, implement the Prim1 operations besides print without checking runtime type tags.

5. Figure out how to test for errors and call the snake_error function when an error is detected. Test as you go. Be aware that if the function call segfaults, it may be because you need to refine your space_needed function.

6. Complete the if case and test as you go.

7. Implement print and in the process space_needed. Test as you go; be aware that you should test interesting sequences of print expressions and let-bindings to make sure your stack integrity is good before and after calls.

Be sure to get your alignment correct! Stack misalignment does not cause errors on Mac, but does on Linux and in particular on the autograder servers so if you are getting errors you can’t replicate on Mac, make sure to think about alignment.

### 5Running main

Running your own programs is the same as with Boa, except you’ll give them the .cobra file extension.

### 6List of Deliverables

• your compile.rs and asm.rs

• the other src/*.rs files in the starter code

• any additional modules you saw fit to write

• your runtime/stub.rs

• the Cargo.toml

• integration tests (tests/examples.rs)

• your test input programs (examples/*.cobra files)

Please ensure the your code builds properly. The autograder will give you a 0 on that portion of the graade if it cannot compile your code.