Assignment 4: Cobra: Multiple types of values
Due: Tue 10/12 at 9:00pm
git clone
In this compiler, you’ll deal with COded Binary RepresntAtions of values. Click here for scary snake picture!
1 The Cobra Language
1.1 Concrete Syntax
The concrete syntax of Cobra is very similar to Boa, with a few new additions.
‹expr› ... true false ‹expr› < ‹expr› ‹expr› > ‹expr› ‹expr› <= ‹expr› ‹expr› >= ‹expr› ‹expr› == ‹expr› ‹expr› && ‹expr› ‹expr› || ‹expr› ! ‹expr› isbool ( ‹expr› ) isnum ( ‹expr› ) print ( ‹expr› )
1.2 Abstract Syntax
The abstract syntax is very similar to Boa, also:
pub enum Prim1 {
...
Not,
Print,
IsBool,
IsNum,
}
pub enum Prim2 {
...
And,
Or,
Lt,
Gt,
Le,
Ge,
Eq,
}
pub enum Exp<Ann> {
...
Bool(bool, Ann),
}
1.3 Semantics
The semantics of booleans are straightforward. The semantics of
Exp::If
changes slightly: its condition must evaluate to a
boolean, and it branches on the truth or falsehood of that value,
rather than whether it’s nonzero.
With the addition of two types to the language, there are two main changes that ripple through the implementation:
The representation of values
The possibility of errors
There is one other major addition, which is the print
primitive, discussed more below.
The representation of values requires a definition. We’ll use the following representations for the Cobra runtime:
true
will be represented as the constant0xFFFFFFFFFFFFFFFF
.false
will be represented as the constant0x7FFFFFFFFFFFFFFF
.numbers will be represented with a zero in the rightmost bit, by shifting them left 1 bit. So, for example, 2 is represented as
0x0000000000000004
.
You should raise errors at runtime in the following cases:
-
,+
,*
,add1
andsub1
should raise an error (by printing it out) with the substring"arithmetic expected a number"
if the operation’s argument(s) are not numbers. You can print more information than this if you like, but the message must have at least this substring.<
,<=
,>
and>=
should raise an error with the substring"comparison expected a number"
if the arguments are not both numbers.+
,-
,*
,add1
andsub1
should raise an error with the substring"overflow"
if the result overflows, and falls outside the range representable in 63 bits. Thejo
instruction will be useful: it jumps if the last instruction overflowed.if
should raise an error with the substring"if expected a boolean"
if the conditional value is not a boolean.and
andor
should raise an error with the substring"logic expected a boolean"
if the arguments are not both booleans. Implementation of short-circuiting behavior is optional.not
should raise an error with the substring"logic expected a boolean"
if its argument is not a boolean.
These error messages should be printed on standard error. The other operators never fail, and so never produce errors.
The ==
should work polymorphically, regardless of the types of
its inputs. If we ask if a boolean is equal to a number, the result
should be false
, not a runtime error. A value is only
==
to itself.
We add two new primitive operations, isbool
and isnum
. These two
operations have an effective type of Any -> Bool
, and will return
true
if the argument they receive is indeed a boolean or number,
respectively.
The last required primitive operation is print
, which prints
its single argument to the command line (with a newline), and then
returns it. The print_snake_val
function in stub.rs
explicitly returns its argument; you will need to retrieve and use
that value in your compiled output.
2 Examples
The expression
let x = 1 in let y = print(x + 1) in print(y + 2)
will output
2 4 4
The first 2 comes from the first
print
expression. The first 4 comes from the secondprint
expression. The final line prints the answer of the program as usual, so there’s an “extra” 4.The expression
if 54: true else: false
prints (on standard error) something like:
Error: if expected a boolean, got 54
3 Implementation strategies
3.1 Rendering errors
To display error messages on standard error, you’ll need to use a call something like:
eprintln!("Error: if expected a boolean");
I have included a function snake_error()
in stub.rs
. You
will need to add inputs of your design to this function to get your
desired error messages. You can follow what we did in class or design
it however you like.
3.2 Memory Layout and Calling Rust Functions
In order to set up the stack properly to call Rust functions, like
print_snake_val
and your error functions, it’s necessary to
make a few changes to what we had in Boa.
Allocating stack space ahead of time: At the start of our generated code (which we now recognize is a function body participating in a C runtime stack), we need to make sure we make enough stack space for all variables we create, and reserve that space ahead of time. To do this, we move
RSP
to point to a location that is N words away (so N * 8 bytes for us), where N is the greatest number of variables we need at once. This is actually tricky to compute to be fully optimal (teaser for later in the semester: by "tricky" I mean NP-hard), but it’s easy to get an OK heuristic —we can compute the maximum depth of nested definitions. You need to add instructions to compiled output in order to make sure the correct space is allocated on the stack by subtracting the right amount from
RSP
. You may choose to implement this via a simple increment ofRSP
, or you may choose to explicitlypush
an appropriate number of zeros onto the stack, to clear out any uninitialized values. (The former is more efficient; the latter might be helpful for debugging.)To figure out how much space, you need to implement the
space_needed
function. There are many ways to implement this, but you need to make sure that you have enough space to fit all of your local variables and that you keep the stack aligned.When you make a function call, you must ensure that before the
call
instruction the stack pointerrsp
is 16-byte aligned, i.e., divisible by 16. As discussed in class, you can achieve this by allocating one extra slot on the stack if you would otherwise be misaligned. This interacts with how many callee-save registers you save in your preamble: if you save an odd number of callee-save registers, you need to allocate an even number of stack slots for local variables, and vice-versa if you save an even number of callee-save registers, you need to allocate an odd number of stack slots.Using the Base Pointer: In addition, this means that all variable references need to happen from
RBP
rather than fromRSP
. This is becauseRSP
can change while we are pushing arguments onto the stack for other function calls, so RBP is the place we should trust for consistent offsets.Participating in the stack: As a callee using the C calling convention, our code has some responsibilities. First, we need to store the callee-save registers that we use in the code upon entry. This includes at least the old base pointer
rbp
, but will likely also includer15
if you use it and any ofrbx, r12, r13, r14
if you use them as well. Next, we update the base pointer to hold the current top of the stack (which includes the return pointer into main, for example). This is why the typical first lines of most C functions are:push RBP ;; push other callee-save registers mov RBP, RSP
Similarly, when we’re done with the function, we need to restore the stack pointer to its old location, and put the old base pointer value back. This is why the last lines before a
ret
in a C function are often:mov RSP, RBP ;; pop other callee-save registers pop RBP
If you write any functions in stub.rs
that you need to be available in
your assembly, you need to declare them in the assembly via:
;; In your generated assembly
extern <exported name of function>
3.3 New Assembly Constructs
Sized
: You may run into errors that report that the size of an operation is ambiguous. This could happen if you write, for example:cmp [RBP-8], 0
because the assembler doesn’t know if the program should compare a four-byte zero, a one-byte zero, or something in between into memory starting at
[RBP-8]
. To solve this, you can supply a size:cmp QWORD [RBP-8], 0
The
QWORD
keyword tells the assembler to use the "quad word" size for the memory reference. In our generated assembly code we have always worked at the full 64-bit word level so it is fine to just change all memory references to use theQWORD
keyword. If you do not want every memory reference to use the keyword feel free to change types inasm.rs
to support more selective use.Unsigned vs Signed
: Assembly code supports 64-bit unsigned and signed constants and certain values like the representation oftrue
,false
and bit patterns cannot be represented asi64
values. So theArg64
/Arg32
/Reg32
types have replaced theImm
with two cases forSigned
andUnsigned
constants. I recommend displayingUnsigned
constants in hexadecimal since you will probably only be using them for bit-patterns. To format a numbern
in hexadecimal, you can useformat!("0x{:016x}", n)
for 64-bit numbers andformat!("0x{:08x}", n)
for 32-bit.push
,pop
: These two instructions manage values on the stack.push
adds a value at the current location ofrsp
, and decrementsrsp
to point past the added value (remember, the stack grows toward numerically-smaller addresses, even though we draw it as growing upward in diagrams).pop
incrementsrsp
and moves the value at the locationrsp
was pointing to into the provided arg.A
call
does two things:Pushes the next code location onto the stack (just like a
push
), which becomes the return pointerPerforms an unconditional jump to the provided label
Note that
call
does not affectRBP
, which the program must maintain on its own.shr
,shl
,sar
: Bit shifting operations.sar
(shift-arithmetic-right) preserves the sign bit of the value being shifted, whileshr
(shift-right) simply fills in a zero bit.and
,or
,xor
: Bitwise logical operations.test
: Performs a bitwise-and of the two arguments, and records whether the result was zero, signed, or overflowed. This can be used in tandem with jumps likejo
orjz
.jo
,jno
: Jump to the provided label if the last arithmetic operation did/did not overflow.Comment
: these allow you to add a comment on a line by itself, or in-line after an instruction. These might be helpful for readability of your generated output, while you’re debugging.
3.4 Debugging
If you are having trouble producing correct code, I highly recommend
Inspecting your generated assembly code
Print out intermediate expressions using
println!("{:?}", e)
Finding minimal examples that produce incorrect behavior
3.5 Testing
Only integration tests, i.e., tests/examples.rs
will be graded
for this assignment.
4 Recommended TODO List
Here’s an order in which you could consider tackling the implementation:
Fix the Num case and implement the Bool case of the compiler to correctly encode the values.
Extend
check
(renamed fromcheck_scope
) to use theOverflow
error to indicate integer constants that are too big to fit into 63-bit signed integers.Implement the
Prim2
operations without checking runtime type tags. Test your arithmetic and logical operations with programs where the tag checks would succeed.Similarly, implement the
Prim1
operations besidesprint
without checking runtime type tags.Add the
push rbp ... mov rbp, rsp
preamble and corresponding epilogue to your compiled code.Update your variable generation code to use locations releative to
rbp
rather thanrsp
. Re-run your existing tests to make sure they are working!.Implement
space_needed
and correctly increment the stack pointer upon entry into your function.Figure out how to test for errors and call the
snake_error
function when an error is detected. Test as you go. Be aware that if the function call segfaults, it may be because you need to refine yourspace_needed
function.Complete the if case and test as you go.
Implement
print
. Be aware that if the call doesn’t work, it may be because ofspace_needed
again. Test as you go; be aware that you should test interesting sequences of print expressions and let-bindings to make sure your stack integrity is good before and after calls.
5 Running main
Running your own programs is the same as with Boa, except you’ll give them
the .cobra
file extension.
6 List of Deliverables
your
compile.rs
andasm.rs
the other src/*.rs files in the starter code
any additional modules you saw fit to write
your
runtime/stub.rs
the Cargo.toml
integration tests (
tests/examples.rs
)your test input programs (
examples/*.cobra
files)
Please ensure the your code builds properly. The autograder will give you a 0 on that portion of the graade if it cannot compile your code.
7 Grading Standards
For this assignment, you will be graded on
Whether your code implements the specification (functional correctness),
the clarity and cleanliness of your code, and
the comprehensiveness of your test coverage
8 Submission
Wait! Please read the assignment again and verify that you have not forgotten anything!
Please submit your homework to gradescope by the above deadline.