Assignment 3: Cobra: Multiple types of values
Due: Fri 10/13 at 5:00pm
git clone
In this compiler, you’ll deal with COded Binary RepresntAtions of values. Click here for scary snake picture!
1 The Cobra Language
1.1 Concrete Syntax
The concrete syntax of Cobra is very similar to Boa, with a few new additions.
‹expr› ... true false ‹expr› < ‹expr› ‹expr› > ‹expr› ‹expr› <= ‹expr› ‹expr› >= ‹expr› ‹expr› == ‹expr› ‹expr› && ‹expr› ‹expr› || ‹expr› ! ‹expr› isbool ( ‹expr› ) isnum ( ‹expr› ) print ( ‹expr› )
1.2 Abstract Syntax
The abstract syntax is very similar to Boa, also:
pub enum Prim1 {
...
Not,
Print,
IsBool,
IsNum,
}
pub enum Prim2 {
...
And,
Or,
Lt,
Gt,
Le,
Ge,
Eq,
}
pub enum Exp<Ann> {
...
Bool(bool, Ann),
}
1.3 Semantics
The semantics of booleans are straightforward. The semantics of
Exp::If
changes slightly: its condition must evaluate to a
boolean, and it branches on the truth or falsehood of that value,
rather than whether it’s nonzero.
With the addition of two types to the language, there are two main changes that ripple through the implementation:
The representation of values
The possibility of errors
There is one other major addition, which is the print
primitive, discussed more below.
The representation of values requires a definition. We’ll use the following representations for the Cobra runtime:
true
will be represented as the constant0xFFFFFFFFFFFFFFFF
.false
will be represented as the constant0x7FFFFFFFFFFFFFFF
.numbers will be represented with a zero in the rightmost bit, by shifting them left 1 bit. So, for example, 2 is represented as
0x0000000000000004
.
You should raise errors at runtime in the following cases:
-
,+
,*
,add1
andsub1
should raise an error (by printing it out) with the substring"arithmetic expected a number"
if the operation’s argument(s) are not numbers. You can print more information than this if you like, but the message must have at least this substring.<
,<=
,>
and>=
should raise an error with the substring"comparison expected a number"
if the arguments are not both numbers.+
,-
,*
,add1
andsub1
should raise an error with the substring"overflow"
if the result overflows, and falls outside the range representable in 63 bits. Thejo
instruction will be useful: it jumps if the last instruction overflowed.if
should raise an error with the substring"if expected a boolean"
if the conditional value is not a boolean.and
andor
should raise an error with the substring"logic expected a boolean"
if the arguments are not both booleans. Implementation of short-circuiting behavior is optional.not
should raise an error with the substring"logic expected a boolean"
if its argument is not a boolean.
These error messages should be printed on standard error. The other operators never fail, and so never produce errors.
The ==
should work polymorphically, regardless of the types of
its inputs. If we ask if a boolean is equal to a number, the result
should be false
, not a runtime error. A value is only
==
to itself.
We add two new primitive operations, isbool
and isnum
. These two
operations have an effective type of Any -> Bool
, and will return
true
if the argument they receive is indeed a boolean or number,
respectively.
The last required primitive operation is print
, which prints
its single argument to the command line (with a newline), and then
returns it. The print_snake_val
function in stub.rs
explicitly returns its argument; you will need to retrieve and use
that value in your compiled output.
2 Examples
The expression
let x = 1 in let y = print(x + 1) in print(y + 2)
will output
2 4 4
The first 2 comes from the first
print
expression. The first 4 comes from the secondprint
expression. The final line prints the answer of the program as usual, so there’s an “extra” 4.The expression
if 54: true else: false
prints (on standard error) something like:
Error: if expected a boolean, got 54
3 Implementation strategies
3.1 Rendering errors
To display error messages on standard error, you’ll need to use a call something like:
eprintln!("Error: if expected a boolean");
We have included unimplemented functions snake_error()
in stub.rs
. You
will need to add inputs of your design to this function to get your
desired error messages. You can follow what we did in class or design
it however you like.
3.2 Memory Layout and Calling Rust Functions
In order to set up the stack properly to call Rust functions, like
print_snake_val
and your error functions, it’s necessary to
make a few changes to what we had in Boa.
When calling a Rust function such as
print_snake_val
, you need to ensure that you decrementRSP
enough to move it past all of your local variables before the call and then reverse this by incrementing it after the call returns.To figure out how much space, you need to implement the
space_needed
function. There are many ways to implement this, but you need to make sure that you have enough space to fit all of your local variables and that you keep the stack aligned.When you make a function call, you must ensure that when the called function starts executing, the value of the stack pointer
rsp
is 8 modulo 16. If you use thecall
instruction to make the call, then because the call instruction decrements the value ofrsp
by 8, you want the stack pointer to be 0 modulo 16 immediately before executing thecall
instruction. As discussed in Lecture 7, you can achieve this by allocating one extra slot of "junk" space on the stack if you would otherwise be misaligned.Participating in the stack: As a callee using the System V AMD64 calling convention, our code has some responsibilities. First, we need to store any callee-save registers that we use in the code upon entry. So if you are using any of the callee-save registers as scratch registers, you should push them upon entry into
start_here
and pop them back off before the final return.
If you write any functions in stub.rs
that you need to be available in
your assembly, you need to declare them in the assembly via:
;; In your generated assembly
extern "sysv64" <exported name of function>
3.3 New Assembly Constructs
Sized
: You may run into errors that report that the size of an operation is ambiguous. This could happen if you write, for example:cmp [RSP - 8], 0
because the assembler doesn’t know if the program should compare a four-byte zero, a one-byte zero, or something in between into memory starting at
[RSP - 8]
. To solve this, you can supply a size:cmp QWORD [RSP - 8], 0
The
QWORD
keyword tells the assembler to use the "quad word" size for the memory reference. In our generated assembly code we have always worked at the full 64-bit word level so it is fine to just change all memory references to use theQWORD
keyword. If you do not want every memory reference to use the keyword feel free to change types inasm.rs
to support more selective use.Unsigned vs Signed
: Assembly code supports 64-bit unsigned and signed constants and certain values like the representation oftrue
,false
and bit patterns cannot be represented asi64
values. So theArg64
/Arg32
/Reg32
types have replaced theImm
with two cases forSigned
andUnsigned
constants. We recommend displayingUnsigned
constants in hexadecimal since you will probably only be using them for bit-patterns. To format a numbern
in hexadecimal, you can useformat!("0x{:016x}", n)
for 64-bit numbers andformat!("0x{:08x}", n)
for 32-bit.push
,pop
: These two instructions manage values on the stack.push
adds a value at the current location ofrsp
, and decrementsrsp
to point past the added value (remember, the stack grows toward numerically-smaller addresses, even though we draw it as growing upward in diagrams).pop
incrementsrsp
and moves the value at the locationrsp
was pointing to into the provided arg.A
call
does two things:Pushes the next code location onto the stack (just like a
push
), which becomes the return pointerPerforms an unconditional jump to the provided label
shr
,shl
,sar
: Bit shifting operations.sar
(shift-arithmetic-right) preserves the sign bit of the value being shifted, whileshr
(shift-right) simply fills in a zero bit.and
,or
,xor
: Bitwise logical operations.test
: Performs a bitwise-and of the two arguments, and records whether the result was zero, signed, or overflowed. This can be used in tandem with jumps likejo
orjz
.jo
,jno
: Jump to the provided label if the last arithmetic operation did/did not overflow.Comment
: these allow you to add a comment on a line by itself, or in-line after an instruction. These might be helpful for readability of your generated output, while you’re debugging.
3.4 Debugging
If you are having trouble producing correct code, we highly recommend
Inspecting your generated assembly code
Print out intermediate expressions using
println!("{:?}", e)
Finding minimal examples that produce incorrect behavior
4 Recommended TODO List
Here’s an order in which you could consider tackling the implementation:
Extend any tagging functions and sequentialization
Fix the Num case and implement the Bool case of the compiler to correctly encode the values.
Extend
check_prog
to use theOverflow
error to indicate integer constants that are too big to fit into 63-bit signed integers.Implement arithmetic and logical
Prim
operations without checking runtime type tags. Test programs where the tag checks would succeed.Figure out how to test for errors and call the
snake_error
function when an error is detected. Test as you go. Be aware that if the function call segfaults, it may be because you need to refine yourspace_needed
function.Complete the if case and test as you go.
Implement
print
and in the processspace_needed
. Test as you go; be aware that you should test interesting sequences of print expressions and let-bindings to make sure your stack integrity is good before and after calls.
Be sure to get your alignment correct! Stack misalignment does not
typically cause errors on Mac, but does on Linux and in particular on
the autograder servers so if you are getting errors you can’t
replicate on Mac, make sure to think about alignment. One way to
harden your code to force stack misalignment to error is at any
point before you use the call
instruction for a call into Rust to
insert the following code:
test RSP, 0xF
jnz mis_aligned
Where mis_aligned
is a label to some code that calls your
snake_error
function (fixing the stack alignment first!) to
report a stack alignment error.
5 Running main
Running your own programs is the same as with Boa, except you’ll give them
the .cobra
file extension.
6 List of Deliverables
your
compile.rs
andasm.rs
the other src/*.rs files in the starter code
any additional modules you saw fit to write
your
runtime/stub.rs
the Cargo.toml
integration tests (
tests/examples.rs
)your test input programs (
examples/*.cobra
files)
Please ensure the your code builds properly. The autograder will give you a 0 on that portion of the graade if it cannot compile your code.
7 Grading Standards
For this assignment, you will be graded on
Whether your code implements the specification (functional correctness),
The autograder will again include tests that are hidden until after the deadline that are worth 10% of your score.
8 Submission
Wait! Please read the assignment again and verify that you have not forgotten anything!
Please submit your homework to gradescope by the above deadline.