HW2: X86lite

Overview

In this project you will implement an assembler and simulator for a small, idealized subset of the X86-64 platform that will serve as the target language for the compilers we build in later projects. This project will continue to help get you up to speed with OCaml programming – we’ll need a few more constructs not used in HW1. You will also implement a non-trivial assembly-language program by hand to familiarize yourself with the workings of the X86 architecture.

Note

This assignment is significantly harder than HW1. Start early.

Note

You are encouraged to work in pairs for this project.

Getting Started

To get started, clone your team’s repository on EECS Gitlab.

The files included in the repository are briefly described below. Those marked with * are the only ones you should need to modify while completing this assignment.

Makefile	builds main.native, also supports targets ‘test’ and ‘zip’
util/assert.ml(i)	the assertion framework
test/gradedtests.ml	graded test cases that we provide
bin/main.ml	the main test harness
x86/x86.mli	the X86lite interface
x86/x86.ml	the X86lite instruction set implementation
bin/int64_overflow.ml(i)	library for working with int64 values
*bin/simulator.ml	where your interpreter and assembler code (Parts I and II) should go
*test/studenttests.ml	where your submitted test cases (Part III) should go

Building the Project

It is recommended that you compile your projects from the command line, using make. We have included a Makefile that provides several make targets that can help you with the homework:

make       --  builds oatc using dune
make test  --  runs the test suite
make clean --  cleans your project directory
make utop  --  starts a utop for the project
make zip   --  creates a zip file with the code you need to submit

Command-line Running and Testing Projects

After compiling the project with make, you can run it from the command line by running ./oatc. You may run the simulator’s unit tests by running make test or ./oatc --test.

Part I: The Simulator

X86lite assembly code is organized into labeled blocks of instructions, which might be written in concrete syntax as shown below:

      .text
fac:

      subq    $8, %rsp
      cmpq    $1, %rdi
      jle     exit
      movq    %rdi, (%rsp)
      decq    %rdi
      callq   fac
      imulq   (%rsp), %rax
      addq    $8, %rsp
      retq
exit:
      movq    $1, %rax
      addq    $8, %rsp
      retq
      .globl  main
main:
      movq    $5, %rdi
      callq   fac
      retq

This code has three blocks, labeled fac, exit, and main. The code at labels fac and exit implements a recursive version of the familiar factorial function. The code at main calls factorial with the immediate value 5.

In this part of the project you will implement a simulator for the X86lite platform, but rather than using the concrete syntax shown above, you will execute programs that have been converted to machine code and layed out in the memory of an idealized X86lite machine:

[| ...
 InsB0 (Subq,  [Imm (Lit 8L); Reg Rsp]);  InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag
 InsB0 (Cmpq,  [Imm (Lit 1L); Reg Rdi]);  InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag
 InsB0 (J Le,  [Imm (Lit 72L)]);          InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag
 InsB0 (Movq,  [Reg Rdi;      Ind2 Rsp]); InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag
 InsB0 (Decq,  [Reg Rdi]);                InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag
 InsB0 (Callq, [Imm (Lit 0L)]);           InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag
 InsB0 (Imulq, [Ind2 Rsp;     Reg Rax]);  InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag
 InsB0 (Addq,  [Imm (Lit 8L); Reg Rsp]);  InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag
 InsB0 (Retq,  []);                       InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag
 InsB0 (Movq,  [Imm (Lit 1L); Reg Rax]);  InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag
 InsB0 (Addq,  [Imm (Lit 8L); Reg Rsp]);  InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag
 InsB0 (Retq,  []);                       InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag
 InsB0 (Movq,  [Imm (Lit 5L); Reg Rdi]);  InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag
 InsB0 (Callq, [Imm (Lit 0L)]);           InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag
 InsB0 (Retq,  []);                       InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag; InsFrag
 ...
|]

This is just an OCaml array of sbytes, “symbolic” bytes where InsB0 represents the first byte of an instruction and seven subsequent InsFrags represent the remaining seven bytes. While in a real machine each fragment would encode meaningful information about the instructions, this approach hides the details of a specific encoding and aids in debugging. The actual encoding of X86 instructions in particular is notoriously complicated, and as we mentioned in class, variable in length. We will assume a fixed-length, 8-byte encoding of X86lite for our simulator, representing instructions in memory as sbytes. Fetching and decoding an instruction will simply involve reading the contents of its first byte, ignoring the following InsFrags.

The OCaml datatype used for instructions is defined in the provided x86.ml, and the x86lite specification gives the full details about the meaning of each instruction.

Note

Read (or at least skim) the x86lite specification now. You might want to correlate the various parts of the X86lite machine with the datatypes defined in x86.ml.

The X86lite specification is written from the point of view of actual X86 hardware, except for the behavior of labels, which are “resolved” by another program, the assembler (and linker/loader). Your simulator can assume that this has already been done, so instruction operands will not contain labels. In the memory image for the factorial example above, you can see that calls using the label fac and jumps using exit have been replaced with literal immediate operands OL and 72L.

Our ML-level interpreter’s representation of the X86lite machine state is given by the following type:

type flags = { mutable fo : bool
             ; mutable fs : bool
             ; mutable fz : bool
             }

type regs = quad array

type mem = sbyte array

type mach = { flags : flags
            ; regs : regs
            ; mem : mem
            }

The memory and register files are simulated by OCaml-level (mutable) arrays of sbytes and quads (OCaml 64-bit integers), respectively. The three condition flags are mutable boolean fields; all of the state is bundled together in a record (see IOC Chapter 8.1 for more about OCaml’s record types).

The main differences between the interpreter and the environment in which real X86 programs are executed include:

Memory: Our simulator will use only 64K bytes of memory. The part of the heap simulated is the block of highest addressable memory locations – in simulator.ml, this block is bounded from below by mem_bot and from above by mem_top. We will not model requesting memory from the operating system: you can assume the entire 64K address space has been paged in before execution of the program starts. We will also not model any of the restrictions on alignment or code layout related to memory paging.
Symbolic instruction encoding: As described at the beginning of the section, we will assume a fixed-length, 8-byte instruction encoding by representing instructions symbolically in memory. The behavior of programs that read or manipulate sbytes representing instructions as data is not specified. Your simulator may raise an error or assume some default behavior: we will not test these cases.
Operand restrictions: The X86Lite specification mentions several restrictions on the operands of various instructions. For example leaq can only take an indirect memory operand. Your simulator is not required to detect invalid operands, and may raise an exception or choose some convenient behavior. In other words, your simulator may implement a superset of the X86lite specification by executing instructions with invalid operands. We will only test your simulator with programs that conform to the restrictions in the specification.
Termination and system calls: Normally, a program will terminate by notifying the operating system using a system call (e.g. exit on POSIX systems). We will not simulate system calls, so instead we use a sentinel address outside of our address space, exit_addr, to indicate that a program has terminated. The provided``run`` function will call the step function until %rip contains exit_addr. To achieve this, you should begin execution with exit_addr on the top of the stack, so that executing RETQ without first pushing something else on the stack will terminate the program.

Provided Code

sbyte serialization
Machine state and X86 instruction datatypes
The Int64_overflow module

Tasks

Complete the implementation in the simulator.ml file, some parts of which are given to you. We recommend that you do things in this order:

First, as an exercise in condition codes, implement the interp_cnd function.
Second, as another simple warm-up, implement the map_addr function, which maps X86lite addresses (represented as quad values) into Some OCaml array index (or None if the address is not in the legal address space).
Third, implement the interpretation of operands (including indirect addresses), since this functionality will be needed for simulating instructions.
Finally, implement the step function, which simulates the execution of a single instruction by modifying the machine state passed as an argument.

Hints

We have provided a module for performing 64-bit arithmetic with overflow detection. You may find this useful for setting the status flags.
You’ll probably want a function that sets the three condition flags after a result has been computed.
Groups of instructions share common behavior – for example, all of the arithmetic instructions are quite similar. You should factor out the commonality as much as you can in order to keep your code clean. Remember that code style constitutes a non-trivial portion of your grade.
You will probably want to develop small test cases to try out the functionality of your interpreter. See gradedtests.ml for some examples of how to set up tests that can look at the final state of the machine.

Tests

We will grade this part of the assignment based on a suite of tests. Some of them are available for you to look at in gradedtests.ml, the rest of them we reserve for our own cases. We will also stress-test your interpreter on a number of “big” programs (see Part III) that we have developed and that you and your classmates will develop as part of this project.

To help other teams debug their interpreters, you are encouraged to share “microbenchmark” test cases by posting them to the indicated thread on Piazza. These should be short (2-3 instruction) programs that test various functional aspects of the interpreter. We will not use these tests in our grading. You may add such test cases to the test suite defined in studenttests.ml.

Part II: X86lite Assembler and Loader

Writing machine code directly is difficult and error-prone, even using our symbolic representation of instructions. The example factorial program in the previous section is written as a set of instructions for an assembler, a program that can automate much of the process for us. The primary functionality of the assembler for the purposes of this project includes the translation of human-readable mnemonics for instructions into machine code, and the translation of symbolic labels that appear in the assembly program into addresses understood by the machine.

Rather than working with a concrete syntax as in the above example, we will use the abstract syntax defined in x86.ml:

[ text "fac"
     [ Subq,  [~$8; ~%Rsp]
     ; Cmpq,  [~$1; ~%Rdi]
     ; J Le,  [~$$"exit"]
     ; Movq,  [~%Rdi; Ind2 Rsp]
     ; Decq,  [~%Rdi]
     ; Callq, [~$$"fac"]
     ; Imulq, [Ind2 Rsp; ~%Rax]
     ; Addq,  [~$8; ~%Rsp]
     ; Retq,  []
     ]
; text "exit"
     [ Movq,  [~$1; ~%Rax]
     ; Addq,  [~$8; ~%Rsp]
     ; Retq,  []
     ]
; gtext "main"
     [ Movq,  [~$n; ~%Rdi]
     ; Callq, [~$$"fac"]
     ; Retq,  []
     ]
]

As you can see, the correspondence between the abstract syntax and the concrete syntax is quite close. The file x86.ml and its corresponding interface x86.mli together provide the basic definitions for the creating and manipulating X86lite abstract syntax – the main types you should be aware of are lbl, reg, operand, cnd, ins, and. Each of these corresponds fairly directly to a concept from the X86lite spec.

In addition to the instructions covered in the spec, X86lite assembly programs can contain label declarations and data consisting of either 64-bit words or zero-terminated strings. Each label declaration also has a visibility modifier, though these will only be used in later projects. X86lite assembly programs are represented using the following types defined in x86.ml:

type data = Asciz of string
          | Quad of imm

type asm = Text of ins list
         | Data of data list

type elem = { lbl: lbl; global: bool; asm: asm }

type prog = elem list

The elem type represents a section of an assembly program beginning with a label that contains either a list of instructions or a list of data. Each piece of data in a data section is a 64-bit value or a string. We can access the contents of each of these sections via offsets from their associated labels. X86lite assembly programs are lists of labeled elem blocks.

Your goal in this part of the assignment is to translate an X86.prog into an initial machine state that can be executed by your simulator. While this is not the simplest way to execute an X86lite program, it is meant to illustrate some of what the system assembler, linker, and loader will do to the assembly your compiler will generate in future projects.

This part of the project will involve serializing instructions and data into sbytes, laying out the program in memory, resolving labels to addresses, and initializing the machine state. We can split this into two phases, assembling and loading. The assembler will do most of the work, outputting an executable that the loader will use to generate an initial machine state:

type exec = { entry    : quad
            ; text_pos : quad
            ; data_pos : quad
            ; text_seg : sbyte list
            ; data_seg : sbyte list
            }

An executable contains the following fields:

entry: The entry point of the program, the address in memory of the first instruction executed.
text_pos, data_pos: The address at which the following memory segments should be loaded.
text_seg, data_seg: The assembled code for the text and data sections of the assembly program, with symbolic labels resolved.

Unlike an assembly program represented as an X86.prog, an object file has a single, contiguous segment of memory containing instructions and a single contiguous segment containing data. This is not strictly necessary to execute the program, but real systems often keep executable code in sections of memory that are guaranteed to be read-only by the virtual memory system for security purposes. Also, our executables contain neither declarations nor uses of labels. The provided functions to convert instructions and data to sbytes guarantee this invariant.

Executable File Specification

We will require very specific output from your assembler and loader. Though programs may still execute correctly using other layouts, uniform outputs are necessary for testing purposes.

The text_seg and data_seg fields of the executable should consist of the serialized contents of the Text and Data sections of the assembly program in the order that they appear, without any extra padding or extraneous sbytes. Use the supplied sbytes_of_ins and sbytes_of_data functions. The text_pos field must be exactly 0x400000, the lowest addressable byte in the simulator. The data_pos field must contain the address immediately following the end of the text segment in memory. The entry field must contain the address of the first instruction after the label "main" in the assembly program.

The assemble function should raise an Undefined_symbol exception if it encounters a label that is not declared in the source program, or if "main" is not declared.

Loader Specification and Memory Layout

The load function should initialize a machine state given an executable file by copying the contents of text_seg and data_seg to the load addresses specified in text_pos and data_pos. It should initialize the instruction pointer to the address in entry, and the stack pointer to the highest legal memory address of our simulator. The contents of memory at the highest address should be the sentinel exit_addr.

Tasks

Fill out the assemble function in the file simulator.ml, which creates an obj given an X86.prog. First, calculate the address where text and data should be loaded according to the memory layout described above. Then, to resolve forward references to labels, you will need to traverse the assembly program and construct a symbol table to record the absolute address of each label definition you encounter. The last step is to traverse the assembly program a second time, outputting sbytes for each instruction and data element you encounter. You will need to use your symbol table to replace labels, which can occur in instruction operands or Quad data, with their addresses.
Fill out the load function, which creates an initial machine state given an object file. You will need to create an initial memory and copy the segments of the object file to their specified load addresses. You will also have to initialize the machine registers, setting %rip and %rsp appropriately. Lastly, you will need to initialize the stack to contain the sentinel exit_addr described in the previous section.

Part III: X86lite Assembly Programming

For this part of the assignment, you will create (by hand) a non-trivial X86lite assembly program to test your interpreter’s behavior and gain some experience programming in X86lite. The factorial program supplied with the test code is a minimal example of what we mean by “non-trivial” – only test cases at least this level of difficulty can earn full credit. In particular, your program should include:

Non-trivial control flow: either nested loops, a recursive function, or something similarly complex
At least one conditional branch.
Some amount of arithmetic or logic.
One or more test cases that excercise your program and test the correctness of the interpreter’s behavior. If your program computes a simple answer, it can return it in Rax and you can use the program_test harness as in the factorial example found in gradedtests.ml; for more complex programs you might want to use the machine_test function, which lets you examine the resulting state of the interpreter.

Some good candidates for such programs include: simple sorting or searching algorithms (i.e. treat some chunk of memory as an array of values to be sorted), simple arithmetic algorithms such as GCD, recursive functions over integers or very simple data structures such as linked lists. If you are unsure whether the test case you’d like to implement is sufficient, contact one of the course staff.

Your test should be a function of type unit -> unit that works in the assertion framework (as defined in assert.ml). This test should be supplied in studenttests.ml as the “Student-Provided Big Test for Part III”. We will hand grade this submission (and test it against our own interpreter). We will also use all correct submitted tests to validate all of the other projects in the course – the trickier your test is, the harder it will be for other teams to pass it!

Note

Unlike the “microbenchmark” tests from Part I, you should not post your big test to Piazza.

Grading

Submit your solution to this assignment by following the submission instructions

Projects that do not compile will receive no credit!

Your team’s grade for this project will be based on:

75 Points for implementing the X86lite simulator, assembler and loader in Parts I and II , graded via our test cases (29 of these 75 points come from our hidden test cases, which contain several hundred tests and will be run only on the submission server). The maximum score you can get from the automated grading server is 75 points.
10 Points for submitting a non-trivial X86 test program as described in Part III above. (graded manually)
10 Points for passing tests submitted by other teams. (graded manually)
5 Points for “programming style” – see these guidelines. (graded manually)