CS429: Computer Organization and Architecture

Instruction Set Architecture

Dr. Bill Young
Department of Computer Sciences
University of Texas at Austin

Last updated: February 6, 2017 at 13:39
Topics of this Slideset

- Intro to Assembly language
- Programmer visible state
- Y86 Rudiments
- RISC vs. CISC architectures
**Instruction Set Architecture**

**Assembly Language View**
- Processor state: registers, memory, etc.
- Instructions and how instructions are encoded

**Layer of Abstraction**
- Above: how to program machine, processor executes instructions sequentially
- Below: What needs to be built
  - Use variety of tricks to make it run faster
  - E.g., execute multiple instructions simultaneously
Why Y86?

The Y86 is a “toy” machine that is similar to the x86 but much simpler. It is a gentler introduction to assembly level programming than the x86.

- just a few instructions as opposed to hundreds for the x86;
- fewer addressing modes;
- simpler system state;
- absolute addressing.

Everything you learn about the Y86 will apply to the x86 with very little modification. But the main reason we’re bothering with the Y86 is because we’ll be explaining pipelining in that context.
There are various means of giving a *semantics* or meaning to a programming system.

Probably the most sensible for an assembly (or machine) language is an *operational semantics*, also known as an *interpreter semantics*.

That is, we explain the semantics of each possible operation in the language by explaining the effect that execution of the operation has on the *machine state*. 
The most fundamental abstraction for the machine semantics for the x86/Y86 or similar machines is the *fetch-decode-execute* cycle.

The machine repeats the following steps forever:

1. fetch the next instruction from memory (the PC tells you which is next);
2. decode the instruction (in the control unit);
3. execute the instruction, updating the state appropriately;
4. go to step 1.
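The cycle above can be sketched as a C loop. This is a minimal illustration, not a complete Y86 simulator. The `Machine` struct, the 256-byte memory size, and the names `run` and `read_quad` are assumptions made for the example. Only `halt`, `nop`, `irmovq`, and `addq` are decoded (anything else sets status INS), and condition codes are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Status codes, numbered as in the Status Conditions table. */
enum { STAT_AOK = 1, STAT_HLT = 2, STAT_ADR = 3, STAT_INS = 4 };

#define MEM_SIZE 256   /* assumed toy memory size */
#define RNONE    0xF   /* register ID meaning "no register" */

typedef struct {
    uint8_t  mem[MEM_SIZE];  /* byte-addressable memory */
    int64_t  reg[15];        /* %rax .. %r14, indexed by register ID */
    uint64_t pc;
    int      stat;
} Machine;

/* Assemble the 8-byte little-endian constant starting at addr. */
static int64_t read_quad(const Machine *m, uint64_t addr) {
    assert(addr <= MEM_SIZE - 8);
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | m->mem[addr + i];
    return (int64_t)v;
}

/* Repeat fetch-decode-execute until the status is no longer AOK. */
static void run(Machine *m) {
    while (m->stat == STAT_AOK) {
        if (m->pc >= MEM_SIZE) { m->stat = STAT_ADR; break; }
        /* 1. fetch the first byte; 2. decode it into icode:fn */
        uint8_t b0 = m->mem[m->pc];
        int icode = b0 >> 4, ifun = b0 & 0xF;
        /* 3. execute: update registers, PC, and status */
        if (icode == 0x0 && ifun == 0) {          /* halt: PC is left as is */
            m->stat = STAT_HLT;
        } else if (icode == 0x1 && ifun == 0) {   /* nop */
            m->pc += 1;
        } else if (icode == 0x3 && ifun == 0) {   /* irmovq V, rB */
            if (m->pc > MEM_SIZE - 10) { m->stat = STAT_ADR; break; }
            int rB = m->mem[m->pc + 1] & 0xF;
            if (rB == RNONE) { m->stat = STAT_INS; break; }
            m->reg[rB] = read_quad(m, m->pc + 2);
            m->pc += 10;
        } else if (icode == 0x6 && ifun == 0) {   /* addq rA, rB */
            if (m->pc > MEM_SIZE - 2) { m->stat = STAT_ADR; break; }
            int rA = m->mem[m->pc + 1] >> 4;
            int rB = m->mem[m->pc + 1] & 0xF;
            if (rA == RNONE || rB == RNONE) { m->stat = STAT_INS; break; }
            m->reg[rB] = (int64_t)((uint64_t)m->reg[rA] + (uint64_t)m->reg[rB]);
            m->pc += 2;
        } else {                                  /* not handled in this sketch */
            m->stat = STAT_INS;
        }
        /* 4. go back to step 1 */
    }
}
```

Each pass through the `while` loop is one fetch-decode-execute cycle; the loop stops as soon as the status leaves AOK.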
Figure 4-1. Postal Executives At Work On An Instruction: 21254 STA 3300, Y.
Conventions

It’s important to understand how individual operations update the system state. *But that’s not enough!*

Much of the way the Y86/x86 operates is based on a set of *programming conventions*. Without them, you won’t understand how programs work, what the compiler generates, or how your code can interact with code written by others.
The following are conventions necessary to make programs interact:

- How do you pass arguments to a procedure?
- Where are variables (local, global, static) created?
- How does a procedure return a value?
- How do procedures save and restore the state of the caller?

Some of these (e.g., the direction the stack grows) are reflected in specific machine operations; others are purely conventions.
Program registers: 15 registers, almost the same as x86-64 (there is no %r15); each is 64 bits.

Condition flags: 1-bit flags set by arithmetic and logical operations. OF: Overflow, ZF: Zero, SF: Negative

Program counter (PC): holds the address of the next instruction to execute.

Memory

- Byte-addressable storage array
- Words stored in little-endian byte order

Status code: indicates the state of program execution (AOK, HLT, INS, or ADR).
We’re actually describing two languages: the assembly language and the machine language. There is nearly a 1-1 correspondence between them.

**Machine Language Instructions**

- 1-10 bytes of information read from memory
  - Can determine instruction length from first byte
  - Not as many instruction types and simpler encoding than x86-64
- Each instruction accesses and modifies some part(s) of the program state.
## Y86 Instruction Set

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Byte 0 (icode, fn)</th>
<th>Byte 1 (rA, rB)</th>
<th>8-byte constant</th>
<th>Length (bytes)</th>
</tr>
</thead>
<tbody>
<tr>
<td>halt</td>
<td>0 0</td>
<td></td>
<td></td>
<td>1</td>
</tr>
<tr>
<td>nop</td>
<td>1 0</td>
<td></td>
<td></td>
<td>1</td>
</tr>
<tr>
<td>cmovXX rA,rB</td>
<td>2 fn</td>
<td>rA rB</td>
<td></td>
<td>2</td>
</tr>
<tr>
<td>irmovq V,rB</td>
<td>3 0</td>
<td>F rB</td>
<td>V (bytes 2–9)</td>
<td>10</td>
</tr>
<tr>
<td>rmmovq rA,D(rB)</td>
<td>4 0</td>
<td>rA rB</td>
<td>D (bytes 2–9)</td>
<td>10</td>
</tr>
<tr>
<td>mrmovq D(rB),rA</td>
<td>5 0</td>
<td>rA rB</td>
<td>D (bytes 2–9)</td>
<td>10</td>
</tr>
<tr>
<td>OPq rA,rB</td>
<td>6 fn</td>
<td>rA rB</td>
<td></td>
<td>2</td>
</tr>
<tr>
<td>jXX Dest</td>
<td>7 fn</td>
<td></td>
<td>Dest (bytes 1–8)</td>
<td>9</td>
</tr>
<tr>
<td>call Dest</td>
<td>8 0</td>
<td></td>
<td>Dest (bytes 1–8)</td>
<td>9</td>
</tr>
<tr>
<td>ret</td>
<td>9 0</td>
<td></td>
<td></td>
<td>1</td>
</tr>
<tr>
<td>pushq rA</td>
<td>A 0</td>
<td>rA F</td>
<td></td>
<td>2</td>
</tr>
<tr>
<td>popq rA</td>
<td>B 0</td>
<td>rA F</td>
<td></td>
<td>2</td>
</tr>
</tbody>
</table>
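Because the first byte determines the length, a decoder can step through a code buffer using the icode nibble alone. A minimal C sketch; the names `instr_length` and `count_instructions` are hypothetical, and the fn nibble is not validated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of a Y86-64 instruction, computed from byte 0 alone.
   The high nibble is icode; the fn nibble is not checked.
   Returns 0 for an icode that names no instruction. */
static int instr_length(uint8_t byte0) {
    switch (byte0 >> 4) {
    case 0x0: case 0x1: case 0x9:           return 1;  /* halt, nop, ret */
    case 0x2: case 0x6: case 0xA: case 0xB: return 2;  /* cmovXX, OPq, pushq, popq */
    case 0x7: case 0x8:                     return 9;  /* jXX, call: byte 0 + Dest */
    case 0x3: case 0x4: case 0x5:           return 10; /* irmovq, rmmovq, mrmovq */
    default:                                return 0;  /* invalid icode */
    }
}

/* Walk a code buffer instruction by instruction, as a disassembler would.
   Returns the number of complete instructions, or -1 on an invalid byte 0. */
static int count_instructions(const uint8_t *code, size_t n) {
    assert(code != NULL || n == 0);
    int count = 0;
    size_t pc = 0;
    while (pc < n) {
        int len = instr_length(code[pc]);
        if (len == 0) return -1;
        pc += (size_t)len;
        if (pc > n) break;   /* truncated final instruction */
        count++;
    }
    return count;
}
```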
Each register has an associated 4-bit id:

<table>
<thead>
<tr>
<th>Register</th>
<th>ID</th>
<th>Register</th>
<th>ID</th>
</tr>
</thead>
<tbody>
<tr>
<td>%rax</td>
<td>0</td>
<td>%r8</td>
<td>8</td>
</tr>
<tr>
<td>%rcx</td>
<td>1</td>
<td>%r9</td>
<td>9</td>
</tr>
<tr>
<td>%rdx</td>
<td>2</td>
<td>%r10</td>
<td>A</td>
</tr>
<tr>
<td>%rbx</td>
<td>3</td>
<td>%r11</td>
<td>B</td>
</tr>
<tr>
<td>%rsp</td>
<td>4</td>
<td>%r12</td>
<td>C</td>
</tr>
<tr>
<td>%rbp</td>
<td>5</td>
<td>%r13</td>
<td>D</td>
</tr>
<tr>
<td>%rsi</td>
<td>6</td>
<td>%r14</td>
<td>E</td>
</tr>
<tr>
<td>%rdi</td>
<td>7</td>
<td>no reg</td>
<td>F</td>
</tr>
</tbody>
</table>

Almost the same encoding as in x86-64.

Most of these registers are general purpose; %rsp has special functionality.
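The ID table can be captured as a lookup array, which is what an assembler or simulator needs in both directions. A small sketch with hypothetical helper names `reg_name` and `reg_id`; here 0xF stands for "no register" and, as a simplification, is also returned for an unrecognized name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Register names indexed by 4-bit register ID (0x0 .. 0xE). */
static const char *const reg_names[15] = {
    "%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi",
    "%r8",  "%r9",  "%r10", "%r11", "%r12", "%r13", "%r14"
};

#define RNONE 0xFu   /* ID meaning "no register" */

/* ID -> name; returns NULL for RNONE. */
static const char *reg_name(unsigned id) {
    assert(id <= 0xF);   /* IDs are 4-bit values */
    return id == RNONE ? NULL : reg_names[id];
}

/* Name -> ID, as an assembler needs; returns RNONE if the name is unknown. */
static unsigned reg_id(const char *name) {
    for (unsigned i = 0; i < 15; i++)
        if (strcmp(name, reg_names[i]) == 0)
            return i;
    return RNONE;
}
```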
Y86 Instruction Set (2)

\[
\begin{array}{c|c|c|c}
\text{cmovXX } rA, rB & 2 & \text{fn} & rA, rB \\
\end{array}
\]

Encompasses:

\[
\begin{array}{c|c|c|c}
\text{rrmovq } rA, rB & 2 & 0 & \text{move from register to register} \\
\text{cmovle } rA, rB & 2 & 1 & \text{move if less or equal} \\
\text{cmovl } rA, rB & 2 & 2 & \text{move if less} \\
\text{cmove } rA, rB & 2 & 3 & \text{move if equal} \\
\text{cmovne } rA, rB & 2 & 4 & \text{move if not equal} \\
\text{cmovge } rA, rB & 2 & 5 & \text{move if greater or equal} \\
\text{cmovg } rA, rB & 2 & 6 & \text{move if greater} \\
\end{array}
\]
OPq rA,rB

Encompasses:

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Opcode</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>addq rA,rB</td>
<td>6 0</td>
<td>add</td>
</tr>
<tr>
<td>subq rA,rB</td>
<td>6 1</td>
<td>subtract</td>
</tr>
<tr>
<td>andq rA,rB</td>
<td>6 2</td>
<td>and</td>
</tr>
<tr>
<td>xorq rA,rB</td>
<td>6 3</td>
<td>exclusive or</td>
</tr>
</tbody>
</table>
Y86 Instruction Set (4)

Encompasses:

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Opcode</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>jmp Dest</td>
<td>7 0</td>
<td>unconditional jump</td>
</tr>
<tr>
<td>jle Dest</td>
<td>7 1</td>
<td>jump if less or equal</td>
</tr>
<tr>
<td>jl Dest</td>
<td>7 2</td>
<td>jump if less</td>
</tr>
<tr>
<td>je Dest</td>
<td>7 3</td>
<td>jump if equal</td>
</tr>
<tr>
<td>jne Dest</td>
<td>7 4</td>
<td>jump if not equal</td>
</tr>
<tr>
<td>jge Dest</td>
<td>7 5</td>
<td>jump if greater or equal</td>
</tr>
<tr>
<td>jg Dest</td>
<td>7 6</td>
<td>jump if greater</td>
</tr>
</tbody>
</table>
Simple Addressing Modes

- **Immediate**: value
  \[\text{irmovq \$0xab, \%rbx}\]

- **Register**: Reg[R]
  \[\text{rrmovq \%rcx, \%rbx}\]

- **Normal (R)**: Mem[Reg[R]]
  - Register R specifies the memory address.
  - This is often called *indirect* addressing.
  \[\text{mrmovq (\%rcx), \%rax}\]

- **Displacement D(R)**: Mem[Reg[R] + D]
  - Register R specifies the start of a memory region.
  - Constant displacement D specifies the offset.
  \[\text{mrmovq 8(\%rbx), \%rdx}\]
Let’s write a fragment of Y86 assembly code. Our program swaps the 8-byte values starting in memory locations 0x0100 (value A) and 0x0200 (value B).

```
start:
  xorq   %rax, %rax
  mrmovq 0x100(%rax), %rbx
  mrmovq 0x200(%rax), %rcx
  rmmovq %rcx, 0x100(%rax)
  rmmovq %rbx, 0x200(%rax)
  halt
```

<table>
<thead>
<tr>
<th>Reg.</th>
<th>Use</th>
</tr>
</thead>
<tbody>
<tr>
<td>%rax</td>
<td>0</td>
</tr>
<tr>
<td>%rbx</td>
<td>A</td>
</tr>
<tr>
<td>%rcx</td>
<td>B</td>
</tr>
</tbody>
</table>
Now, we generate the machine code for our sample program. Assume that it is stored in memory starting at location 0x030. *I did this by hand, so check for errors!*


```
0x030: 6300                 # xorq %rax, %rax
0x032: 50300001000000000000 # mrmovq 0x100(%rax), %rbx
0x03c: 50100002000000000000 # mrmovq 0x200(%rax), %rcx
0x046: 40100001000000000000 # rmmovq %rcx, 0x100(%rax)
0x050: 40300002000000000000 # rmmovq %rbx, 0x200(%rax)
0x05a: 00                   # halt
```
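Hand encodings like these are easy to get wrong, so it can help to generate them mechanically. The sketch below uses hypothetical helpers `encode_memmov` and `to_hex` and follows the layout in the instruction table: byte 0 is icode:0, byte 1 is rA:rB, and the displacement D follows in little-endian order. For example, `mrmovq 0x100(%rax), %rbx` should come out as `50 30` followed by `00 01 00 00 00 00 00 00`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Encode a 10-byte Y86-64 memory move:
   byte 0 = icode:0, byte 1 = rA:rB, bytes 2-9 = D, least significant first.
   icode is 4 for rmmovq rA,D(rB) and 5 for mrmovq D(rB),rA. */
static void encode_memmov(uint8_t icode, uint8_t rA, uint8_t rB,
                          int64_t D, uint8_t out[10]) {
    assert(icode == 4 || icode == 5);
    assert(rA < 0xF && rB < 0xF);
    out[0] = (uint8_t)(icode << 4);
    out[1] = (uint8_t)((rA << 4) | rB);
    uint64_t u = (uint64_t)D;
    for (int i = 0; i < 8; i++)          /* little-endian byte order */
        out[2 + i] = (uint8_t)(u >> (8 * i));
}

/* Format n bytes as a compact hex string; buf must hold 2*n + 1 chars. */
static void to_hex(const uint8_t *bytes, int n, char *buf) {
    for (int i = 0; i < n; i++)
        sprintf(buf + 2 * i, "%02x", (unsigned)bytes[i]);
}
```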
Suppose we have the following simple C program in file code.c.

```c
long int sumInts(long int n)
{
    /* Add the integers from 1..n. */
    long int i;
    long int sum = 0;
    for ( i = 1; i <= n; i++ ) {
        sum += i;
    }
    return sum;
}
```

(We used `long int` to force usage of the 64-bit registers.) You can compile it using the following command:

```
> gcc -O -S code.c
```
The resulting assembly file, `code.s`, contains:

```
  .file "code.c"
  .text
  .globl sumInts
  .type sumInts, @function
sumInts:
.LFB0:
  .cfi_startproc
  testq %rdi, %rdi
  jle .L4
  movq $0, %rax
  movq $1, %rdx
.L3:
  addq %rdx, %rax
  addq $1, %rdx
  cmpq %rdx, %rdi
  jge .L3
  ret
.L4:
  movq $0, %rax
  ret
  .cfi_endproc
.LFE0:
  .size sumInts, .-sumInts
  .ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4"
  .section .note.GNU-stack,"",@progbits
```
This is a hand translation into Y86 assembler:

```y86
sumInts:
  andq %rdi, %rdi          # test %rdi = n
  jle .L4                 # if <= 0, done
  irmovq $1, %rcx         # constant 1
  irmovq $0, %rax         # sum = 0
  irmovq $1, %rdx         # i = 1

.L3:
  rrmovq %rdi, %rsi        # temp = n
  addq %rdx, %rax          # sum += i
  addq %rcx, %rdx          # i += 1
  subq %rdx, %rsi          # temp -= i
  jge .L3                 # if >= 0, goto L3
  ret                     # else return sum

.L4:
  irmovq $0, %rax          # done
  ret
```
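To convince yourself that the Y86 loop test `subq %rdx, %rsi; jge .L3` matches the C condition `i <= n`, it can help to transliterate the Y86 back into C, one statement per instruction. A sketch (the function name `sumInts_y86` is made up); it models `jge` as "result >= 0", which agrees with the real SF/OF test only when `n - i` does not overflow.

```c
#include <assert.h>

/* sumInts transliterated from the hand-written Y86, one statement per
   instruction. "jge" is modeled as "result >= 0", which matches the
   real SF/OF test as long as n - i does not overflow. */
static long sumInts_y86(long n) {
    long rax, rcx, rdx, rsi;
    if (n <= 0)            /* andq %rdi, %rdi ; jle .L4 */
        return 0;          /* .L4: irmovq $0, %rax ; ret */
    rcx = 1;               /* irmovq $1, %rcx   (constant 1) */
    rax = 0;               /* irmovq $0, %rax   (sum = 0) */
    rdx = 1;               /* irmovq $1, %rdx   (i = 1) */
    do {                   /* .L3: */
        rsi = n;           /* rrmovq %rdi, %rsi (temp = n) */
        rax += rdx;        /* addq %rdx, %rax   (sum += i) */
        rdx += rcx;        /* addq %rcx, %rdx   (i += 1) */
        rsi -= rdx;        /* subq %rdx, %rsi   (temp -= i) */
    } while (rsi >= 0);    /* jge .L3 */
    return rax;            /* ret */
}
```

Because `i` has already been incremented when the subtraction happens, "`temp >= 0`" means the new `i` is still at most `n`, which is exactly the C loop condition.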

How does it get the argument? How does it return the value?
By convention, the first six (integer or pointer) arguments to any procedure are passed, in order, in six specific registers. Any further arguments are passed on the stack, pushed in reverse order.

**Registers: First 6 arguments**

- `%rdi`
- `%rsi`
- `%rdx`
- `%rcx`
- `%r8`
- `%r9`

Mnemonic to remember the order: “Diane’s silk dress cost $89.”

**Return value**

- `%rax`
Addition Instruction

- Add value in register rA to that in register rB.
  - Store result in register rB
  - Note that Y86 only allows addition to be applied to register data.
- E.g., `addq %rax, %rsi` is encoded as: 60 06. Why?
- Set condition codes based on the result.
- Two byte encoding:
  - First indicates instruction type.
  - Second gives source and destination registers.

You completely characterize an operation by saying how it changes the state.

What effects does `addq %rsi, %rdi` have on the state?

- Set contents of `%rdi` to the sum of the current contents of `%rsi` and `%rdi`.
- Set condition codes based on the result of the sum.
  - OF: set iff the result causes a signed (two's-complement) overflow
  - ZF: set iff the result is zero
  - SF: set iff the result is negative
- Increment the program counter by 2. Why 2?
- No effect on the memory or the status code.
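The state change described above can be written out directly. This is a sketch under assumed names (`State`, `exec_addq`) that models only the registers, the PC, and the three flags. The sum is computed in unsigned arithmetic so that overflow wraps instead of being undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int64_t  reg[15];     /* indexed by register ID */
    uint64_t pc;
    int      zf, sf, of;  /* condition flags */
} State;

/* Effect of "addq rA, rB": rB receives the sum, the flags are set from
   the result, and the PC advances past the 2-byte instruction.
   Memory and the status code are not touched. */
static void exec_addq(State *s, int rA, int rB) {
    assert(rA >= 0 && rA < 15 && rB >= 0 && rB < 15);
    int64_t a = s->reg[rA], b = s->reg[rB];
    /* add as unsigned so overflow wraps; converting back to int64_t
       wraps on mainstream compilers */
    int64_t r = (int64_t)((uint64_t)a + (uint64_t)b);
    s->zf = (r == 0);
    s->sf = (r < 0);
    /* signed overflow: both operands share a sign that the result lacks */
    s->of = ((a < 0) == (b < 0)) && ((r < 0) != (a < 0));
    s->reg[rB] = r;
    s->pc += 2;
}
```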
### Arithmetic and Logical Operations

<table>
<thead>
<tr>
<th>Operation</th>
<th>Instruction</th>
<th>Byte 0</th>
<th>Byte 1</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Add</strong></td>
<td>addq rA, rB</td>
<td>6 0</td>
<td>rA rB</td>
</tr>
<tr>
<td><strong>Subtract (rA from rB)</strong></td>
<td>subq rA, rB</td>
<td>6 1</td>
<td>rA rB</td>
</tr>
<tr>
<td><strong>And</strong></td>
<td>andq rA, rB</td>
<td>6 2</td>
<td>rA rB</td>
</tr>
<tr>
<td><strong>Exclusive Or</strong></td>
<td>xorq rA, rB</td>
<td>6 3</td>
<td>rA rB</td>
</tr>
</tbody>
</table>

- Refer to generically as “OPq”
- Encodings differ only by “function code”: the low-order 4 bits of the first instruction byte.
- Set condition codes as side effect.
### Move Operations

<table>
<thead>
<tr>
<th>Operation</th>
<th>Instruction</th>
<th>Byte 0</th>
<th>Byte 1</th>
<th>Bytes 2–9</th>
</tr>
</thead>
<tbody>
<tr>
<td>Register to Register</td>
<td>rrmovq rA, rB</td>
<td>2 0</td>
<td>rA rB</td>
<td></td>
</tr>
<tr>
<td>Immediate to Register</td>
<td>irmovq V, rB</td>
<td>3 0</td>
<td>F rB</td>
<td>V</td>
</tr>
<tr>
<td>Register to Memory</td>
<td>rmmovq rA, D(rB)</td>
<td>4 0</td>
<td>rA rB</td>
<td>D</td>
</tr>
<tr>
<td>Memory to Register</td>
<td>mrmovq D(rB), rA</td>
<td>5 0</td>
<td>rA rB</td>
<td>D</td>
</tr>
</tbody>
</table>

- Similar to the x86-64 movq instruction.
- Similar format for memory addresses.
- Slightly different names to distinguish them.
The Y86 adds special move instructions to compensate for the lack of certain *addressing modes*.
### Conditional Move Instructions

**Move (conditionally)**

<table>
<thead>
<tr>
<th>cmovXX rA, rB</th>
<th>2</th>
<th>fn</th>
<th>rA</th>
<th>rB</th>
</tr>
</thead>
</table>

- Refer to generically as “cmovXX”
- Encodings differ only by function code `fn`
- `rrmovq` instruction is a special case
- Based on values of condition codes
- Conditionally copy value from source to destination register
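The conditional moves (and, identically, the jumps) test the condition codes selected by their function code. A sketch of that test in C, assuming the usual x86-style definitions in which signed "less than" means SF differs from OF; the name `cond_holds` and the `C_*` constants are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Function codes shared by cmovXX (icode 2) and jXX (icode 7). */
enum { C_ALWAYS = 0, C_LE = 1, C_L = 2, C_E = 3, C_NE = 4, C_GE = 5, C_G = 6 };

/* Does the condition selected by fn hold for the current flags?
   After "subq rA, rB", "less" means rB < rA (signed). */
static bool cond_holds(int fn, bool zf, bool sf, bool of) {
    bool less = (sf != of);
    switch (fn) {
    case C_ALWAYS: return true;           /* rrmovq / jmp */
    case C_LE:     return less || zf;
    case C_L:      return less;
    case C_E:      return zf;
    case C_NE:     return !zf;
    case C_GE:     return !less;
    case C_G:      return !less && !zf;
    default:       assert(!"invalid function code"); return false;
    }
}
```

A `cmovXX rA, rB` would then copy `rA` into `rB` only when `cond_holds(fn, ...)` is true, and a `jXX` would load Dest into the PC under the same test.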
### Conditional Move Instructions

#### Move Unconditionally

<table>
<thead>
<tr>
<th>Instruction</th>
<th>icode</th>
<th>fn</th>
<th>rA</th>
<th>rB</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>rrmovq rA, rB</code></td>
<td>2</td>
<td>0</td>
<td>rA</td>
<td>rB</td>
</tr>
</tbody>
</table>

#### Move when less or equal

<table>
<thead>
<tr>
<th>Instruction</th>
<th>icode</th>
<th>fn</th>
<th>rA</th>
<th>rB</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>cmovle rA, rB</code></td>
<td>2</td>
<td>1</td>
<td>rA</td>
<td>rB</td>
</tr>
</tbody>
</table>

#### Move when less

<table>
<thead>
<tr>
<th>Instruction</th>
<th>icode</th>
<th>fn</th>
<th>rA</th>
<th>rB</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>cmovl rA, rB</code></td>
<td>2</td>
<td>2</td>
<td>rA</td>
<td>rB</td>
</tr>
</tbody>
</table>

#### Move when equal

<table>
<thead>
<tr>
<th>Instruction</th>
<th>icode</th>
<th>fn</th>
<th>rA</th>
<th>rB</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>cmove rA, rB</code></td>
<td>2</td>
<td>3</td>
<td>rA</td>
<td>rB</td>
</tr>
</tbody>
</table>

#### Move when not equal

<table>
<thead>
<tr>
<th>Instruction</th>
<th>icode</th>
<th>fn</th>
<th>rA</th>
<th>rB</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>cmovne rA, rB</code></td>
<td>2</td>
<td>4</td>
<td>rA</td>
<td>rB</td>
</tr>
</tbody>
</table>

#### Move when greater or equal

<table>
<thead>
<tr>
<th>Instruction</th>
<th>icode</th>
<th>fn</th>
<th>rA</th>
<th>rB</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>cmovge rA, rB</code></td>
<td>2</td>
<td>5</td>
<td>rA</td>
<td>rB</td>
</tr>
</tbody>
</table>

#### Move when greater

<table>
<thead>
<tr>
<th>Instruction</th>
<th>icode</th>
<th>fn</th>
<th>rA</th>
<th>rB</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>cmovg rA, rB</code></td>
<td>2</td>
<td>6</td>
<td>rA</td>
<td>rB</td>
</tr>
</tbody>
</table>
Jump Instructions

### Jump (conditionally)

| jXX Dest | 7 | fn | Dest |

- Refer to generically as “jXX”
- Encodings differ only by function code fn
- Based on values of condition codes
- Same as x86-64 counterparts
- Encode full destination address (unlike PC-relative addressing in x86-64)
### Jump Instructions

<table>
<thead>
<tr>
<th>Operation</th>
<th>Instruction</th>
<th>Byte 0</th>
<th>Bytes 1–8</th>
</tr>
</thead>
<tbody>
<tr>
<td>Jump unconditionally</td>
<td>jmp Dest</td>
<td>7 0</td>
<td>Dest</td>
</tr>
<tr>
<td>Jump when less or equal</td>
<td>jle Dest</td>
<td>7 1</td>
<td>Dest</td>
</tr>
<tr>
<td>Jump when less</td>
<td>jl Dest</td>
<td>7 2</td>
<td>Dest</td>
</tr>
<tr>
<td>Jump when equal</td>
<td>je Dest</td>
<td>7 3</td>
<td>Dest</td>
</tr>
<tr>
<td>Jump when not equal</td>
<td>jne Dest</td>
<td>7 4</td>
<td>Dest</td>
</tr>
<tr>
<td>Jump when greater or equal</td>
<td>jge Dest</td>
<td>7 5</td>
<td>Dest</td>
</tr>
<tr>
<td>Jump when greater</td>
<td>jg Dest</td>
<td>7 6</td>
<td>Dest</td>
</tr>
</tbody>
</table>
Suppose you want to count the elements in a null-terminated list A whose starting address is in `%rdi`.

```assembly
len:
  irmovq $0, %rax  # result = 0
  mrmovq (%rdi), %rdx # val = *A
  andq %rdx, %rdx # Test val
  je Done # If 0, goto Done

Loop:
    ....

Done:
    ret
```
Region of memory holding program data.

Used in Y86 (and x86-64) for supporting procedure calls.

Stack top is indicated by `%rsp`, address of top stack element.

Stack grows toward lower addresses.

Top element is at lowest address in the stack.

When pushing, first decrement the stack pointer, then write the new element.

When popping, read the top element, then increment the stack pointer.
Stack Operations

**Push**

```
pushq rA   a  0  rA  F
```
- Decrements `%rsp` by 8.
- Store quad word from `rA` to memory at `%rsp`.
- Similar to x86-64 `pushq` operation.

**Pop**

```
popq rA   b  0  rA  F
```
- Read quad word from memory at `%rsp`.
- Save in `rA`.
- Increment `%rsp` by 8.
- Similar to x86-64 `popq` operation.
Subroutine call

| call Dest | 8 | 0 | Dest |

- Push address of next instruction onto stack.
- Start executing instructions at Dest.
- Similar to x86-64 call instruction.

Subroutine return

| ret | 9 | 0 |

- Pop value from stack.
- Use as address for next instruction.
- Similar to x86-64 ret instruction.
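The push/pop and call/ret rules above can be modeled over a byte-addressed, little-endian memory. A sketch with hypothetical names (`Cpu`, `push`, `pop`, `call`, `ret`) and an arbitrarily chosen 512-byte memory; `call` pushes PC + 9 because a `call` instruction is 9 bytes long.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE 0x200   /* assumed memory size for the sketch */

typedef struct {
    uint8_t  mem[MEM_SIZE];
    uint64_t rsp;        /* stack pointer (register ID 4) */
    uint64_t pc;
} Cpu;

/* Store / load an 8-byte quad word in little-endian byte order. */
static void write_quad(Cpu *c, uint64_t addr, uint64_t v) {
    assert(addr <= MEM_SIZE - 8);
    for (int i = 0; i < 8; i++)
        c->mem[addr + i] = (uint8_t)(v >> (8 * i));
}

static uint64_t read_quad(const Cpu *c, uint64_t addr) {
    assert(addr <= MEM_SIZE - 8);
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | c->mem[addr + i];
    return v;
}

/* pushq: decrement %rsp by 8 first, then store at the new top. */
static void push(Cpu *c, uint64_t v) {
    c->rsp -= 8;
    write_quad(c, c->rsp, v);
}

/* popq: read the top element, then increment %rsp by 8. */
static uint64_t pop(Cpu *c) {
    uint64_t v = read_quad(c, c->rsp);
    c->rsp += 8;
    return v;
}

/* call Dest: push the address of the next instruction, then jump. */
static void call(Cpu *c, uint64_t dest) {
    push(c, c->pc + 9);
    c->pc = dest;
}

/* ret: pop the return address and use it as the next PC. */
static void ret(Cpu *c) {
    c->pc = pop(c);
}
```

Because the stack grows toward lower addresses, each `call` leaves its return address 8 bytes below the previous one, and `ret` undoes exactly one `call`.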
Miscellaneous Instructions

No operation

| nop | 1  | 0 |

- Don’t do anything but advance PC.

Halt execution

| halt | 0  | 0 |

- Stop executing instructions; set status to HLT.
- x86-64 has a comparable instruction, but you can’t execute it in user mode.
- We will use it to stop the simulator.
- Encoding ensures that program hitting memory initialized to zero will halt.
Status Conditions

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Code</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>AOK</td>
<td>1</td>
<td>Normal operation</td>
</tr>
<tr>
<td>HLT</td>
<td>2</td>
<td>Halt inst. encountered</td>
</tr>
<tr>
<td>ADR</td>
<td>3</td>
<td>Bad address (instr. or data)</td>
</tr>
<tr>
<td>INS</td>
<td>4</td>
<td>Invalid instruction</td>
</tr>
</tbody>
</table>

Desired behavior:

- If AOK, keep executing
- Otherwise, stop program execution
Try to use the C compiler as much as possible.

- Write code in C.
- Compile for x86-64 with gcc -Og -S.
- Transliterate into Y86 code.
- Modern compilers make this more difficult, because even modest optimization can obscure the correspondence between the C source and the generated code.

To understand Y86 (or x86) code, you need to know not only the meaning of each instruction but also certain *programming conventions*, especially the *stack discipline*:

- How do you pass arguments to a procedure?
- Where are local variables created?
- How does a procedure return a value?
- How do procedures save and restore the state of the caller?
Coding example: Find number of elements in a null-terminated list.

```c
long len1( long a[] );
```

The answer in this case should be 3.
First try writing typical array code:

```c
/* Count elements in null-terminated list */
long len1( long a[] )
{
    long len;
    for (len = 0; a[len]; len++)
        ;   /* empty body: advance until a[len] == 0 */
    return len;
}
```

**Problem:** Hard to do array indexing on Y86, since we don’t have scaled addressing modes.

**x86 Code:**

```
L3:
    addq $1, %rax
    cmpq $0, (%rdi,%rax,8)
    jne L3
```

Compile with `gcc -Og -S`
Second try: Write C code that mimics expected Y86 code.

```c
/* Count elements in null-terminated list */
long len2(long *a)
{
    long ip = (long) a;
    long val = *(long *) ip;
    long len = 0;
    while (val) {
        ip += sizeof(long);
        len++;
        val = *(long *) ip;
    }
    return len;
}
```

Result:
- Compiler generates exact same code as before!
- Compiler converts both versions into the same intermediate form.
Y86-64 Code Generation Example (3)

```
len:
  irmovq $1, %r8 # Constant 1
  irmovq $8, %r9 # Constant 8
  irmovq $0, %rax # len = 0
  mrmovq (%rdi), %rdx # val = *a
  andq %rdx, %rdx # Test val
  je Done # If 0, goto Done

Loop:
  addq %r8, %rax # len++
  addq %r9, %rdi # a++
  mrmovq (%rdi), %rdx # val = *a
  andq %rdx, %rdx # Test val
  jne Loop # If !0, goto Loop

Done:
  ret
```

<table>
<thead>
<tr>
<th>Reg.</th>
<th>Use</th>
</tr>
</thead>
<tbody>
<tr>
<td>%rdi</td>
<td>a</td>
</tr>
<tr>
<td>%rax</td>
<td>len</td>
</tr>
<tr>
<td>%rdx</td>
<td>val</td>
</tr>
<tr>
<td>%r8</td>
<td>1</td>
</tr>
<tr>
<td>%r9</td>
<td>8</td>
</tr>
</tbody>
</table>

**Y86 Sample Program Structure**

```assembly
init:          # Initialization
  ...
  call Main
  halt
  .align 8     # Program data
Array:
  ...
Main:          # Main function
  ...
  call len
  ...
len:           # Length function
  ...
  .pos 0x100   # Place stack
Stack:
```

- Program starts at address 0
- Must set up stack
  - Where located
  - Pointer values
  - Mustn’t overwrite data
- Must initialize data
```assembly
init:
# Set up stack pointer
  irmovq Stack, %rsp
# Execute main program
  call Main
# Terminate
  halt

# Array of 4 elements + final 0
  .align 8
Array:
  .quad 0x000d000d000d000d
  .quad 0x00c000c000c000c0
  .quad 0x0b000b000b000b00
  .quad 0xa000a000a000a000
  .quad 0
```

- Program starts at address 0
- Must set up stack
- Must initialize data
- Can use symbolic names
```assembly
Main:
  irmovq Array, %rdi
  # call len(Array)
  call len
  ret
```

Set up call to len:

- Follow x86-64 procedure conventions
- Pass array address as argument
A program that translates Y86 code into machine language.

- 1-1 mapping of instructions to encodings.
- Resolves symbolic names.
- Translation is linear.
- Assembler directives give additional control.

Some common directives:

- `.pos x`: subsequent lines of code start at address $x$.
- `.align x`: align the next line to an $x$-byte boundary (e.g., long ints should be at a quadword address, divisible by 8).
- `.quad x`: put $x$ at the current address; a way to initialize a value.
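The `.align` directive amounts to rounding the current address up to the next multiple of x. A sketch, assuming x is a power of two (as in the quadword case); `align_up` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of x, as ".align x" positions the
   next item. Assumes x is a power of two (e.g. 8 for quad words). */
static uint64_t align_up(uint64_t addr, uint64_t x) {
    assert(x != 0 && (x & (x - 1)) == 0);
    return (addr + x - 1) & ~(x - 1);
}
```

For example, if the code before `.align 8` ends at address 0x13, the next item is placed at 0x18; an address that is already a multiple of 8 is left unchanged.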
Assembling Y86 Program

```
unix> yas len.ys
```

- Generates “object code” file len.yo
- Actually looks like disassembler output

```
0x054:                      | len:
0x054: 30f80100000000000000 |   irmovq $1, %r8        # Constant 1
0x05e: 30f90800000000000000 |   irmovq $8, %r9        # Constant 8
0x068: 30f00000000000000000 |   irmovq $0, %rax       # len = 0
0x072: 50270000000000000000 |   mrmovq (%rdi), %rdx   # val = *a
0x07c: 6222                 |   andq %rdx, %rdx       # Test val
0x07e: 73a000000000000000   |   je Done               # If 0, goto Done
0x087:                      | Loop:
0x087: 6080                 |   addq %r8, %rax        # len++
0x089: 6097                 |   addq %r9, %rdi        # a++
0x08b: 50270000000000000000 |   mrmovq (%rdi), %rdx   # val = *a
0x095: 6222                 |   andq %rdx, %rdx       # Test val
0x097: 748700000000000000   |   jne Loop              # If !0, goto Loop
0x0a0:                      | Done:
0x0a0: 90                   |   ret
```
```
unix> yis len.yo
```

Instruction set simulator
- Computes the effect of each instruction on the processor state
- Prints the changes in state from the original

```
Stopped in 33 steps at PC = 0x13, Status 'HLT', CC Z=1 S=0 O=0

Changes to registers:
%rax: 0x0000000000000000 0x0000000000000004
%rsp: 0x0000000000000000 0x0000000000000100
%rdi: 0x0000000000000000 0x0000000000000038
%r8 : 0x0000000000000000 0x0000000000000001
%r9 : 0x0000000000000000 0x0000000000000008

Changes to memory:
0x00f0: 0x0000000000000000 0x0000000000000053
0x00f8: 0x0000000000000000 0x0000000000000013
```
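The final `%rax` value of 4 can be cross-checked in C by running the same pointer-stepping loop over the four data words from the sample program's `Array`. A sketch; the names `len` and `array_data` are chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same pointer-stepping algorithm as the Y86 len: count nonzero quad
   words until the terminating 0, advancing 8 bytes at a time. */
static long len(const int64_t *a) {
    assert(a != NULL);
    long n = 0;
    int64_t val = *a;          /* val = *a */
    while (val != 0) {
        n++;                   /* len++ */
        a++;                   /* a++ (8 bytes, like addq %r9, %rdi) */
        val = *a;              /* val = *a */
    }
    return n;
}

/* The data words from the sample program's Array, plus the final 0.
   The last word exceeds INT64_MAX, so it is written as unsigned and
   converted (this wraps to a negative value on mainstream compilers). */
static const int64_t array_data[] = {
    0x000d000d000d000d, 0x00c000c000c000c0,
    0x0b000b000b000b00, (int64_t)0xa000a000a000a000ULL, 0
};
```

The pointer ends 4 × 8 = 32 bytes past where it started, which is consistent with the change in `%rdi` reported by the simulator.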
CISC Instruction Sets

Complex Instruction Set Computer

- Dominant ISA style through the 80s.
- Lots of instructions:
  - Variable length
  - Stack as mechanism for supporting functions
  - Explicit push and pop instructions.
- ALU instructions can access memory.
  - E.g., `addq %rax, 12(%rbx, %rcx, 8)`
  - Requires memory read and write in one instruction execution.
  - Some ISAs had much more complex address calculations.
- Set condition codes as a side effect of other instructions.
- Basic philosophy:
  - Memory is expensive;
  - Instructions to support high-level language constructs.
Reduced Instruction Set Computer

- Originated in IBM Research; popularized in Berkeley and Stanford projects.
- Few, simple instructions.
  - Takes more instructions to execute a task, but faster and simpler implementation
  - Fixed length instructions for simpler decoding
- Register-oriented ISA
  - More registers (32 typically)
  - Stack is back-up for registers
- Only load and store instructions can access memory (mrmovq and rmmovq in Y86).
- Explicit test instructions set condition values in register.
- Philosophy: KISS
### MIPS Instruction Format

#### Register–register:

<table>
<thead>
<tr>
<th>Op</th>
<th>Ra</th>
<th>Rb</th>
<th>Rd</th>
<th>00000</th>
<th>Fn</th>
</tr>
</thead>
</table>

- `addu $3,$2,$1`: register add, `$3 = $2+$1`

#### Register–immediate:

<table>
<thead>
<tr>
<th>Op</th>
<th>Ra</th>
<th>Rb</th>
<th>Immediate</th>
</tr>
</thead>
</table>

- `addu $3,$2,3145`: immediate add, `$3 = $2+3145`
- `sll $3,$2,2`: shift left, `$3 = $2 << 2`

#### Branch:

<table>
<thead>
<tr>
<th>Op</th>
<th>Ra</th>
<th>Rb</th>
<th>Offset</th>
</tr>
</thead>
</table>

- `beq $3,$2,dest`: branch when `$3 = $2`

#### Load/Store:

<table>
<thead>
<tr>
<th>Op</th>
<th>Ra</th>
<th>Rb</th>
<th>Offset</th>
</tr>
</thead>
</table>

- `lw $3,16($2)`: load word, `$3 = M[$2+16]`
- `sw $3,16($2)`: store word, `M[$2+16] = $3`
### MIPS Registers

<table>
<thead>
<tr>
<th>Name</th>
<th>Number</th>
<th>Use</th>
<th>Callee preserves?</th>
</tr>
</thead>
<tbody>
<tr>
<td>$zero</td>
<td>$0</td>
<td>constant 0</td>
<td>N/A</td>
</tr>
<tr>
<td>$at</td>
<td>$1</td>
<td>assembler temporary</td>
<td>No</td>
</tr>
<tr>
<td>$v0–$v1</td>
<td>$2–$3</td>
<td>function return values; expression evaluation</td>
<td>No</td>
</tr>
<tr>
<td>$a0–$a3</td>
<td>$4–$7</td>
<td>function arguments</td>
<td>No</td>
</tr>
<tr>
<td>$t0–$t7</td>
<td>$8–$15</td>
<td>temporaries</td>
<td>No</td>
</tr>
<tr>
<td>$s0–$s7</td>
<td>$16–$23</td>
<td>saved temporaries</td>
<td>Yes</td>
</tr>
<tr>
<td>$t8–$t9</td>
<td>$24–$25</td>
<td>temporaries</td>
<td>No</td>
</tr>
<tr>
<td>$k0–$k1</td>
<td>$26–$27</td>
<td>reserved for OS kernel</td>
<td>N/A</td>
</tr>
<tr>
<td>$gp</td>
<td>$28</td>
<td>global pointer</td>
<td>Yes</td>
</tr>
<tr>
<td>$sp</td>
<td>$29</td>
<td>stack pointer</td>
<td>Yes</td>
</tr>
<tr>
<td>$fp</td>
<td>$30</td>
<td>frame pointer</td>
<td>Yes</td>
</tr>
<tr>
<td>$ra</td>
<td>$31</td>
<td>return address</td>
<td>N/A</td>
</tr>
</tbody>
</table>
CISC vs. RISC

Original Debate
- Strong opinions!
- CISC proponents—easy for compiler, fewer code bytes
- RISC proponents—better for optimizing compilers, can make run fast with simple chip design

Current Status
- For desktop processors, choice of ISA not a technical issue
  - With enough hardware, can make anything run fast
  - Code compatibility more important
- x86-64 adopted many RISC features
  - More registers; use them for argument passing
- For embedded processors, RISC makes sense
  - Smaller, cheaper, less power
  - Most cell phones use ARM processors
Y86-64 Instruction Set Architecture

- Similar state and instructions to x86-64
- Simpler encodings
- Somewhere between CISC and RISC

How Important is ISA Design?

- Less now than before: with enough hardware, can make almost anything run fast!