# Foundations: Synchronization Execution Abstractions

Chris Rossbach CS378

### Today

- Questions?
- Administrivia
  - Lab 1 due sooner than you'd like
- Foundations
  - Threads/Processes/Fibers
  - Cache coherence (maybe)
- Acknowledgments: some materials in this lecture borrowed from
  - Emmett Witchel (who borrowed them from: Kathryn McKinley, Ron Rockhold, Tom Anderson, John Carter, Mike Dahlin, Jim Kurose, Hank Levy, Harrick Vin, Thomas Narten, and Emery Berger)
  - Andy Tanenbaum



### Faux Quiz (answer any 2, 5 min)

- What is the maximum possible speedup of a 75% parallelizable program on 8 CPUs?
- What is super-linear speedup? List two ways in which super-linear speedup can occur.
- What is the difference between strong and weak scaling?
- Define Safety, Liveness, Bounded Waiting, Failure Atomicity
- What is the difference between processes and threads?
- What's a fiber? When and why might fibers be a better abstraction than threads?

#### Processes and Threads and Fibers...

- Abstractions
- Containers
- State
  - Where is shared state?
  - How is it accessed?
  - Is it mutable?










#### Programming and Machines: a mental model



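```
#include <stdint.h>

enum opcode { add_rr /* , ... */ };
struct instruction { enum opcode opcode; unsigned dst, src; };

struct machine_state {
  uint64_t pc;
  uint64_t Registers[16];
  uint64_t cr[6]; // control registers cr0-cr4 and EFER on AMD
  // ...
} machine;

// Provided by the memory system and the decoder (elided here).
uint64_t fetch_instruction(uint64_t pc);
struct instruction decode_instruction(uint64_t raw);

void execute_instruction(struct instruction i) {
  switch (i.opcode) {
  case add_rr:
    machine.Registers[i.dst] += machine.Registers[i.src];
    break;
  // ... one case per opcode
  }
}

void run(void) {
  while (1) {
    uint64_t raw = fetch_instruction(machine.pc);
    struct instruction i = decode_instruction(raw);
    execute_instruction(i);
    machine.pc++; // simplification: fixed-size instructions, no branches
  }
}
```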


#### Parallel Machines: a mental model



| P1             | P2             | Pn            |
|----------------|----------------|---------------|
| prev instruct  | prev instruct  | prev instruct |
| load A(1)      | call funcD     | do 10 i=1,N   |
| load B(1)      | x=y*z          | alpha=w**3    |
| C(1)=A(1)*B(1) | sum=x*2        | zeta=C(i)     |
| store C(1)     | call sub1(i,j) | 10 continue   |
| next instruct  | next instruct  | next instruct |

Each column is one processor's instruction stream; time runs from top to bottom.


Processes, threads, fibers, events, continuations, ... are all abstractions for this

#### Processes

#### Model



- Multiprogramming of four programs
- Conceptual model of 4 independent, sequential processes
- Only one program active at any instant

#### Implementation

| Process management    | Memory management        | File management   |
|-----------------------|--------------------------|-------------------|
| Registers             | Pointer to text segment  | Root directory    |
| Program counter       | Pointer to data segment  | Working directory |
| Program status word   | Pointer to stack segment | File descriptors  |
| Stack pointer         |                          | User ID           |
| Process state         |                          | Group ID          |
| Priority              |                          |                   |
| Scheduling parameters |                          |                   |
| Process ID            |                          |                   |
| Parent process        |                          |                   |
| Process group         |                          |                   |
| Signals               |                          |                   |
| CPU time used         |                          |                   |
| Time of next alarm    |                          |                   |













(a) Three processes each with one thread

(b) One process with three threads

- When might (a) be better than (b)? Vice versa?
- Could you do lab 1 with processes instead of threads?
- Threads simplify sharing and reduce context overheads



| Per process items           | Per thread items |
|-----------------------------|------------------|
| Address space               | Program counter  |
| Global variables            | Registers        |
| Open files                  | Stack            |
| Child processes             | State            |
| Pending alarms              |                  |
| Signals and signal handlers |                  |
| Accounting information      |                  |

- Items shared by all threads in a process (left column)
- Items private to each thread (right column)
- Decouples memory and control abstractions!
- What are the advantages of that?



### Using threads

Ex. How might we use threads in a word processor program?




Figure: macOS Activity Monitor inspector for Microsoft Word (446). Parent process: launchd (1); process group: Microsoft Word (446); user: rossbach (501); % CPU: 0.63; threads: 15; recent hangs: 0. Tabs shown: Memory, Statistics, Open Files and Ports.

#### Where to Implement Threads:

- User space: a user-level threads package
- Kernel space: a threads package managed by the kernel

"Task" == "Flow of Control", but with less typing

"Stack" == Task State

### Task Management

- Preemptive
  - Interleave on uniprocessor
  - Overlap on multiprocessor
- Serial
  - One at a time, no conflict
- Cooperative
  - Yields at well-defined points
  - E.g. wait for long-running I/O

### Stack Management

- Manual
  - Inherent in cooperative
  - Changing at quiescent points
- Automatic
  - Inherent in preemptive
  - Downside: hidden concurrency assumptions

These dimensions can be orthogonal



- Cooperative tasks
  - most desirable when reasoning about concurrency
  - usually associated with event-driven programming
- Automatic stack management
  - most desirable when reading/maintaining code
  - usually associated with threaded (or serial) programming

Fibers: cooperative threading with automatic stack management



- Like threads, just an abstraction for flow of control
- *Lighter weight* than threads
  - In Windows, just a stack, subset of arch. registers, non-preemptive
  - \*Not\* just threads without exception support
  - Stack management/implementation has interplay with exceptions
  - Can be completely exception safe
- *Takeaway*: diversity of abstractions/containers for execution flows

### x86\_64 Architectural Registers



Register map diagram courtesy of Immae, own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=32745525

### Linux x86\_64 context switch *excerpt*

    /*
     * switch_to(x,y) should switch tasks from x to y.
     *
     * This could still be optimized:
     * - fold all the options into a flag word and test it with a single test.
     * - could test fs/gs bitsliced
     *
     * Kprobes not supported here. Set the probe on schedule instead.
     * Function graph tracer not supported too.
     */
    __visible __notrace_funcgraph struct task_struct *
    __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
    {
        struct thread_struct *prev = &prev_p->thread;
        struct thread_struct *next = &next_p->thread;
        struct fpu *prev_fpu = &prev->fpu;
        struct fpu *next_fpu = &next->fpu;
        int cpu = smp_processor_id();
        struct tss_struct *tss = &per_cpu(cpu_tss_rw, cpu);

        WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) &&
                     this_cpu_read(irq_count) != -1);

        switch_fpu_prepare(prev_fpu, cpu);

        /* We must save %fs and %gs before load_TLS() because
         * %fs and %gs may be cleared by load_TLS().
         *
         * (e.g. xen_load_tls())
         */
        save_fsgs(prev_p);

        /*
         * Load TLS before restoring any segments so that segment loads
         * reference the correct GDT entries.
         */
        load_TLS(next, cpu);

        /*
         * Leave lazy mode, flushing any hypercalls made here. This
         * must be done after loading TLS entries in the GDT but before
         * loading segments that might reference them, and it must
         * be done before fpu__restore(), so the TS bit is up to
         * date.
         */
        arch_end_context_switch(next_p);

        /* Switch DS and ES.
         *
         * Reading them only returns the selectors, but writing them (if
         * nonzero) loads the full descriptor from the GDT or LDT. The
         * LDT for next is loaded in switch_mm, and the GDT is loaded
         * above.
         *
         * We therefore need to write new values to the segment
         * registers on every context switch unless both the new and old
         * values are zero.
         *
         * Note that we don't need to do anything for CS and SS, as
         * those are saved and restored as part of pt_regs.
         */
        savesegment(es, prev->es);
        if (unlikely(next->es | prev->es))
            loadsegment(es, next->es);

        savesegment(ds, prev->ds);
        if (unlikely(next->ds | prev->ds))
            loadsegment(ds, next->ds);

        load_seg_legacy(prev->fsindex, prev->fsbase,
                        next->fsindex, next->fsbase, FS);
        load_seg_legacy(prev->gsindex, prev->gsbase,
                        next->gsindex, next->gsbase, GS);



The AMD64 architecture provides 16 general 64-bit registers together with 16 128-bit SSE registers, overlapping with 8 legacy 80-bit x87 floating point registers.

| Register | Both                              | Unix only       | Windows only      |
|----------|-----------------------------------|-----------------|-------------------|
| rax      | Result register                   |                 |                   |
| rbx      | Must be preserved                 |                 |                   |
| rcx      |                                   | Fourth argument | First argument    |
| rdx      |                                   | Third argument  | Second argument   |
| rsp      | Stack pointer, must be preserved  |                 |                   |
| rbp      | Frame pointer, must be preserved  |                 |                   |
| rsi      |                                   | Second argument | Must be preserved |
| rdi      |                                   | First argument  | Must be preserved |
| r8       |                                   | Fifth argument  | Third argument    |
| r9       |                                   | Sixth argument  | Fourth argument   |
| r10-r11  | Volatile                          |                 |                   |
| r12-r15  | Must be preserved                 |                 |                   |
| xmm0-5   | Volatile                          |                 |                   |
| xmm6-15  |                                   | Volatile        | Must be preserved |
| fpcsr    | Non volatile                      |                 |                   |
| mxcsr    | Non volatile                      |                 |                   |

For the two architectures we get slightly different lists of registers to preserve.

Registers "owned" by caller:

- Unix: rbx, rsp, rbp, r12-r15, mxcsr (control bits), x87 CW
- Windows: rbx, rsp, rbp, rsi, rdi, r12-r15, xmm6-15


### x86\_64 Registers and Threads




### x86\_64 Registers and Fibers

Figure: x86\_64 register map (ZMM/YMM/XMM vector registers, ST/MM x87 registers, general-purpose registers) overlaid with the calling-convention table of preserved and volatile registers.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 | анАХЕА                                                                                                                                                            | X RAX                                                                                                                                              | R8B R8W R8D   | R8 R128R12V  | R12D R12 | CR0      | CR4  |       |  |  |
| ZMM2                        | ΥM   | IM2    | XMM2  | ZMM3  | ۲M    | 1M3 [  | ХММЗ  | ST(2)  | MM2   | ST(3)   | MM3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | BL                                                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 | внВХЕВ                                                                                                                                                            | <b>X</b> RBX                                                                                                                                       | R9B R9W R9D   | R9 R138R13V  | R13DR13  | CR1      | CR5  |       |  |  |
| ZMM4                        | YM   | IM4 🛛  | XMM4  | ZMM5  | ۲M    | 1M5 🛛  | XMM5  | ST(4)  | MM4   | ST(5)   | MM5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | CL                                                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 | снСХЕС                                                                                                                                                            | X RCX                                                                                                                                              | R10BR10W R10D | R10 R14BR14V | R14D R14 | CR2      | CR6  | ]     |  |  |
| ZMM6                        | ۲M   | IM6    | XMM6  | ZMM7  | ۲M    | 1M7 🛛  | XMM7  | ST(6)  | MM6   | ST(7)   | MM7                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | DL                                                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 | DHDXED                                                                                                                                                            | <b>X</b> RDX                                                                                                                                       | R11BR11W R11D | R11 R158R15V | R15DR15  | CR3      | CR7  | ]     |  |  |
| ZMM8                        | YM   | IM8    | XMM8  | ZMM9  | ۲M    | 1M9 🛛  | XMM9  |        |       |         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | BPI                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 | BPEBP                                                                                                                                                             | RBP [                                                                                                                                              |               |              | EIP RIP  | CR3      | CR8  | ]     |  |  |
| ZMM10                       | YM   | IM10 🛛 | XMM10 | ZMM1  | 1 YM  | 1M11 🛛 | XMM11 | CW     | FP_IP | FP_DP   | FP_CS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | SI                                                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 | SI ESI                                                                                                                                                            | RSI                                                                                                                                                | SPL SPESPR    | SP           |          | MSW      | CR9  | ]     |  |  |
| ZMM12                       | YM   | IM12   | XMM12 | ZMM1  | 3 YM  | 1M13 🛛 | XMM13 | SW     |       |         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 |                                                                                                                                                                   |                                                                                                                                                    |               |              |          |          | CR10 | ]     |  |  |
| ZMM14                       | YM   | IM14 🛛 | XMM14 | ZMM1  | 5 YM  | 1M15 [ | XMM15 | TW     |       | 8-bit r | egister<br>rogistor                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 | 32-bit r                                                                                                                                                          | egister                                                                                                                                            | 80-bit        | register     | 256-bit  | register | CR11 | ]     |  |  |
| ZMM16 Z                     | MM17 | ZMM18  | ZMM19 | ZMM20 | ZMM21 | ZMM22  | ZMM23 | FP_DS  |       | 10-010  | register                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 | 04-0101                                                                                                                                                           | egistei                                                                                                                                            | 120-01        |              | JIZ-DIC  | register | CR12 | ]     |  |  |
| ZMM24 Z                     | MM25 | ZMM26  | ZMM27 | ZMM28 | ZMM29 | ZMM30  | ZMM31 | FP_OPC | FP_DP | FP_IP   | C                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | S                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 | SS                                                                                                                                                                | DS                                                                                                                                                 | GDTR          | IDTR         | DR0      | DR6      | CR13 | ]     |  |  |
|                             |      |        |       |       |       |        |       |        |       |         | E                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | S                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 | FS                                                                                                                                                                | GS                                                                                                                                                 | TR            | LDTR         | DR1      | DR7      | CR14 | ]     |  |  |
|                             |      |        |       |       |       |        |       |        |       |         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 |                                                                                                                                                                   |                                                                                                                                                    | FLAGS EFLAGS  | RELAGS       | DR2      | DR8      | CR15 | MXCSR |  |  |
|                             |      |        |       |       |       |        |       |        |       |         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 |                                                                                                                                                                   |                                                                                                                                                    |               |              | DR3      | DR9      |      |       |  |  |
|                             |      |        |       |       |       |        |       |        |       |         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                

- The AMD64 architecture provides 16 general-purpose 64-bit registers together with 16 128-bit SSE registers, overlapping with 8 legacy 80-bit x87 floating point registers.
- Register map legend (applied to registers such as rax, rbx, rcx): Unix only, Windows only, Both, Result register, Must be preserved.

• Register map diagram courtesy of: By Immae - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=32745525




### Pthreads

- POSIX standard thread model
- Specifies the API and call semantics
- Popular: most thread libraries are Pthreads-compatible

## Preliminaries

- Include pthread.h in the main file
- Compile program with -lpthread
  - gcc -o test test.c -lpthread
  - may not report compilation errors otherwise but calls will fail
- Good idea to check return values on common functions


- Types: pthread\_t is the type of a thread
- Some calls:

- No explicit parent/child model, except that the main thread holds process info
- Call pthread\_exit in main, don't just fall through: returning from main ends the process and kills any threads still running
- When do you need pthread\_join?
  - status = exit value returned by joinable thread
- Detached threads are those which cannot be joined (can also set this at creation)
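The create/join/status points above can be sketched as follows. This is a minimal illustration, not the lecture's own code: the helper names `square` and `run_join_demo` are made up for the example, and the exit value is carried as an integer packed into the `void *` that `pthread_join` hands back.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Thread body (hypothetical helper): squares its argument and hands the
 * result back through the thread's exit value. */
static void *square(void *arg) {
    intptr_t n = (intptr_t)arg;
    return (void *)(n * n);            /* same effect as pthread_exit((void *)(n * n)) */
}

/* Creates one joinable thread, joins it, and returns its exit value. */
long run_join_demo(long n) {
    pthread_t tid;
    void *status = NULL;

    int rc = pthread_create(&tid, NULL, square, (void *)(intptr_t)n);
    assert(rc == 0);                   /* good idea to check return values */
    rc = pthread_join(tid, &status);   /* blocks until the thread exits */
    assert(rc == 0);
    (void)rc;                          /* keeps NDEBUG builds warning-free */
    return (long)(intptr_t)status;     /* status = exit value of the joined thread */
}
```

A driver would call `run_join_demo(7)` from `main`. A thread whose result is never collected could instead be detached with `pthread_detach`, after which it can no longer be joined.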

### Creating multiple threads

```
#include <stdio.h>
#include <pthread.h>
#define NUM_THREADS 4

void *hello(void *arg) {
    printf("Hello Thread\n");
    return NULL;
}

int main(void) {
    pthread_t tid[NUM_THREADS];
    // start all threads, then wait for each one to finish
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], NULL, hello, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```

## Can you find the bug here?

### What is printed for myNum?

```
void *threadFunc(void *pArg) {
    int* p = (int*)pArg;
    int myNum = *p;
    printf("Thread number %d\n", myNum);
    return NULL;
}
. . .
// from main():
for (int i = 0; i < numThreads; i++) {
    pthread_create(&tid[i], NULL, threadFunc, &i);
}
```
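One way to read the quiz: every thread receives the address of the same loop variable `i`, which `main` keeps incrementing (and which goes out of scope after the loop), so `myNum` can be duplicated, skipped, or equal to `numThreads`. A common fix, sketched here under the assumption that each thread only needs its own index, is to give every thread a pointer to its own slot. The names `ids`, `seen`, `MAX_THREADS`, and `run_fixed_demo` are illustrative and not from the slides.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16                 /* illustrative upper bound */

static int ids[MAX_THREADS];           /* one private argument slot per thread */
static int seen[MAX_THREADS];          /* seen[k] counts threads that read myNum == k */

static void *threadFunc(void *pArg) {
    int myNum = *(int *)pArg;          /* reads this thread's own slot, not the loop variable */
    seen[myNum]++;                     /* distinct index per thread, so no race here */
    return NULL;
}

/* Returns 1 if every thread observed a distinct number in 0..numThreads-1. */
int run_fixed_demo(int numThreads) {
    pthread_t tid[MAX_THREADS];
    if (numThreads < 1 || numThreads > MAX_THREADS)
        return 0;
    for (int i = 0; i < numThreads; i++) {
        ids[i] = i;
        seen[i] = 0;
    }
    for (int i = 0; i < numThreads; i++) {
        int rc = pthread_create(&tid[i], NULL, threadFunc, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < numThreads; i++)
        pthread_join(tid[i], NULL);
    for (int i = 0; i < numThreads; i++)
        if (seen[i] != 1)
            return 0;
    return 1;
}
```

The slots live in a static array so they outlive the creation loop; passing the index by value inside the `void *` would be another option.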

- Type: pthread\_mutex\_t
- Attributes: for shared mutexes/condition vars among processes, for priority inheritance, etc.
  - use defaults
- Important: Mutex scope must be visible to all threads!
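As a hedged sketch of the points above: a mutex with default attributes, declared at file scope so that every thread sees the same lock, protecting a shared counter. The names `counter_lock`, `add_many`, and `run_mutex_demo` and the thread and iteration counts are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define N_THREADS 4
#define N_INCREMENTS 100000

/* File scope: the mutex and the data it guards are visible to all threads. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;  /* default attributes */
static long counter;

static void *add_many(void *arg) {
    (void)arg;
    for (int i = 0; i < N_INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;                         /* critical section */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs N_THREADS incrementing threads and returns the final counter value. */
long run_mutex_demo(void) {
    pthread_t tid[N_THREADS];
    counter = 0;
    for (int i = 0; i < N_THREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, add_many, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```

With the lock in place the result is always `N_THREADS * N_INCREMENTS`; without it, increments from different threads can overwrite each other.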

- Type: pthread\_spinlock\_t
- int pthread\_spin\_init(pthread\_spinlock\_t \*lock, int pshared);
- int pthread\_spin\_destroy(pthread\_spinlock\_t \*lock);
- int pthread\_spin\_lock(pthread\_spinlock\_t \*lock);
- int pthread\_spin\_unlock(pthread\_spinlock\_t \*lock);
- int pthread\_spin\_trylock(pthread\_spinlock\_t \*lock);

Wait...what's the difference?

- int pthread\_mutex\_init(pthread\_mutex\_t \*mutex,...);
- int pthread\_mutex\_destroy(pthread\_mutex\_t \*mutex);
- int pthread\_mutex\_lock(pthread\_mutex\_t \*mutex);
- int pthread\_mutex\_unlock(pthread\_mutex\_t \*mutex);
- int pthread\_mutex\_trylock(pthread\_mutex\_t \*mutex);


- Safety
  - Only one thread in the critical region
- Liveness
  - Some thread that enters the entry section eventually enters the critical region
  - Even if another thread takes forever in its non-critical region
- Bounded waiting
  - A thread that enters the entry section enters the critical section within some bounded number of operations.
  - If a thread i is in its entry section, then there is a bound on the number of times that other threads are allowed to enter the critical section before thread i's request is granted

Each thread's structure: while (1) { entry section; critical section; exit section; non-critical section }

Mutex, spinlock, etc. are ways to implement the entry and exit sections.

Did we get all the important conditions? Why is correctness defined in terms of locks?

Theorem: Every property is a combination of a safety property and a liveness property. (Bowen Alpern & Fred Schneider, https://www.cs.cornell.edu/fbs/publications/defliveness.pdf)
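To make bounded waiting concrete, here is a sketch of a ticket lock, a technique not named on the slides but a standard example of a lock where a waiter is overtaken by at most the threads that took a ticket before it. It uses C11 atomics; the type and function names are illustrative, and the `sched_yield` call in the wait loop is an assumption added so the demo stays fast when threads outnumber cores.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define TK_THREADS 4
#define TK_INCREMENTS 10000

typedef struct {
    atomic_uint next_ticket;   /* next ticket to hand out */
    atomic_uint now_serving;   /* ticket currently allowed into the critical section */
} ticket_lock_t;

static ticket_lock_t tl;       /* static storage: both fields start at 0 */
static long shared;

/* Entry section: take a ticket, then wait for that ticket to be served (FIFO). */
static void ticket_acquire(ticket_lock_t *l) {
    unsigned my = atomic_fetch_add(&l->next_ticket, 1);
    while (atomic_load(&l->now_serving) != my)
        sched_yield();         /* waiting politely; a pure spinlock would just loop */
}

/* Exit section: admit the next ticket holder. */
static void ticket_release(ticket_lock_t *l) {
    atomic_fetch_add(&l->now_serving, 1);
}

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < TK_INCREMENTS; i++) {
        ticket_acquire(&tl);
        shared++;              /* critical section */
        ticket_release(&tl);
    }
    return NULL;
}

/* Returns the final counter; equals TK_THREADS * TK_INCREMENTS if safety holds. */
long run_ticket_demo(void) {
    pthread_t tid[TK_THREADS];
    shared = 0;
    for (int i = 0; i < TK_THREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < TK_THREADS; i++)
        pthread_join(tid[i], NULL);
    return shared;
}
```

Safety shows up as the exact final count. The bound for a waiting thread is the number of tickets issued ahead of its own, which is at most the number of other threads.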


```
int lock_value = 0;
int* lock = &lock_value;

Lock::Acquire() {
  while (*lock == 1)
    ; // spin
  *lock = 1;
}

Lock::Release() {
  *lock = 0;
}
```

#### What are the problem(s) with this?

- A. CPU usage
- B. Memory usage
- C. Lock::Acquire() latency
- D. Memory bus usage
- E. Does not work

Completely and utterly broken: the test (`*lock == 1`) and the set (`*lock = 1`) are separate steps, so two threads can both see 0 and both enter. How can we fix it?


# HW Support for Read-Modify-Write (RMW)

IDEA: hardware implements something like:

```
bool rmw(addr, value) {
   atomic {
     tmp = *addr;
     newval = modify(tmp);
     *addr = newval;
   }
}
```

Why is that hard? How can we do it? Preview of Techniques:

- Bus locking
- Single Instruction ISA extensions
  - Test&Set
  - CAS: Compare & swap
  - Exchange, locked increment, locked decrement (x86)
- Multi-instruction ISA extensions:
  - LLSC: (PowerPC, Alpha, MIPS)
  - Transactional Memory (x86, PowerPC)
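In portable C, the single-instruction RMW primitives listed above are reachable through C11 `<stdatomic.h>`. The sketch below walks through exchange, fetch-and-add, and compare-and-swap on one variable. On x86, compilers typically lower these calls to `xchg`, `lock xadd`, and `lock cmpxchg`, though the exact instructions depend on compiler and target. The function name `run_rmw_demo` is made up for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Returns 1 if every RMW below behaved as its comment describes. */
int run_rmw_demo(void) {
    atomic_int x;
    atomic_init(&x, 5);

    int old = atomic_exchange(&x, 9);        /* exchange: returns 5, x becomes 9 */
    int before = atomic_fetch_add(&x, 1);    /* atomic increment: returns 9, x becomes 10 */

    int expected = 10;
    bool swapped = atomic_compare_exchange_strong(&x, &expected, 42);
                                             /* CAS succeeds: x was 10, becomes 42 */
    expected = 10;
    bool stale = atomic_compare_exchange_strong(&x, &expected, 7);
                                             /* CAS fails: x is 42, and expected is
                                                overwritten with that current value */

    return old == 5 && before == 9 && swapped && !stale &&
           expected == 42 && atomic_load(&x) == 42;
}
```

Each call is one indivisible read-modify-write as far as other threads are concerned, which is exactly the `atomic { ... }` block in the pseudocode.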


int lock\_value = 0; int\* lock = &lock\_value;

Lock::Acquire() { while (test&set(lock) == 1) ; // spin }

Lock::Release() { \*lock = 0; }

#### TST: *Test&set* (test & set ~= CAS ~= LLSC)

- Reads a value from memory
- Writes "1" back to the memory location, atomically with the read

#### What are the problem(s) with this?

- A. CPU usage
- B. Memory usage
- C. Lock::Acquire() latency
- D. Memory bus usage
- E. Does not work
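A runnable version of the test&set lock can be sketched with C11's `atomic_flag`, whose `atomic_flag_test_and_set` sets the flag and returns its previous value in one atomic step. The counter workload and the names `tas_acquire`, `tas_release`, and `run_tas_demo` are assumptions for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define TAS_THREADS 4
#define TAS_INCREMENTS 50000

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;   /* clear = free, set = held */
static long tas_counter;

static void tas_acquire(void) {
    while (atomic_flag_test_and_set(&lock_flag))
        ;   /* spin: every iteration is an atomic RMW on the lock word */
}

static void tas_release(void) {
    atomic_flag_clear(&lock_flag);
}

static void *tas_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < TAS_INCREMENTS; i++) {
        tas_acquire();
        tas_counter++;          /* critical section */
        tas_release();
    }
    return NULL;
}

/* Returns the final counter; equals TAS_THREADS * TAS_INCREMENTS if the lock works. */
long run_tas_demo(void) {
    pthread_t tid[TAS_THREADS];
    tas_counter = 0;
    for (int i = 0; i < TAS_THREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, tas_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < TAS_THREADS; i++)
        pthread_join(tid[i], NULL);
    return tas_counter;
}
```

The lock is correct, but every waiting thread burns CPU and keeps issuing writes to the lock word while it spins.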

#### More on this later...

# Physics: F = ma ~ Concurrency: coherence






- P1: read X
- P2: read X
- P2: X++
- P3: read X







(Figure: MESI state-transition diagram, with states such as INVALID and edges labeled by bus events such as BusRd(S).)

- Each cache line has a state (M, E, S, I)
- Processors "snoop" bus to maintain states
- Initially $\rightarrow$ 'I' $\rightarrow$ Invalid
- Read one $\rightarrow$ 'E' $\rightarrow$ exclusive
- Reads $\rightarrow$ 'S' $\rightarrow$ multiple copies possible
- Write $\rightarrow$ 'M' $\rightarrow$ single copy $\rightarrow$ lots of cache coherence traffic
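The state rules above can be replayed on the earlier access sequence (P1: read X, P2: read X, P2: X++, P3: read X) with a toy model. This is a simplified sketch that tracks only the per-cache state of one line; data values, write-backs, and bus arbitration are left out, and the names `mesi_t`, `cpu_read`, `cpu_write`, and `run_mesi_trace` are invented for the example.

```c
#define NCPU 3

typedef enum { MESI_I, MESI_S, MESI_E, MESI_M } mesi_t;

/* Processor p reads the line: a miss puts a read request on the bus. */
static void cpu_read(mesi_t st[NCPU], int p) {
    if (st[p] != MESI_I)
        return;                          /* hit in S, E, or M: no bus traffic */
    int others_have_copy = 0;
    for (int q = 0; q < NCPU; q++) {
        if (q != p && st[q] != MESI_I) {
            st[q] = MESI_S;              /* snoopers in M or E drop to S */
            others_have_copy = 1;
        }
    }
    st[p] = others_have_copy ? MESI_S : MESI_E;
}

/* Processor p writes the line: every other copy is invalidated. */
static void cpu_write(mesi_t st[NCPU], int p) {
    for (int q = 0; q < NCPU; q++)
        if (q != p)
            st[q] = MESI_I;
    st[p] = MESI_M;
}

/* Replays the access sequence; st[0], st[1], st[2] stand for P1, P2, P3. */
void run_mesi_trace(mesi_t st[NCPU]) {
    for (int q = 0; q < NCPU; q++)
        st[q] = MESI_I;                  /* initially invalid everywhere */
    cpu_read(st, 0);                     /* P1: read X  -> P1 = E */
    cpu_read(st, 1);                     /* P2: read X  -> P1 = S, P2 = S */
    cpu_read(st, 1);                     /* P2: X++ first reads (a hit) ... */
    cpu_write(st, 1);                    /* ... then writes -> P2 = M, P1 = I */
    cpu_read(st, 2);                     /* P3: read X  -> P2 = S, P3 = S */
}
```

After the trace, P1 holds an invalid copy while P2 and P3 share the line. The write by P2 is the step that forces invalidation traffic.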


























































# Read-Modify-Write (RMW)

- Implementing locks requires read-modify-write operations
- Required effect is:
  - An atomic and isolated action that
    1. reads a memory location AND
    2. writes a new value to the location
- RMW is *very tricky* on multiprocessors
- Cache coherence alone doesn't solve it



### Essence of HW-supported RMW



# HW Support for Read-Modify-Write (RMW)

| Primitive                            | Available on       |
|--------------------------------------|--------------------|
| Test & Set                           | Most architectures |
| CAS                                  | Many architectures |
| Exchange, locked increment/decrement | x86                |
| LLSC: load-linked store-conditional  | PPC, Alpha, MIPS   |

```
int TST(addr) {
  atomic {
    ret = *addr;
    if (!*addr)
      *addr = 1;
    return ret;
  }
}

bool CAS(addr, old, new) {
  atomic {
    if (*addr == old) {
      *addr = new;
      return true;
    }
    return false;
  }
}

int XCHG(addr, val) {
  atomic {
    ret = *addr;
    *addr = val;
    return ret;
  }
}

bool LLSC(addr, val) {
  ret = *addr;
  atomic {
    if (*addr == ret) {
      *addr = val;
      return true;
    }
    return false;
  }
}
```

```
void CAS_lock(lock) {
  while (CAS(lock, 0, 1) != true);
}
```
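The CAS-based lock can be written in C11 with `atomic_compare_exchange_strong`. The sketch below is single-threaded on purpose: it only walks the lock word through free, held, and free again to show when the CAS succeeds and fails. The names `cas_try_lock`, `cas_lock`, `cas_unlock`, and `run_cas_lock_demo` are assumptions for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One attempt: returns true if the lock word went from 0 (free) to 1 (held). */
static bool cas_try_lock(atomic_int *lock) {
    int expected = 0;   /* fresh each call: a failed CAS overwrites it */
    return atomic_compare_exchange_strong(lock, &expected, 1);
}

static void cas_lock(atomic_int *lock) {
    while (!cas_try_lock(lock))
        ;   /* spin until the CAS from 0 to 1 succeeds */
}

static void cas_unlock(atomic_int *lock) {
    atomic_store(lock, 0);
}

/* Single-threaded walk through the lock word's transitions; returns 1 if all hold. */
int run_cas_lock_demo(void) {
    atomic_int lock;
    atomic_init(&lock, 0);

    cas_lock(&lock);                        /* free -> held */
    bool second = cas_try_lock(&lock);      /* already held: CAS fails */
    cas_unlock(&lock);                      /* held -> free */
    bool third = cas_try_lock(&lock);       /* free again: CAS succeeds */

    return !second && third && atomic_load(&lock) == 1;
}
```

Unlike test&set, a failed CAS leaves the lock word unchanged and reports the value it saw.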

### HW Support for RMW: LL-SC

LLSC: load-linked store-conditional (PPC, Alpha, MIPS)

```
bool LLSC(addr, val) {
  ret = *addr;              // load-linked
  atomic {                  // store-conditional
    if (*addr == ret) {
      *addr = val;
      return true;
    }
    return false;
  }
}
```

```
void LLSC_lock(lock) {
  while(1) {
    old = load-linked(lock);
    if(old == 0 && store-cond(lock, 1))
      return;
  }
}
```

- load-linked is a load that is "linked" to a subsequent store-conditional
- Store-conditional only succeeds if the value from the linked load is unchanged (real hardware tracks writes to the location rather than comparing values)
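C has no portable load-linked/store-conditional, but the same retry pattern appears with `atomic_compare_exchange_weak`: on LL/SC machines compilers typically implement it with an LL/SC pair, which is why it is allowed to fail spuriously and is meant to be used in a loop. The sketch below builds an arbitrary RMW (an atomic maximum, chosen only as an example) in that style. The names `atomic_max` and `run_llsc_style_demo` are hypothetical.

```c
#include <stdatomic.h>

/* Atomically replaces *addr with max(*addr, candidate); returns the previous value. */
static int atomic_max(atomic_int *addr, int candidate) {
    int old = atomic_load(addr);                      /* plays the role of load-linked */
    while (old < candidate &&
           !atomic_compare_exchange_weak(addr, &old, candidate))
        ;   /* plays the role of a failed store-conditional: old was refreshed, retry */
    return old;
}

/* Single-threaded check of the helper; returns 1 if the results are as expected. */
int run_llsc_style_demo(void) {
    atomic_int v;
    atomic_init(&v, 3);

    int a = atomic_max(&v, 10);    /* 3 < 10: stores 10, returns 3 */
    int b = atomic_max(&v, 4);     /* 10 >= 4: stores nothing, returns 10 */

    return a == 3 && b == 10 && atomic_load(&v) == 10;
}
```

The loop body is empty because a failed exchange already writes the current value into `old`, so each retry starts from fresh data.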



### LLSC Lock Action Zone

(Figure: P1 and P2 each run `lock(lock) { while(1) { old = ll(lock); if(old == 0) if(sc(lock, 1)) return; } }` with `lock` initially 0. Each step shows the state and data of the `lock` line in both caches, for example S[L] 0 after a load-linked; transitions are labeled with events such as PrWr/BusRd.)

### LLSC Lock Action Zone II

(Figure: continuation of the same walkthrough.)





```
int lock_value = 0;
int* lock = &lock_value;

Lock::Acquire() {
  while (test&set(lock) == 1)
    ; // spin
}

Lock::Release() {
  *lock = 0;
}
```

(test & set ~ CAS ~ LLSC)

#### What is the problem with this?

- A. CPU usage
- B. Memory usage
- C. Lock::Acquire() latency
- D. Memory bus usage
- E. Does not work
Initially, lock already held by some other CPU—A, B busy-waiting
















### TTS: Reducing busy wait contention

Test&Set: busy-wait on in-memory copy

```
Lock::Acquire() {
  while (test&set(lock) == 1)
    ; // spin
}

Lock::Release() {
  *lock = 0;
}
```

Test&Test&Set: busy-wait on cached copy

```
Lock::Acquire() {
  while (1) {
    while (*lock == 1)
      ; // spin just reading
    if (test&set(lock) == 0) break;
  }
}

Lock::Release() {
  *lock = 0;
}
```

- What is the problem with this?
  - A. CPU usage
  - B. Memory usage
  - C. Lock::Acquire() latency
  - D. Memory bus usage
  - E. Does not work
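A test-and-test-and-set lock can be sketched in C11 as below. The inner loop uses a plain atomic load, which can be served from the local cache while the lock is held, and the atomic exchange (standing in for test&set) runs only once the lock looks free. The relaxed load, the counter workload, and the names `tts_acquire`, `tts_release`, and `run_tts_demo` are choices made for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define TTS_THREADS 4
#define TTS_INCREMENTS 50000

static atomic_int tts_word;      /* 0 = free, 1 = held */
static long tts_counter;

static void tts_acquire(atomic_int *lock) {
    for (;;) {
        while (atomic_load_explicit(lock, memory_order_relaxed) == 1)
            ;                                    /* spin just reading */
        if (atomic_exchange(lock, 1) == 0)       /* test&set only when it looked free */
            return;
    }
}

static void tts_release(atomic_int *lock) {
    atomic_store(lock, 0);
}

static void *tts_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < TTS_INCREMENTS; i++) {
        tts_acquire(&tts_word);
        tts_counter++;                           /* critical section */
        tts_release(&tts_word);
    }
    return NULL;
}

/* Returns the final counter; equals TTS_THREADS * TTS_INCREMENTS if the lock works. */
long run_tts_demo(void) {
    pthread_t tid[TTS_THREADS];
    tts_counter = 0;
    for (int i = 0; i < TTS_THREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, tts_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < TTS_THREADS; i++)
        pthread_join(tid[i], NULL);
    return tts_counter;
}
```

Waiters still occupy a CPU while spinning, and a release can trigger a burst of exchanges as they all see the lock become free.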

























### How can we improve over busy-wait?

Lock::Acquire() {
 while(1) {
  while (\*lock == 1) ; // spin just reading
  if (test&set(lock) == 0) break;
 }
}

### Mutex

- Same abstraction as spinlock
- But is a "blocking" primitive
  - Lock available  $\rightarrow$  same behavior
  - Lock held  $\rightarrow$  yield/block
- Many ways to yield
- Simplest case of semaphore

```
void cm3_lock(u8_t* Mutex) {
  u8_t LockedIn = 0;
  do {
    if (__LDREXB(Mutex) == 0) {
      // unlocked: try to obtain lock
      if (__STREXB(1, Mutex) == 0) { // STREXB returns 0 on success: got lock
        __CLREX(); // remove __LDREXB() lock
        LockedIn = 1;
      }
      else task_yield(); // store failed: give away cpu
    }
    else task_yield(); // lock held: give away cpu
  } while (!LockedIn);
}
```
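The same "try, and yield if held" structure can be sketched in portable C++. This is a sketch under assumptions: `YieldingMutex` and `run_yield_demo` are illustrative names, `compare_exchange_weak` plays the role of the `__LDREXB`/`__STREXB` pair, and `std::this_thread::yield()` stands in for `task_yield()`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Yielding lock that mirrors the cm3_lock structure (illustrative name).
class YieldingMutex {
    std::atomic<unsigned char> state{0};  // 0 = unlocked, 1 = locked
public:
    void lock() {
        for (;;) {
            unsigned char expected = 0;
            // Succeeds only if state was 0, and atomically sets it to 1.
            if (state.compare_exchange_weak(expected, 1,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed)) {
                return;  // got the lock
            }
            std::this_thread::yield();  // lock held (or store failed): give away the CPU
        }
    }
    void unlock() { state.store(0, std::memory_order_release); }
};

// Hypothetical helper: threads increment a shared counter under the lock.
long run_yield_demo(int num_threads, int iters) {
    assert(num_threads > 0 && iters >= 0);
    YieldingMutex m;
    long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                m.lock();
                ++counter;  // critical section
                m.unlock();
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

Yielding is only one of the "many ways to yield": a waiter here is still rescheduled repeatedly, whereas a production mutex would typically park the thread on a wait queue until the holder releases the lock.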


- Is it better to use a spinlock or mutex on a uni-processor?
- Is it better to use a spinlock or mutex on a multi-processor?
- How do you choose between spinlock/mutex on a multiprocessor?
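One common compromise on a multiprocessor is to spin briefly and then block. The sketch below only illustrates that idea: `SpinThenBlockLock`, `run_hybrid_demo`, and the spin limit of 100 are assumptions, not values from the lecture.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Spin-then-block lock built on std::mutex (illustrative name).
class SpinThenBlockLock {
    std::mutex inner;
    static constexpr int kSpinLimit = 100;  // assumed tuning knob
public:
    void lock() {
        // Spin phase: cheap if the holder is running on another CPU
        // and releases the lock soon.
        for (int i = 0; i < kSpinLimit; ++i) {
            if (inner.try_lock()) return;
        }
        // Blocking phase: stop burning CPU and let the OS suspend us.
        inner.lock();
    }
    void unlock() { inner.unlock(); }
};

// Hypothetical helper: threads increment a shared counter under the lock.
long run_hybrid_demo(int num_threads, int iters) {
    assert(num_threads > 0 && iters >= 0);
    SpinThenBlockLock lock;
    long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<SpinThenBlockLock> guard(lock);
                ++counter;  // critical section
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

The design choice is about expected hold time: spinning tends to win when critical sections are shorter than the cost of a context switch, and blocking tends to win otherwise.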

### **Priority Inversion**

```
A(prio-0) → enter(I);
B(prio-100) → enter(I); → must wait.
```

Solution?


**Priority inheritance:** A runs at B's priority until it releases the lock.

Mars Pathfinder failure: http://wiki.csie.ncku.edu.tw/embedded/priority-inversion-on-Mars.pdf
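On POSIX systems, priority inheritance can be requested per mutex through the `PTHREAD_PRIO_INHERIT` protocol attribute. The sketch below assumes a platform that supports it; `init_pi_mutex` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Initializes *m as a priority-inheritance mutex (hypothetical helper).
// Returns 0 on success, otherwise the first pthread error code seen;
// platforms without support typically report ENOTSUP.
int init_pi_mutex(pthread_mutex_t* m) {
    assert(m != nullptr);
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    // While a higher-priority thread is blocked on this mutex, the holder
    // is scheduled at that thread's priority.
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

The mutex is then used with the ordinary `pthread_mutex_lock` and `pthread_mutex_unlock` calls; only its scheduling behavior under contention changes.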

Other ideas?

# Dekker's Algorithm
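For reference, here is a minimal sketch of the textbook form of Dekker's algorithm for two threads, written with sequentially consistent C++ atomics. The names `DekkerLock` and `run_dekker_demo` are illustrative assumptions, and the `yield()` calls are an addition to keep the demo polite on few cores; they are not part of the algorithm.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Dekker's algorithm: mutual exclusion for exactly two threads (ids 0 and 1)
// using only loads and stores. Illustrative names throughout.
struct DekkerLock {
    std::atomic<bool> wants[2]{};  // wants[i]: thread i wants to enter
    std::atomic<int> turn{0};      // whose turn it is to win a tie

    void lock(int me) {
        int other = 1 - me;
        wants[me].store(true);
        while (wants[other].load()) {
            if (turn.load() != me) {
                // Not our turn: back off and wait for the other thread.
                wants[me].store(false);
                while (turn.load() != me) std::this_thread::yield();
                wants[me].store(true);
            } else {
                std::this_thread::yield();  // our turn: other will back off
            }
        }
    }

    void unlock(int me) {
        turn.store(1 - me);     // hand the tie-break to the other thread
        wants[me].store(false);
    }
};

// Hypothetical helper: two threads each add `iters` to a shared counter.
long run_dekker_demo(int iters) {
    assert(iters >= 0);
    DekkerLock lock;
    long counter = 0;
    auto worker = [&](int me) {
        for (int i = 0; i < iters; ++i) {
            lock.lock(me);
            ++counter;  // critical section
            lock.unlock(me);
        }
    };
    std::thread t0(worker, 0), t1(worker, 1);
    t0.join();
    t1.join();
    return counter;
}
```

The default `memory_order_seq_cst` matters here: with weaker orderings, each thread's store to `wants[me]` could be reordered after its load of `wants[other]`, and both threads could enter the critical section.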



### Questions?