Foundations:
Synchronization
Execution Abstractions

Chris Rossbach
CS378
Today

• Questions?
• Administrivia
  • Lab 1 due sooner than you’d like
• Foundations
  • Threads/Processes/Fibers
  • Cache coherence (maybe)

• Acknowledgments: some materials in this lecture borrowed from
  • Emmett Witchel (who borrowed them from: Kathryn McKinley, Ron Rockhold, Tom Anderson, John Carter, Mike Dahlin, Jim Kurose, Hank Levy, Harrick Vin, Thomas Narten, and Emery Berger)
  • Andy Tanenbaum
Faux Quiz  (answer any 2, 5 min)

- What is the maximum possible speedup of a 75% parallelizable program on 8 CPUs?
- What is super-linear speedup? List two ways in which super-linear speedup can occur.
- What is the difference between strong and weak scaling?
- Define Safety, Liveness, Bounded Waiting, Failure Atomicity
- What is the difference between processes and threads?
- What’s a fiber? When and why might fibers be a better abstraction than threads?
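
The first question is an application of Amdahl's law. A worked sketch, assuming "75% parallelizable" means a fraction $p = 0.75$ of the single-CPU run time parallelizes perfectly and $n = 8$:

```latex
\[
S(n) = \frac{1}{(1-p) + \frac{p}{n}}
     = \frac{1}{0.25 + \frac{0.75}{8}}
     = \frac{1}{0.34375} \approx 2.91,
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1-p} = 4 .
\]
```

The serial 25% dominates: even with unlimited CPUs the speedup under this model cannot exceed 4.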
Review: correctness conditions

- Safety
  - Only one thread in the critical region

- Liveness
  - Some thread that enters the entry section eventually enters the critical region
  - Even if another thread takes forever in non-critical region

- Bounded waiting
  - A thread that enters the entry section enters the critical section within some bounded number of operations.
  - If a thread is in entry section, then there is a bound on the number of times that other threads are allowed to enter the critical section before thread i's request is granted

while(1) {
  Entry section
  Critical section
  Exit section
  Non-critical section
}

Mutex, spinlock, etc. are ways to implement the entry and exit sections

Did we get all the important conditions?
Why is correctness defined in terms of locks?

Theorem: Every property is a combination of a safety property and a liveness property.
-Bowen Alpern & Fred Schneider
Processes and Threads and Fibers...

- Abstractions
- Containers
- State
  - Where is shared state?
  - How is it accessed?
  - Is it mutable?
Programming and Machines: a mental model

```c
struct machine_state{
    uint64 pc;
    uint64 Registers[16];
    uint64 cr[6]; // control registers cr0-cr4 and EFER on AMD
    ...
} machine;
while(1) {
    fetch_instruction(machine.pc);
    decode_instruction(machine.pc);
    execute_instruction(machine.pc);
}
void execute_instruction(i) {
    switch(opcode) {
    case add_rr:
        machine.Registers[i.dst] += machine.Registers[i.src];
        break;
    }
}
```
Parallel Machines: a mental model

Processes, threads, fibers, events, continuations, … are all abstractions for this
Processes

- Multiprogramming of four programs
- Conceptual model of 4 independent, sequential processes
- Only one program active at any instant

**Model**

**Implementation**

<table>
<thead>
<tr>
<th>Process management</th>
<th>Memory management</th>
<th>File management</th>
</tr>
</thead>
<tbody>
<tr>
<td>Registers</td>
<td>Pointer to text segment</td>
<td>Root directory</td>
</tr>
<tr>
<td>Program counter</td>
<td>Pointer to data segment</td>
<td>Working directory</td>
</tr>
<tr>
<td>Program status word</td>
<td>Pointer to stack segment</td>
<td>File descriptors</td>
</tr>
<tr>
<td>Stack pointer</td>
<td></td>
<td>User ID</td>
</tr>
<tr>
<td>Process state</td>
<td></td>
<td>Group ID</td>
</tr>
<tr>
<td>Priority</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Scheduling parameters</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Process ID</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Parent process</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Process group</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Signals</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Time when process started</td>
<td></td>
<td></td>
</tr>
<tr>
<td>CPU time used</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Children's CPU time</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Time of next alarm</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
Process Address Space

access requires kernel mode

Encapsulates state for a context
State can be shared through memory!

Q: How to share data across processes?
Anyone heard of KPTI?
Abstractions for Concurrency

(a) Three processes each with one thread

(b) One process with three threads

When might (a) be better than (b)? Vice versa?
Could you do lab 1 with processes instead of threads?
Threads simplify sharing and reduce context overheads
The Thread Model

**Per process items**
- Address space
- Global variables
- Open files
- Child processes
- Pending alarms
- Signals and signal handlers
- Accounting information

**Per thread items**
- Program counter
- Registers
- Stack
- State

- Items shared by all threads in a process
- Items private to each thread

**Decouples memory and control abstractions**

**What are the advantages of that?**
Using threads

Ex. How might we use threads in a word processor?
Where to Implement Threads:

A user-level threads package

A threads package managed by the kernel

What are some tradeoffs between user/kernel support for threads?
Execution Context Management

“Task” == “Flow of Control”, but with less typing
“Stack” == Task State

**Task Management**
- Preemptive
  - Interleave on uniprocessor
  - Overlap on multiprocessor
- Serial
  - One at a time, no conflict
- Cooperative
  - Yields at well-defined points
  - E.g. wait for long-running I/O

**Stack Management**
- Manual
  - Inherent in Cooperative
  - Changing at quiescent points
- Automatic
  - Inherent in pre-emptive
  - Downside: Hidden concurrency assumptions

These dimensions can be orthogonal.
Fibers: the Sweet Spot?

• Cooperative tasks
  • most desirable when reasoning about concurrency
  • usually associated with event-driven programming

• Automatic stack management
  • most desirable when reading/maintaining code
  • Usually associated with threaded (or serial) programming

Fibers: cooperative threading with automatic stack management
Threads vs Fibers

• Like threads, *just an abstraction* for flow of control
• *Lighter weight* than threads
  • In Windows, just a stack, subset of arch. registers, non-preemptive
  • *Not* just threads without exception support
  • stack management/impl has interplay with exceptions
  • Can be completely exception safe

• *Takeaway*: diversity of abstractions/containers for execution flows
x86_64 Architectural Registers

- Register map diagram courtesy of: By Immae - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=32745525

**Linux x86_64 context switch excerpt**

The x86_64 architecture provides 16 general-purpose 64-bit registers and 16 128-bit SSE (XMM) registers, plus 8 legacy x87 floating-point registers that the MMX registers alias.

**Complete fiber context switch on Unix and Windows**

[Figure: x86_64 register map. Labels recoverable from the extraction: control registers (CR0-CR15, MSW), debug registers (DRn), and MXCSR.]
# x86_64 Registers and Threads

[Figure: x86_64 register map. Labels recoverable from the extraction: vector registers ZMM0-ZMM31 with their YMM/XMM aliases, x87/MMX registers ST(0)-ST(7)/MM0-MM7, and general-purpose registers R8-R15 with their sub-registers.]

x86_64 Registers and Fibers

The takeaway:
• Many abstractions for flows of control
• Different tradeoffs in overhead, flexibility
• Matters for concurrency: exercised heavily

Pthreads

- POSIX standard thread model
- Specifies the API and call semantics.
- Popular – most thread libraries are Pthreads-compatible
Preliminaries

• Include `pthread.h` in the main file

• Compile program with `-lpthread`
  • `gcc -o test test.c -lpthread`
  • may not report compilation errors otherwise but calls will fail

• Good idea to check return values on common functions
Thread creation

• Types: `pthread_t` – type of a thread

• Some calls:

```c
int pthread_create(pthread_t *thread,
                  const pthread_attr_t *attr,
                  void * (*start_routine)(void *),
                  void *arg);
```

```c
int pthread_join(pthread_t thread, void **status);
```

```c
int pthread_detach(pthread_t thread);
```

```c
void pthread_exit(void *value_ptr);
```

• No explicit parent/child model, except main thread holds process info
• **Call** `pthread_exit` in main, don’t just fall through: returning from `main` ends the process and kills any threads still running
• **When do you need** `pthread_join` ?
  • `status` = exit value returned by joinable thread
• Detached threads are those which cannot be joined (can also set this at creation)
Creating multiple threads

```c
#include <stdio.h>
#include <pthread.h>
#define NUM_THREADS 4

void *hello (void *arg) {
    printf("Hello Thread\n");
    return NULL;  // start routines must return a void *
}

int main() {
    pthread_t tid[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], NULL, hello, NULL);

    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
}
```
Can you find the bug here?

What is printed for myNum?

```c
void *threadFunc(void *pArg) {
    int* p = (int*)pArg;
    int myNum = *p;
    printf( "Thread number %d\n", myNum);
}
...
// from main():
for (int i = 0; i < numThreads; i++) {
    pthread_create(&tid[i], NULL, threadFunc, &i);
}
```
Pthread Mutexes

- **Type:** `pthread_mutex_t`

```c
int pthread_mutex_init(pthread_mutex_t *mutex,
                        const pthread_mutexattr_t *attr);
int pthread_mutex_destroy(pthread_mutex_t *mutex);
int pthread_mutex_lock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);
int pthread_mutex_trylock(pthread_mutex_t *mutex);
```

- **Attributes:** for shared mutexes/condition vars among processes, for priority inheritance, etc.
  - use defaults

- **Important:** Mutex scope must be visible to all threads!
Pthread Spinlock

- **Type:** `pthread_spinlock_t`

```c
int pthread_spin_init(pthread_spinlock_t *lock, int pshared);
int pthread_spin_destroy(pthread_spinlock_t *lock);
int pthread_spin_lock(pthread_spinlock_t *lock);
int pthread_spin_unlock(pthread_spinlock_t *lock);
int pthread_spin_trylock(pthread_spinlock_t *lock);
```

Wait...what's the difference?

```c
int pthread_mutex_init(pthread_mutex_t *mutex,...);
int pthread_mutex_destroy(pthread_mutex_t *mutex);
int pthread_mutex_lock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);
int pthread_mutex_trylock(pthread_mutex_t *mutex);
```
Review: correctness conditions

- Safety
  - Only one thread in the critical region

- Liveness
  - Some thread that enters the entry section eventually enters the critical region
  - Even if another thread takes forever in non-critical region

- Bounded waiting
  - A thread that enters the entry section enters the critical section within some bounded number of operations.
  - If a thread is in entry section, then there is a bound on the number of times that other threads are allowed to enter the critical section before thread i’s request is granted

Theorem: Every property is a combination of a safety property and a liveness property.

-Bowen Alpern & Fred Schneider

Did we get all the important conditions?

Why is correctness defined in terms of locks?
Implementing Locks

```cpp
int lock_value = 0;
int* lock = &lock_value;

Lock::Acquire() {
    while (*lock == 1)
        ; //spin
    *lock = 1;
}

Lock::Release() {
    *lock = 0;
}
```

What are the problem(s) with this?
- A. CPU usage
- B. Memory usage
- C. Lock::Acquire() latency
- D. Memory bus usage
- E. Does not work

Completely and utterly broken. How can we fix it?
HW Support for Read-Modify-Write (RMW)

**Preview of Techniques:**
- Bus locking
- Single Instruction ISA extensions
  - Test&Set
  - CAS: Compare & swap
  - Exchange, locked increment, locked decrement (x86)
- Multi-instruction ISA extensions:
  - LLSC: (PowerPC, Alpha, MIPS)
  - Transactional Memory (x86, PowerPC)

IDEA: hardware implements something like:

```c
bool rmw(addr, value) {
    atomic {
        tmp = *addr;
        newval = modify(tmp);
        *addr = newval;
    }
}
```

Why is that hard? How can we do it?

More on this later…
Implementing Locks with Test&set

```cpp
int lock_value = 0;
int* lock = &lock_value;

void Lock::Acquire() {
    while (test&set(lock) == 1) ; //spin
}

void Lock::Release() {
    *lock = 0;
}
```

(test & set  ~= CAS  ~= LLSC)

TST: Test&set
- Reads a value from memory
- Writes “1” back to the memory location
- Both steps happen as one atomic operation

What are the problem(s) with this?
- A. CPU usage
- B. Memory usage
- C. Lock::Acquire() latency
- D. Memory bus usage
- E. Does not work

More on this later...
Physics | Concurrency

\[ F = ma \;\sim\; \text{coherence} \]
Multiprocessor Cache Coherence

- P1: read X
- P2: read X
- P2: X++
- P3: read X
Multiprocessor Cache Coherence

Each cache line has a state (M, E, S, I)
- Processors “snoop” the bus to maintain states
- Initially $\rightarrow$ ‘I’ $\rightarrow$ Invalid
- Read one $\rightarrow$ ‘E’ $\rightarrow$ Exclusive (only cached copy)
- Reads $\rightarrow$ ‘S’ $\rightarrow$ Shared (multiple copies possible)
- Write $\rightarrow$ ‘M’ $\rightarrow$ Modified (single copy) $\rightarrow$ lots of cache coherence traffic
Cache Coherence: single-thread

P1

// (straw-person lock impl)
// Initially, lock == 0 (unheld)
lock() {
    try: load lock, R0
    test R0
    bnz try
    store lock, 1
}
Cache Coherence Action Zone

// (straw-person lock impl)  
// Initially, lock == 0 (unheld)
lock() {
    try:  load lock, R0
         test R0
         bnz try
    store lock, 1
}

// (straw-person lock impl)  
// Initially, lock == 0 (unheld)
lock() {
    try:  load lock, R0
         test R0
         bnz try
    store lock, 1
}
Cache Coherence Action Zone II

NOT SAFE!

// (straw-person lock impl)
// Initially, lock == 0 (unheld)
lock() {
    try:  load lock, R0
    test R0
    bnz try
    store lock, 1
}

// (straw-person lock impl)
// Initially, lock == 0 (unheld)
lock() {
    try:  load lock, R0
    test R0
    bnz try
    store lock, 1
}
Read-Modify-Write (RMW)

- Implementing locks requires read-modify-write operations
- Required effect is:
  - An atomic and isolated action
    1. read memory location **AND**
    2. write a new value to the location
  - RMW is *very tricky* in multi-processors
  - Cache coherence alone doesn’t solve it

```c
// (straw-person lock impl)
// Initially, lock == 0 (unheld)
lock() {
    try:  load lock, R0
    test R0
    bnz try
    store lock, 1
}
```
Essence of HW-supported RMW

// (straw-person lock impl)
// Initially, lock == 0 (unheld)
lock() {
try:
load lock, R0
test R0
bnz try
store lock, 1
}

Make this into a single (atomic) hardware instruction
## HW Support for Read-Modify-Write (RMW)

<table>
<thead>
<tr>
<th>Test &amp; Set</th>
<th>CAS</th>
<th>Exchange, locked increment/decrement,</th>
<th>LLSC: load-linked store-conditional</th>
</tr>
</thead>
<tbody>
<tr>
<td>Most architectures</td>
<td>Many architectures</td>
<td>x86</td>
<td>PPC, Alpha, MIPS</td>
</tr>
</tbody>
</table>

```c
int TST(addr) {  
    atomic {  
        ret = *addr;  
        if(!*addr)  
            *addr = 1;  
        return ret;  
    }
}

bool cas(addr, old, new) {  
    atomic {  
        if(*addr == old) {  
            *addr = new;  
            return true;  
        }  
        return false;  
    }
}

int XCHG(addr, val) {  
    atomic {  
        ret = *addr;  
        *addr = val;  
        return ret;  
    }
}

bool LLSC(addr, val) {  
    ret = *addr;  
    atomic {  
        if(*addr == ret) {  
            *addr = val;  
            return true;  
        }  
        return false;  
    }
}

void CAS_lock(lock) {  
    // lock is the address of the lock word; spin until 0 -> 1 succeeds
    while(cas(lock, 0, 1) != true);  
}
```
HW Support for RMW: LL-SC

**Load-Linked Store-Conditional (LLSC)**

### Code Snippet

```c
bool LLSC(addr, val) {
    ret = *addr;
    atomic {
        if(*addr == ret) {
            *addr = val;
            return true;
        }
    }
    return false;
}

void LLSC_lock(lock) {
    while(1) {
        old = load-linked(lock);
        if(old == 0 && store-cond(lock, 1))
            return;
    }
}
```

- load-linked is a load that is “linked” to a subsequent store-conditional
- Store-conditional succeeds only if the location has not been written since the linked load
LLSC Lock Action Zone

P1

lock: 0

lock(lock) {
    while(1) {
        old = ll(lock);
        if(old == 0)
            if(sc(lock, 1))
                return;
    }
}

P2

lock: 1

lock(lock) {
    while(1) {
        old = ll(lock);
        if(old == 0)
            if(sc(lock, 1))
                return;
    }
}
LLSC Lock Action Zone II

P1
lock: [M] 0
lock: 0

P2
lock: SIL 0

lock: 0

P1
lock(lock) {
  while(1) {
    old = ll(lock);
    if(old == 0)
      if(sc(lock, 1))
        return;
  }
}

P2
lock(lock) {
  while(1) {
    old = ll(lock);
    if(old == 0)
      if(sc(lock, 1))
        return;
  }
}
Implementing Locks with Test&set

```cpp
int lock_value = 0;
int* lock = &lock_value;

Lock::Acquire() {
    while (test&set(lock) == 1) ; //spin
}

Lock::Release() {
    *lock = 0;
}
```

What is the problem with this?

- A. CPU usage
- B. Memory usage
- C. Lock::Acquire() latency
- D. Memory bus usage
- E. Does not work

(test & set ~ CAS ~ LLSC)
Test & Set with Memory Hierarchies

Initially, lock already held by some other CPU—A, B busy-waiting

What happens to lock variable’s cache line when different CPUs contend?

Load can stall

- With bus-locking, lock prefix blocks *everyone*
- With CAS, LL-SC, cache line “ping pongs” amongst contenders
TTS: Reducing busy wait contention

Test&Set

Lock::Acquire() {
    while (test&set(lock) == 1);
}

Lock::Release() {
    *lock = 0;
}

Busy-wait on in-memory copy

Test&Test&Set

Lock::Acquire() {
    while(1) {
        while (*lock == 1); // spin just reading
        if (test&set(lock) == 0) break;
    }
}

Lock::Release() {
    *lock = 0;
}

Busy-wait on cached copy

• What is the problem with this?
  • A. CPU usage  B. Memory usage  C. Lock::Acquire() latency
  • D. Memory bus usage  E. Does not work
Test & Test & Set with Memory Hierarchies

What happens to lock variable’s cache line when different CPUs contend for the same lock?

Wait...why all this spinning?
How can we improve over busy-wait?

```
Lock::Acquire() {
    while(1) {
        while (*lock == 1) ; // spin just reading
        if (test&set(lock) == 0) break;
    }
}
```
Mutex

• Same abstraction as spinlock
• But is a “blocking” primitive
  • Lock available → same behavior
  • Lock held → yield/block
• Many ways to yield
• Simplest case of semaphore

```c
void cm3_lock(u8_t* Mutex) {
    u8_t LockedIn = 0;
    do {
        if (__LDREXB(Mutex) == 0) {
            // unlocked: try to obtain lock
            if (__STREXB(1, Mutex) == 0) { // STREX returns 0 on success: got lock
                __CLREX(); // remove __LDREXB() lock
                LockedIn = 1;
            }
            else task_yield(); // give away cpu
        } else task_yield(); // give away cpu
    } while(!LockedIn);
}
```

• Is it better to use a spinlock or mutex on a uni-processor?
• Is it better to use a spinlock or mutex on a multi-processor?
• How do you choose between spinlock/mutex on a multi-processor?
Priority Inversion

A (prio-0) → enter(l);
B (prio-100) → enter(l); → must wait.

Solution?

**Priority inheritance:** A runs at B’s priority
Mars Pathfinder failure:

Other ideas?
Dekker’s Algorithm

variables
    wants_to_enter : array of 2 booleans
    turn : integer

wants_to_enter[0] = false
wants_to_enter[1] = false
turn = 0  // or 1

p0:
    wants_to_enter[0] = true
    while wants_to_enter[1] {
        if turn ≠ 0 {
            wants_to_enter[0] = false
            while turn ≠ 0 {
                // busy wait
            }
            wants_to_enter[0] = true
        }
    }
    // critical section
    ...
    turn = 1
    wants_to_enter[0] = false
    // remainder section

p1:
    wants_to_enter[1] = true
    while wants_to_enter[0] {
        if turn ≠ 1 {
            wants_to_enter[1] = false
            while turn ≠ 1 {
                // busy wait
            }
            wants_to_enter[1] = true
        }
    }
    // critical section
    ...
    turn = 0
    wants_to_enter[1] = false
    // remainder section

[Figure: flowchart of Th. J. Dekker’s solution for two processes. Initially c1, c2, turn = 1, 1, 1. Process 1 exits with: critical section 1; turn := 2; c1 := 1; noncritical 1. Process 2 exits with: critical section 2; turn := 1; c2 := 1; noncritical 2.]
Questions?