# A Unifying Theory of Correct Concurrent Executions* 

Banu Özden<br>Department of Electrical and Computer Engineering<br>The University of Texas at Austin<br>Austin, Texas 78712-1084

Avi Silberschatz<br>Department of Computer Sciences<br>The University of Texas at Austin<br>Austin, Texas 78712-1188


#### Abstract

An ideal system is one that performs program operations in the order specified by the program and executes atomic program segments exclusively. Although this system model simplifies the task of reasoning about both sequential and concurrent programs, its straightforward implementation yields poor performance. To enhance performance, concurrency and pipelining techniques can be used, which may result in data accesses that are performed in an order which is different from the order specified by the program, which may result in incorrect executions. An execution is correct if its result is equivalent to the result that could have been obtained had the execution taken place on the ideal system. In this paper, we develop a unified general theory of correct executions where the access orders differ from the access order on the ideal system. Our unifying theory is applicable to a variety of programming paradigms, application domains, and architectures. It provides a verification tool to test the correctness of an execution, and allows us to devise more efficient protocols for various systems.


## Index Terms

Access order, concurrency, concurrent execution, concurrent programming, correctness, databases, distributed systems, multiprocessors, pipelining, sequential consistency, serializability, synchronization.

## 1 Introduction

In order to aid the programmers with the task of reasoning about the correctness of their programs, an execution model is usually provided, which is a description of the execution order of the various operations of a program. Examples of execution models are sequentiality for sequential systems [1], sequential consistency for multiprocessors [1], and serializability for database systems [2].

Although the availability of an execution model simplifies the reasoning about the programs, additional synchronization constructs must be available so that programmers can explicitly express a specific order on the execution of the operations of their programs. Examples of synchronization constructs are semaphores, critical sections, monitors, barrier and condition synchronization primitives [3, 4, 5].

[^0]There is a basic simple protocol to implement each of these execution models and a known effective implementation of each of these synchronization constructs. For instance, sequentiality can be ensured by executing the operations of a sequential program in the program order, sequential consistency can be ensured by executing the operations of each program of a concurrent program in the program order, and serializability can be ensured by executing transactions in a serial order.

Since these protocols and implementations, in general, yield poor performance, a significant amount of research has been done to devise methods to obtain better performance. The two most common techniques for achieving this are pipelining and concurrency $[1,2,6,7,8,9,10,11]$. Pipelining is a method for overlapping the execution of multiple operations of a process, whereas concurrency is a method for overlapping the execution of multiple operations of different processes (a process is an execution of a sequential program, namely, an execution of a sequential operation stream.) The use of concurrency and pipelining must be controlled, since both may change the order in which data accesses are performed, and therefore may yield incorrect execution.

Pipelining allows an operation to be issued, before the previous operations in the program order are issued or performed. In order to mask latency of interconnection networks, memory accesses are issued before the previous accesses are performed in some shared memory multiprocessors, and messages are sent before the previous messages are delivered in some distributed memory multiprocessors. If the interconnection network is asynchronous, then the order, in which memory accesses are executed and messages delivered, may differ from the order specified by the program. Similarly in a pipelined processor, in order to increase the throughput, an instruction can be issued before completion of a previous instruction, which may cause memory accesses to be performed in an unintended order.

Concurrency allows several processes to execute simultaneously. Typically, programs require more than one data item to be accessed atomically (without interleaving with other's data accesses). Examples are programming languages in which sequence of statements can be specified as atomic, or databases where each transaction should be atomic. Since executing the atomic sections in isolation may degrade the performance, concurrent executions are allowed, which can result in incorrect interleavings of data accesses.

The problem that the execution order of data accesses of a program can be different from the intended order and, therefore, the execution may be incorrect exists in numerous programming paradigms, application domains, and architectures. Examples are sequential programming, concurrent programming based on shared data or message-passing, parallel programming, centralized and distributed databases, single processor systems, and shared and distributed multiprocessor systems. Although the nature of the problem is the same, there has been no research present a unified solution. It is the aim of this paper to develop a unifying theory for correct execution.

A system that performs operations in the program order and executes atomic sections exclusively is referred to as an ideal system. We assume that each program is correct in the sense that if it were executed on an ideal system, then its result is the desired one. It is the responsibility of the programmers to ensure that their programs are indeed correct. We refer to an execution on the ideal system as the specification of the correct
execution, or the correctness criterion. We refer to an execution whose result is equivalent to the result of the execution on the ideal system, as a correct execution.

Given a specification of a correct execution, our goal is to define the class of correct executions whose access order is less restrictive than the the access order of the corresponding execution on the ideal system. We develop a general theory of correct execution that is applicable to any correctness criterion that can be expressed as follows. An execution on an ideal system is a set of sequential processes, each of which is a sequence of atomic actions. An atomic action is a sequence of indivisible data accesses. The order among the atomic actions of different processes can be expressed with a partial order. The processes can run in parallel and can share data. This correctness criterion is sufficiently general to encampus sequentiality, sequential consistency and serializability as special cases.

Our unifying theory provides a verification tool to test the correctness of an execution. It will have impact on understanding the access ordering problem, and will allow us to devise more efficient protocols for various systems. Furthermore, the unified theory will allow one to extend results developed in one type of system to other systems.

The remainder of this paper is organized as follows. In Section 2, we present examples for access ordering problem. In Section 3, we introduce the system model. In Section 4, we discuss the differences between correctness, sequential consistency and serializability. In Section 5, we introduce various classes of correctness. In Section 6, we present the concepts of hierarchical graphs and hierarchical polygraphs to reduce the complexity of testing algorithms, whereas in Section 7, we develop testing algorithms for different classes of correctness. We present our conclusions in Section 8, and prove the theorems in the Appendix.

## 2 Examples

In order to motivate our work, we will give examples from several programming and application paradigms and different architectures where concurrency and pipelining change the execution order with respect to the intended order.

Consider a single pipelined processor system, which allows the issuing of memory accesses of the next operation, before the execution of previous instructions are completed. Suppose that sequential program $K 1$ in Figure 1 is executed on such a processor. In this case, it is possible that operand $a$ of the second instruction is loaded before the value of $a$ is calculated and stored by the first instruction. Hence, the execution will be incorrect. On the other hand, if the operands of the third instruction are loaded before the previous instruction is completed, the result will be correct.

Consider Peterson's solution to two-process critical section problem as shown in Figure 2. The program is written with the assumption that the system is sequentially consistent. Suppose this program is executed on a shared memory multiprocessor. If the basic load and store operations are indivisible, and the system does not pipeline the memory accesses, then the execution of this program will yield correct result, namely, at most one process can be in the critical section. Now, suppose that the system allows pipelining of loads and stores.

$$
\begin{aligned}
& a:=b / c ; \\
& d:=d+a ; \\
& e:=e-f ;
\end{aligned}
$$

Figure 1: Sequential program $K 1$.

| $P 0$ |  | $P 1$ |
| :--- | :--- | :--- |
| shared $F 0, F 1:$ boelean; <br> shared turn, $x, y$ : integer; |  | shared $F 0, F 1:$ boelean; |
| shared tur $n, x, y$ : integer; |  |  |
| $F 0:=$ true; |  | $F 1:=$ true; |
| turn $:=1 ;$ |  | turn $:=0 ;$ |
| while $(F 1$ and turn $=1)$ do skip; |  | while $(F 0$ and turn $=0)$ do skip; |
| Critical Section |  | Critical Section |
| $F 0:=$ false; | $F 1:=$ false; |  |

Figure 2: K2: Peterson's solution to two-process critical section problem.

Then, the following order of events is possible. Suppose that initially $F 0=F 1=$ false. Process $P_{0}$ issues the requests to store the value true in $F 0$ and the value 1 in turn. Following this, it issues the requests to load $F 1$ and turn, and then enters the critical section. Process $P_{1}$ issues the requests to store the value true in $F 1$ and the value 0 in turn. Following this, it issues the requests to load $F 0$ and turn. The request from process $P_{0}$ to store true into $F 0$ is still not performed, and this load request of process $P_{1}$ returns false as the value of $F 0$, and process $P_{1}$ enters the critical section. Hence, the execution is incorrect.

Consider a client-server system, in which each server manages a set of data, and clients send read and write requests to the appropriate server to access and update data. Suppose that the client programs in Figure 3 are written with the assumption that the system is sequentially consistent. Servers $S 0$ and $S 1$ manage objects object0 and object1 respectively. Each client reads two objects from two different servers, caches the objects into local buffers, increments each word of the objects, and updates the copies of the objects in the servers. Sequential consistency can be ensured by waiting for an acknowledgement message from the server to which a request is sent, before another request is issued. Since such a protocol yields poor performance, the system may choose to pipeline the requests. In this case, the following execution is possible. Suppose that initially all the words of both objects are zero. After $C 0$ sends the update request to $S 0, C 0$ sends read and then the update requests to $S 1$. $C 1$ sends a read request to $S 1$, which arrives at $S 1$ after the write request from $C 0$. Therefore, $C 1$ reads the value written by $C 0$. After sending update message to $S 1, C 1$ sends read request to $S 0$. This request arrives at $S 0$ before the update request by $C 0$. Hence, $C 1$ reads the initial value of object0. This execution yields an incorrect final state in which all the words of object0 are one, and all the words of object 1 are five, whereas a correct execution should result in the state in which all the words of object0 and

| C0 | C1 |
| :---: | :---: |
| shared object 0 , object 1 ; local buffer, $i, n$; | shared object 0 , object 1 ; local buffer, $i, n$; |
| $\begin{aligned} & \text { read }(\text { object } 0, \text { buf fer, } n) ; \\ & \text { for } i=0 \text { to } i<n \text { do } \\ & \text { buffer }[i]:=\text { buffer }[i]+1 ; \\ & \text { write }(\text { object } 0, \text { buffer, } n) ; \\ & \text { read }(\text { object } 1, \text { buffer, } n) ; \\ & \text { for } i=0 \text { to } i<n \text { do } \\ & \text { buffer }[i]:=\text { buffer }[i]+2 \text {; } \\ & \text { write }(\text { object } 1, \text { buffer, } n) \text {; } \end{aligned}$ | $\begin{aligned} & \text { read }(\text { object } 1, \text { buf fer, } n) ; \\ & \text { for } i=0 \text { to } i<n \text { do } \\ & \quad \text { buffer }[i]:=\text { buffer }[i]+3 ; \\ & \text { write }(\text { object } 1, \text { buffer, } n) ; \\ & \text { read } \text {;bject } 0, \text { buffer, } n) ; \\ & \text { for } i=0 \text { to } i<n \text { do } \\ & \text { buffer }[i]:=\text { buffer }[i]+4 ; \\ & \text { write }(\text { object } 0, \text { buffer, } n) ; \end{aligned}$ |

Figure 3: $K 3$ : Client programs $C 0$ and $C 1$.

$$
\begin{array}{ccc}
\frac{P 1}{x:=x+10 ;} & & P 2 \\
y:=y+10 ; & & y:=y+100 ; \\
y: 100 ;
\end{array}
$$

Figure 4: Concurrent program $K 4$.
object1 are five.
Consider concurrent program $K 4$ depicted in Figure 4, which is written with the assumption that increment statements are atomic. If initially $x=0$, then a correct execution must yield $x=110$, which can be ensured by executing each statement atomically. However, such a system yield poor performance. To improve performance, the increment statement might be implemented as a sequence of three indivisible operations: (i) load a register with the value of $x$; (ii) add 10 or 100 to it; (iii) store the result in $x$. Thus, in the concurrent program above, the final value of $x$ might be 10,110 , or 100 . Concurrent execution of $P 1$ and $P 2$ must be synchronized to enforce restrictions on possible interleavings.

Consider a database system that is implemented on a distributed system with the client-server model. The transaction manager is the server and the transactions are clients that send read and write requests to the server. Transactions are written with the assumption that the system will ensure serializability. The system uses a concurrency control protocol which orders transactions, and allows a transaction to issue its operations, only if all the previous transactions in the order complete. If the system does not allow pipelining of data accesses, then this protocol ensures serial executions. However, if the system allows pipelining and interprocessor communication is asynchronous, this protocol may not ensure serializability. Suppose the protocol ordered transactions in Figure 5 such that $T 0$ is to be executed before $T 1$. Let us represent the chronological order in which instructions are executed in the system with a schedule. Figure 6 depicts the schedule generated by this protocol when the data accesses are not pipelined. The schedule is serial. Now suppose that the system

| T0 | $T 1$ |
| :---: | :---: |
| shared $A, B$; local temp; | shared $A, B$; <br> local temp 0 , temp 1 ; |
| ```read(A,temp) temp := temp - 50; write( A, temp); read(B,temp); temp := temp + 50; write( B,temp)``` | ```read(A, temp0) temp1:= temp0*0.1; temp0 = temp0 - temp 1; write(A,temp1); read( }B,\mathrm{ temp0); temp0 := temp0 + temp1; write( }B,\mathrm{ temp0);``` |

Figure 5: K5: Two transactions $T 0$ and $T 1$.
allows pipelining. The following order of events is possible. After $T 0$ issues all its read and write accesses, it commits. T1 issues read ( $A$, temp0). This request arrives the server before write $(A, t e m p)$ of $T 0$. T1 reads the initial value of $A$. $T 1$ issues $\operatorname{read}(B$, temp 0$)$ which arrives the server after write $(B$, temp $)$ of $T 0$. Hence, $T 1$ reads the value of $B$ written by $T 0$. The execution is not serializable.

The protocol above ensures serializability, if there is no pipelining. However, such a protocol decreases performance unnecessarily. Consider the execution in Figure 7. Although transactions are executed concurrently, the result is equal as if $T 1$ is executed after $T 0$. Now consider the execution in Figure 8. Transactions are executed concurrently, but the result is not equal to any serial execution of $T 0$ and $T 1$. Hence, any interleaving of operations may not yield correct result, execution of $T 0$ and $T 1$ must be synchronized to enforce restrictions on possible interleavings.

Consider parallel program $K 6$ in Figure 9 in which barrier synchronization is used. The end sync construct specifies a barrier, which means that a process cannot execute the statements following the barrier before all other processes reach the barrier. The forall construct specifies that processes can execute the loop concurrently, and each iterate of the loop is executed by another process. Suppose that $N=4$ and processes $P_{1}, P_{2}, P_{3}$ and $P_{4}$ execute the iterations $1,2,3$ and 4 for both forall loops, respectively. Program $K 7$ in Figure 10 illustrates the statements that each process will execute in this case. Note that there are execution orders of operations that are different than the order specified by the program, but yield correct execution. For example, the execution will be correct if $P_{1}$ only waits for $P_{2}$, before issuing load $a[2,1]$ after the barrier.

Consider concurrent program $K 8$ in Figure 11, which consists of a producer program and consumer program. The program uses semaphores to specify a specific execution order, namely, initially both buffers are empty and producer writes into buffer 0 and buffer 1 , and the consumer can read the buffers only after the producer writes into the buffers, and the producer can write another item into the buffers only after the consumer reads the buffers. Although this implementation is correct, it may yield poor performance. Suppose this program

| $T 0$ | $T 1$ |
| :---: | :---: |
| ```read(A,temp) temp := temp - 50; write( A, temp); read( }B,\mathrm{ temp); temp:= temp + 50; write( }B,\mathrm{ temp)``` | ```read(A,temp0) temp1:= temp0*0.1; temp0 = temp0 - temp 1; write( B, temp1); write( }B\mathrm{ , temp1); read(B,temp0); temp0:= temp0 + temp1; write( B, temp0);``` |

Figure 6: A serial schedule of $T 0$ and $T 1$.

| T0 | $T 1$ |
| :---: | :---: |
| $\begin{aligned} & \operatorname{read}(A, \text { tem } p) \\ & \text { temp }:=\text { tem } p-50 ; \\ & \text { write }(A, \text { tem } p) \end{aligned}$ | ```read(A,temp0) temp1:= temp0*0.1; temp0 = temp0 - temp 1; write(B,temp1);``` |
| $\begin{aligned} & \operatorname{read}(B, \text { temp }) \\ & \text { temp }:=\text { temp }+50 ; \\ & \text { write }(B, \text { temp }) \end{aligned}$ | $\begin{aligned} & \operatorname{read}(B, \text { tem } 0) \\ & \text { temp } 0:=\text { tem } 0+\text { temp } 1 \\ & \text { write }(B, \text { tem } 0) \end{aligned}$ |

Figure 7: A concurrent serializable schedule of $T 0$ and $T 1$.

| $T 0$ | $T 1$ |
| :---: | :---: |
| $\begin{aligned} & \operatorname{read}(A, \text { tem } p) \\ & \text { temp }:=\text { tem } p-50 ; \end{aligned}$ | ```read(A,temp0) temp1:= temp0*0.1; temp0 = temp0 - temp 1; write( B, temp1); read( B,temp0);``` |
| ```write( A,temp); read(B,temp); temp := temp + 50; write( }B,\mathrm{ temp)``` | $\begin{aligned} & \text { temp } 0:=\text { temp } 0+\text { temp } 1 \\ & \text { write }(B, \text { temp } 0) \end{aligned}$ |

Figure 8: A concurrent nonserializable schedule of $T 0$ and $T 1$.

```
shared \(a[N, N]\);
constant \(N\);
local \(i, j, k\);
forall \((i=1 ; i \leq N)\) in parallel
    \(\boldsymbol{f o r}\left(k=0 ; \frac{N}{2} ; k++\right)\)
        \(\boldsymbol{\operatorname { f o r }}(j=0 ; N ; j:=j+2 * k)\)
        \(a[i, j]:=a[i, j]+a[i, j+k] ;\)
end sync
forall \((j=1 ; j \leq N)\) in parallel
    \(\boldsymbol{\operatorname { f o r }}\left(k=0 ; \frac{N}{2} ; k++\right)\)
        \(\boldsymbol{f o r}(i=0 ; N ; i:=i+2 * k)\)
        \(a[i, j]:=a[i, j]+a[i+k, j] ;\)
end sync
```

Figure 9: Parallel program $K 6$.

| $P_{1}$ | $P_{2}$ | $P_{3}$ | $P_{4}$ |
| :---: | :---: | :---: | :---: |
| $a[1,1]:=a[1,1]+a[1,2] ;$ | $a[2,1]:=a[2,1]+a[2,2] ;$ | $a[3,1]:=a[3,1]+a[3,2] ;$ | $a[4,1]:=\mathrm{a}[4,1]+a[4,2] ;$ |
| $a[1,3]:=a[1,3]+a[1,4] ;$ | $a[2,3]:=a[2,3]+a[2,4] ;$ | $a[3,3]:=a[3,3]+a[3,4] ;$ | $a[4,3]:=\mathrm{a}[4,4]+a[4,4] ;$ |
| $a[1,1]:=a[1,1]+a[1,3]$; | $a[2,1]:=a[2,1]+a[2,3] ;$ | $a[3,1]:=a[3,1]+a[3,3] ;$ | $a[4,1]:=\mathrm{a}[4,1]+a[4,3] ;$ |
| barrier synchronization; | barrier synchronization; | barrier synchronization; | barrier synchronization; |
| $a[1,1]:=a[1,1]+a[2,1] ;$ | $a[1,2]:=a[1,2]+a[2,2] ;$ | $a[1,3]:=a[1,3]+a[2,3] ;$ | $a[1,4]:=a[1,4]+a[2,4] ;$ |
| $a[3,1]:=a[3,1]+a[4,1] ;$ | $a[3,2]:=a[3,2]+a[4,2] ;$ | $a[3,3]:=a[3,3]+a[4,3] ;$ | $a[3,4]:=a[3,4]+a[4,4] ;$ |
| $a[1,1]:=a[1,1]+a[3,1] ;$ | $a[1,2]:=a[1,2]+a[3,2] ;$ | $a[1,3]:=a[1,3]+a[3,3] ;$ | $a[1,4]:=a[1,4]+a[3,4] ;$ |

Figure 10: Program K7.

| Producer | Consumer |
| :---: | :---: |
| shared buffer 0, buffer 1 ; <br> local temp 0 , temp 1 ; <br> semaphore full $=0$, empty $=1$; | shared buffer 0, buffer 1 ; local temp 0 , temp 1 ; semaphore full $=0$, empty $=1$; |
| ```repeat produce an item in temp0; produce an item in temp1; wait(empty); write(buffer0, temp0); write(buffer1,temp1); signal(full); until false:``` | ```repeat wait(full); read(buffer0, temp0); read(buffer1,temp1); signal(empty); consume the item in temp0; consume the item in temp1; untilfalse:``` |

Figure 11: K8: Producer and consumer programs.
is executed on a sequential processor and the system uses a protocol that orders the accesses to shared data as shown in Figure 12. In this case, the execution will be correct. However, not all possible interleavings of reads and writes yield correct result. Execution of consumer and producer processes must be synchronized to enforce restrictions on possible interleavings.

Consider the execution of the program in Figure 11 on a shared or distributed memory multiprocessor. Suppose the system uses semaphores, but the system allows pipelining of data accesses. If the program is executed on a shared memory multiprocessor, where the interconnection network between processors and memory modules is asynchronous, and if it is executed on a distributed memory multiprocessor, where the interconnection network between processors is asynchronous, then the following order of events is possible. After the producer issues requests to write into both buffers, it issues a signal request on semaphore ull. The signal request is performed. The consumer issues a wait request on semaphore full and issues read operations. Due to the asynchronous behavior of the interconnection network, the previous writes from the producer are

| Producer Process | Consumer Process |
| :---: | :---: |
| write(buffer 0, temp 0$) ;$ |  |
| write $($ buffer 1, temp 1$) ;$ | $\operatorname{read}($ buffer 0, temp 0$) ;$ |
| write $($ buffer 0, temp 0$) ;$ | $\operatorname{read}($ buffer 1, tem $p 1) ;$ |
| read $($ buffer 0, tem $p 0) ;$ |  |
| $\cdot$ | . |

Figure 12: An order of execution of reads and writes of programs $C 0$ and $C 1$ that results in correct execution.
still not performed. The read requests of the consumer are performed. The consumer reads incorrect values, and thus this execution is incorrect.

## 3 System Model

A concurrent execution involves a set of sequential processes, $\mathbf{P}=\left\{p_{1}, p_{2}, \ldots, p_{n}\right\}^{*}$, and a set of non-overlapping data structures called entities, $\mathbf{E}=\left\{e_{1}, e_{2}, \ldots, e_{m}\right\}$. A process is the execution of a sequential program, which consists of a finite sequence of operations. Processes communicate with each other through shared entities. There is only one valid version of an entity at any time. This means that if there are several copies of an entity, these copies are kept coherent. Accesses to entities are indivisible, which means that the effect of performing a read or write on an entity is equivalent to the case where the read and write are executed exclusively. We allow the granularity of an entity to be larger than one memory word. Note that an entity is not necessarily shared.

This model encompasses sequential and shared memory systems in which a memory word is an entity, client-server systems in which shared data of any size accessed through a server process is an entity, database systems in which shared items are entities, and message passing systems in which a message buffer is an entity. For message passing systems, a send operation can be viewed as a write operation on the message buffer of the receiver processes, and a receive operation as a read operation on the message buffer.

A process may use local data buffers to cache entities. Local data buffers are not shared among processes. For example, registers in sequential and tightly coupled shared memory programming can be viewed as local data buffers. Similarly, in client-server systems or databases, the variables in the address space of a process, in which an entity is buffered, or from which an entity is updated can be viewed as local data buffers.

In this paper, we make a simplifying assumption that the accesses to local data buffers are executed in the program order. This assumption can be relaxed either by modeling the local data buffers as entities, or by

[^1]entities $x, y, z:$ integer;
\[

$$
\begin{aligned}
& z:=x+y \\
& \text { if } x \geq 10 \text { then } \\
& x:=z-x
\end{aligned}
$$
\]

Figure 13: Program K 9.
deriving a more sophisticated system model. The latter issue is a future research topic.

### 3.1 Issuing and Performing an Operation

In order to develop a comprehensive theory, we distinguish between the actions of issuing an operation and performing an operation on an entity. Issuing a read or a write operation means that a request to perform the operation is made, whereas performing a read or a write operation means that the requested operation is serviced. We say that a read is serviced at the moment when the value it will return is fixed. Similarly, a write is serviced at the moment when a subsequent read can return the value written.

We are only interested in read and write operations, denoted by R and W respectively. Hence, we use the term operation only to refer to read and write operations. We use the notations $R_{i}^{j}(e)$ and $W_{i}^{j}(e)$ to denote that if the operations were executed in the program order, $j$ th operation of process $P_{i}$ would be a read and write operations on entity $e$, respectively. When no confusion arises, we will omit the subscript or the superscript.

### 3.2 Program Order

To simplify the presentation, we sometimes refer to the graph representation of a relation $R$ also as $R$, and to the underlying relation of a graph $G$ as $G$. The program order of a concurrent execution specifies the order in which entity accesses would have been performed if they were executed in the order specified by the concurrent program. The program order $I B_{i}$ for a process $P_{i}$ is a total order on the set $O_{i}$ of operations executed by process $P_{i}$. We also refer to $I B_{i}$ as the schedule of process $P_{i} . I B_{i}$ is analogous to the concept of trace defined in [9], and the concept of transaction in databases. To illustrate, Figure 14 displays the program order generated by executing program K9 in Figure 13, when initially $x \geq 10$, and Figure 16 displays the program order generated by executing program $K 10$ in Figure 15. In these figures, $I B$ is represented as the smallest relation of which transitive closure is $I B$. For a given schedule, we denote the $j$ th operation and the entity associated with this operation by operation $(j) \in\{W, R\}$ and $\operatorname{entity}\left(a_{j}\right) \in E$, respectively.

The system may provide constructs to allow programmers to specify an order among the operations of different processes. Barrier synchronization primitives [5] and conditional synchronization primitives [4] (e.g., semaphores, continue and delay operations in Concurrent PASCAL, and notify and wait operations in Mesa) are examples for such constructs. We define the relation $I B_{i n t e r}$ to express the order among the operations of different processes specified by the concurrent program. If $a$ and $b$ are operations of processes $P_{i}$ and $P_{j}$


Figure 14: $I B$ originated from the execution of program $K 9$, when $x \geq 10$ initially.

```
entities object: character[n];
local buffer: character[n];
local i,n : integer;
read(object,buffer,n);
for i=0 to i<n do
    buffer[i]:= buffer[i] + 1;
write(object,buffer,n);
```

Figure 15: Program K 10.


Figure 16: $I B$ originated from the execution of program $K 10$.


Figure 17: $I B$ for parallel program $K 7$
respectively $(i \neq j)$, and the concurrent program specifies that $b$ must be performed after $a$, then $a I B_{\text {inter }} b$. For example, consider again Figure 10. Operation $W_{1}(a[1,1])$ for the third statement of $P_{1}$, and operation $R_{3}(a[1,3])$ for the fourth statement of $P_{3}$. The program specifies that $W_{1}(a[1,1]) I B_{\text {inter }} R_{3}(a[1,3])$.

We denote the set of all operations of all the processes in a concurrent execution by $O$, namely $O=\bigcup_{i=1}^{n} O_{i}$. Program order relation $I B$, denoted by $\stackrel{i}{\rightarrow}$, is defined on $O$ as the transitive closure of $\bigcup_{i=1}^{n} I B_{i} \cup I B_{\text {inter }}$. IB is an irreflexive partial order on $O$. Note that there can be program orders which cannot be expressed by $I B$ relation. In figures in this paper, we depict $I B$ as the smallest relation of which transitive closure is $I B$. Figure 17 depicts $I B$ for parallel program in Figure 10.

If the operations are issued in the program order, then $I B_{i}$ specifies the order in which entity accesses are issued. This is not the case for some pipelined processors that issue operations in different order than the order defined in the code.

We assume that concurrent programs are written correctly for an ideal machine, and that the code generated by the compiler either preserves the order of entity accesses in each program, or if the compiler rearranges the
order of entity access in a program, it preserved the interprocess dependencies. Thus, if the entity accesses are executed in the order that is defined in the compiler generated code, the result will be correct. In pipelined processors, branches can affect the pipeline performance [12]. One method for reducing pipeline stalls due to branch delays is predicting branches. According to prediction, the branch is either taken or not taken before the branch condition is calculated. If the prediction is wrong, the state must be restored. For such architectures, we assume that pipeline stalls, if the shared data (entities) must be accessed in a predicted branch until branch condition is computed.

### 3.3 Performed Before Order

As we pointed out before, the order in which operations are performed can be different from the program order. We define the performed before relation $P B$, denoted by $\xrightarrow{p}$, on $O$ to capture the observable order, in which the accesses are performed, as follows:

1. If $a$ and $b$ are operations of the same process or different processes, and performing $b$ is delayed until $a$ is performed, then $a \xrightarrow{p} b$.
2. If an operation $R(x)$ in a process returns the value written by an operation $W(x)$ in the same process or different processes, then $W(x) \xrightarrow{p} R(x)$.
3. If an operation $W(x)$ in a process overwrites the value read by an operation $R(x)$ in the same process or different processes, then $R(x) \xrightarrow{p} W(x)$.
4. If an operation $W(x)$ in a process overwrites the value written by an operation $W(x)$ in the same process or different processes, then $W(x) \xrightarrow{p} W(x)$.
5. If $a \xrightarrow{p} b$ and $b \xrightarrow{p} c$, then $a \xrightarrow{p} c$.
$P B$ is an irreflexive partial order on $O$. In figures in this paper, we illustrate $P B$ as the smallest relation of which transitive closure is $P B$.

The rationale behind the definition of $P B$ is as follows. Item (1) expresses the order imposed due to either architectural assumptions or program specification. For example, some processors do not issue an operation until the previous operations are performed, or some pipelined processes do not issue operations that are dependent on previously issued operations until these operations are performed. Another example is the access ordering in bus-based shared memory multiprocessors. Since there is one FIFO path between a processor and all memory modules, then for any two operations $a$ and $b$ of a process, it is known that if $a \stackrel{i}{\rightarrow} b$, then $a \xrightarrow{p} b$. Yet another example is the fence operation in some shared memory multiprocessors, which allows to delay issuing, and hence, performing of an access until some previous accesses are performed [13]. Item (2) is due to causality. Items (3) and (4) are due to our assumption about the system that data is kept coherent. Item (5) simply expresses the transitivity of $P B$ relation.


Figure 18: IB order and $A$ relation for program $K 4$.

### 3.4 Atomic Actions

An atomic action is a sequence of operations whose execution is guaranteed to yield the same effect as if the operations were executed exclusively. An equivalence relation $A_{i}$ on $O_{i}$ for each process $P_{i}$ specifies the atomic actions. If $a$ and $b$ are operations of the same process, and $a$ and $b$ must performed atomically, then $a A_{i} b$. We define the equivalence relation $A$ on the set $O$ of operations of all processes: $A=\bigcup_{i=1}^{n} A_{i}$.

The following definitions, borrowed from [6], will be used later in the paper. If $R$ is an irreflexive relation and $A$ is an equivalence relation on a set $U$, then $R / A$ is an irreflexive relation induced by $R$ on the set of equivalence classes $U / A . R$ is defined as

$$
R / A=\left\{\left(r_{1}, r_{2}\right) \mid r_{1} \in U / A \wedge r_{2} \in U / A \wedge r_{1} \neq r_{2} \wedge\left(\exists a \exists b: a \in r_{1}, b \in r_{2}: a R b\right)\right\}
$$

Hence, $P B / A$ and $I B / A$ are the irreflexive relations induced by $P B$ and $I B$ on the set of equivalence classes $O / A$. We assume that $I B / A$ is a partial order, and $I B / A$ specifies the program order among the atomic actions in a concurrent execution. In figures that depict $I B$, we illustrate the operations in an atomic action within a box. Consider again concurrent program $K 4$ of Figure 4. The $I B$ order for program $K 4$ is depicted in Figure 18 , where $A_{1}=\left\{\left(R_{1}(x), W_{1}(x)\right),\left(R_{1}(y), W_{1}(y)\right)\right\}$ and $A_{2}=\left\{\left(R_{2}(x), W_{2}(x)\right),\left(R_{2}(y), W_{2}(y)\right)\right\}$.

### 3.5 Data Dependence

We say that a write operation $W_{i}^{k}(x)$ is dependent on an entity $y$ (where $y$ can be equal to $x$ ), if the value of entity $y$ is used to compute the value of $x$. We say that $W_{i}^{k}(x)$ is dependent on read operation $R_{i}^{l}(y)$, if it is dependent on $y$ and $R_{i}^{l}(y)$ is is the first read on $y$ before $W_{i}^{k}(x)$ in $I B_{i}$. Note that we have defined dependency within a process.

To illustrate, consider program $K 9$ in Figure 13 and the corresponding schedule in Figure 14. $W^{3}(z)$ is dependent on entities $x$ and $y$ and on operations $R^{1}(x)$ and $R^{2}(y)$, whereas $W^{7}(x)$ is dependent on entities $x$ and $z$ and on operations $R^{5}(x)$ and $R^{6}(z)$. For program $K 10$ in Figure 15 and its schedule in Figure 16, $W$ (object) is dependent on entity object and on operation $R$ (object).

We assume that the system stalls before issuing a write operation until all reads that write is dependent are performed. Hence, for any write operation $a$ that is dependent on a read operation $b, b \xrightarrow{p} a$.

### 3.6 Interpretation

We borrow the notion of interpretation from [7]. The interpretation of a schedule ( $I B_{i}$ ) is specified by the program of process $P_{i}$ from which the schedule is originated. An interpretation of a schedule is a pair $I_{i}=$ $(D, F)$, where $D=\left\{D_{x}, D_{y}, \ldots\right\}$ is a set of domains, one for each entity in $\mathbf{E}$; each domain is a set of values for the corresponding entity. $F$ is a set of functions, one for each write operation in $I B_{i}$ and is defined as:

$$
F=\left\{f_{j} \mid j \text { is a step of the schedule and operation }(j)=W\right\}
$$

For each such step $j, f_{j}$ is a mapping

$$
f_{j}: \prod_{x \in B(j)} D_{x} \rightarrow D_{\operatorname{entity}(j)}
$$

where

$$
B(j)=\left\{x \in E \mid W^{j}(\operatorname{entity}(j)) \text { is dependent on } x\right\} .
$$

We illustrate this concept by an example. Consider again program K9 in Figure 13. Figure 14 depicts the schedule generated by executing program $K 9$, when initially $x \geq 10$. The interpretation $I_{i}=(D, F)$ of the schedule corresponding to program $K 9$ is the following. The domains $D_{x}, D_{y}, D_{z}$ for the entities $x, y, z$ are the set of integers, and the functions corresponding to the write operations are $f_{3}(x, y)=x+y$ and $f_{7}(z, x)=z-x$.

### 3.7 Concurrent Execution, Correct Execution and Execution Model

Two relations $R_{1}$ and $R_{2}$ are said to be consistent, if $R_{1} \cup R_{2}$ can be extended to a total ordering. A relation can be extended to a total ordering if and only if its transitive closure is irreflexive. Thus, $R_{1}$ is consistent with $R_{2}$ if and only if $R_{1} \cup R_{2}$ is acyclic.

A concurrent execution $s$ is represented by a tuple $\langle C, P B\rangle$, where $C$ is a tuple $\langle O, I, A, I B\rangle$ that specifies the correct execution. We refer to $C$ as the correctness criterion or the specification of the correct execution. $O$ is the set of entity accesses of all processes in the concurrent execution. $A$ is the equivalence relation on $O$ that defines the atomic actions. $I B$ and $P B$ are the program and performing orders on the entity accesses $O$, respectively. $I$ is the set consisting of the union of all interpretations of all schedules in the concurrent execution.

We have assumed that programs are written correctly for the ideal system. Hence, we know that an execution in which operations are performed in the program order, and in which atomic actions are executed exclusively is correct. Formally, we can define the correct execution as follows.

Definition 3.1 For a given specification $C=\langle O, I, A, I B>$, an execution $s=<C, P B>$ is correct if the result of $s$ is equal to the result of any execution in the set of executions $X$, where

$$
\begin{array}{rlrl}
X=\left\{\left\langle C, P B_{c}\right\rangle \mid\right. & I B \subseteq P B_{c} & \wedge \\
& P B_{c} \text { is consistent with IB } & \hat{} \\
& I B / A \subseteq P B_{c} / A & \wedge \\
& P B_{c} / A \text { is consistent with } I B / A
\end{array}
$$

An execution model describes an execution order on operations of processes, such that the result of an execution, which adheres this model, is the same as if the operations were executed in this order. An execution model can be specified by a type of atomic constraints and a type program order constraints. Execution $s=\langle C, P B\rangle$, which adheres the model, is correct, if the constraints of the model match the ones in the specification $C$ of the correct execution. Let $A_{e}$ be the equality relation. For example, sequential consistency is an execution model in which atomic constraints are specified by $A_{e}$ and program order constraints are specified by $\bigcup_{i=1}^{n} I B_{i}$, where $I B_{i}$ is the program order of process $P_{i}$.

Definition 3.2 For a given specification $C=\langle O, I, A, I B\rangle$, an execution $s=<C, P B\rangle$ is sequentially consistent, if the result of $s$ is equal to the result of a sequential execution defined as

$$
\begin{array}{rll}
s_{c}=\left\{\left\langle C_{c}, P B_{c}\right\rangle \mid\right. & \left.C_{c}=<O, I, \bigcup_{i} I B_{i}, A_{e}\right\rangle & \wedge \\
& \bigcup_{i} I B_{i} \subseteq P B_{c} \\
& P B_{c} \text { is consistent with } \bigcup_{i} I B_{i}
\end{array}
$$

Let $A_{s}$ be the equivalence relation defined as follows: $a A_{s} b$, if and only if $a$ and $b$ are operations of the same process.

Definition 3.3 For a given specification $C=\langle O, I, A, I B\rangle$, an execution $s=<C, P B\rangle$ is serializable, if the result of $s$ is equal to the result of a serial execution defined as

$$
\begin{array}{rlr}
s_{s}=\left\{\left\langle C_{s}, P B_{s}>\right|\right. & \left.C_{s}=<O, I, \bigcup_{i} I B_{i}, A_{s}\right\rangle & \wedge \\
& \bigcup_{i} I B_{i} \subseteq P B_{s} \\
& P B_{s} \text { is consistent with } \bigcup_{i} I B_{i} & \wedge \\
& \left(\bigcup_{i} I B_{i}\right) / A_{s} \subseteq P B_{s} / A_{s} & \wedge \\
& P B_{s} / A_{s} \text { is consistent with }\left(\bigcup_{i} I B_{i}\right) / A_{s} & \wedge
\end{array}
$$

## 4 Correctness, Sequential Consistency, Serializability

Our theory encompasses sequentially consistent executions of programs. Sequential consistency is the typical execution model for multiprocessor systems. The notion of sequential consistency was defined in [1] as follows:
" The result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program." In other words, an execution is sequentially consistent, if its result is equivalent to a sequential execution. For a given program, if $A$ is the equality relation and $I B_{\text {inter }}=\emptyset$, then a sequentially consistent execution is correct.

For applications in which $A$ and $I B_{\text {inter }}$ are specified differently, sequential consistency is not sufficient to ensure correctness. These application domains include those programs which are specified as a sequence of atomic segments, each of which is a sequence of indivisible operations (e.g., programs in Figures 4 and 5) and
those programs in which there is a specific order specified among the operations of different processes (eg., programs in Figures 10 and 11). A sequentially consistent execution of such programs may yield incorrect results.

Our theory encompasses serializable executions of programs. Serializability is the typical execution model that is used in database systems. Serializability is defined as follows. The result of any execution is the same as if the processes (transactions) were executed in some serial order. In other words, an execution is serializable, if its result is equivalent to a serial execution. If $A$ is equal to relation $A_{s}$ defined in Section 3.7, and $I B_{\text {inter }}=\emptyset$, then a serializable execution is correct.

For applications in which $I B_{\text {inter }}$ is not an empty set, serializability is not sufficient to ensure correctness. These application domains include those programs in which there is a specific order specified among the operations of different processes (eg., programs in Figures 10 and 11). A serializable execution of such programs may yield incorrect results. Furthermore, for applications, in which $I B_{\text {inter }}=\emptyset$, but $A$ is defined differently, serializability degrades the performance unnecessarily. These application domains include those programs which are specified as a sequence of atomic segments (e.g., program in Figures 4).

In addition to sequential consistency and serializability, one can define other execution models with different atomic constraints and program order constraints. Our theory allows us to deal with any execution model, in which atomic constraints can be expressed as an equivalence relation and program order constraints as a partial order. Our goal is to derive algorithms to test whether an execution is correct or meets a given execution model. To this end, we develop different notions of equivalence, and propose extensions to some of the concepts in serializability theory. We make three extensions to read-write model of transactions. These extension also require that changes be made to the various testing algorithms for serializability [2, 7]. Since serializability is a special case of our general correctness criterion, the testing algorithms that will be presented in Section 7 also cover the necessary changes to the testing algorithms for serializability with the following extensions.

1. We remove the restriction in the transaction model that each transaction reads and writes an entity at most once, and if a transaction reads and writes an entity $x, W(x)$ follows $R(x)$. Hence, the schedule of a process can have more than one read and write in any order.
2. We remove the restriction in the transaction model that if a transaction issues operation $a$ before operation $b$, then $a$ is executed before $b$.
3. We assume a write is dependent only on a subset of previous reads in the schedule. The subset of reads is defined by the dependencies. In serializability, a write is assumed to be dependent on all the previous reads in the schedule [7]. Hence, we define the equivalence classes for a set of interpretations that result in the same dependencies, whereas in serializability, the equivalence classes are defined for all possible interpretations of an schedule. The latter definition is more conservative in the sense that the set of correct executions accepted under this definition is a subset of correct executions accepted under our definition.


Figure 19: $I B$ originated from an execution of $K 2$.

```
entities }x,y\mathrm{ : integer;
x:= x+10;
y:= y+10;
```

Figure 20: Program K11.

The first extension allows us to capture typical concurrent programs in which processes interact through reading and writing shared entities. This is in contrast to the transaction model, in which the goal is to execute each transaction in isolation. For example, consider Peterson's solution to two-process critical section problem as shown in Figure 2. In Figure 19, the schedules originated from an execution of program $K 2$ are depicted. In this execution, $F 0$ and $F 1$ are false initially, and process $P_{0}$ reads $F 0$ before process $P_{1}$ starts executing. Entities $F 0$ and $F 1$ are written twice in $I B_{0}$ and $I B_{1}$, respectively, and entity turn is read after it is written in both schedules.

The second extension is introduced to allow the pipelining of operations, whereas the third extension is introduced to increase pipelining. For example, consider program $K 11$ in Figure 20 and its $I B$ in Figure 21. In our model, $W(y)$ is dependent only on $R(y)$, whereas in serializability theory, $W(y)$ is assumed to be dependent on both $R(y)$ and $R(x)$. If pipelining is allowed, $W(y)$ can be issued or performed before $R(x)$ is issued or performed according to our theory, whereas $W(y)$ can only be issued after both $R(x)$ and $R(y)$ are performed according to the serializability theory.

In the remainder of this paper, we only refer to an atomic action which contains more than one operation as an atomic action. $O_{i} / A_{i}-O_{i} / A_{e}$ is the set of atomic actions in process $P_{i}$. We denote this set by $A A_{i}$. Hence, $A A_{i}=O_{i} / A_{i}-O_{i} / A_{e}=\left\{O_{i}^{1}, O_{i}^{2}, \ldots\right\}$, where $O_{i}^{j}$ is the set of operations in atomic action $j$ in process $P_{i}$. The set of all atomic actions is denoted by $A A$, where $A A=\bigcup_{i} A A_{i}$.


Figure 21: $I B$ originated from execution of $K 11$.

## 5 Classes of Correct Executions

In this section, we present three different notions of equivalence to categorize correct executions and executions that meet a given execution model into classes. The containment relation between these classes is in terms of the restriction placed on the access order.

### 5.1 View Correctness

We redefine the notion of view equivalence used in serializability theory [2] to capture the case where a process can issue several read and write operations on the same entity in any order, and operations can be performed in an order different from the order in which they are issued.

Definition 5.1 Two executions $s_{1}=<C_{1}, P B_{1}>$ and $s_{2}=<C_{2}, P B_{2}>$ are view equivalent, if $O_{1}=O_{2}$, $I_{1}=I_{2}$, and

1. for each entity $x$, if $R_{i}^{j}(x)$ returns the initial value of $x$ in execution $s_{1}$, then $R_{i}^{j}(x)$ must return the initial value of $x$ in execution $s_{2}$, and
2. for each entity $x$, if $R_{i}^{j}(x)$ returns the value written by $W_{k}^{l}(x)$ in execution $s_{1}$, then $R_{i}^{j}(x)$ must return the value written by $W_{k}^{l}(x)$ in execution $s_{2}$, and
3. for each entity $x$, if $W_{i}^{j}(x)$ writes the final value of $x$ in execution $s_{1}$, then $W_{i}^{j}(x)$ must write the final value of $x$ in execution $s_{2}$, and
4. for each entity $x$, if $W_{i}^{j}(x)$ is dependent on entity $y$ and $R_{i}^{k}(y)$ is the first read performed on $y$ before $W_{i}^{j}(x)$ in execution $s_{1}$, then $R_{i}^{k}(y)$ must be the first read performed on $y$ before $W_{i}^{j}(x)$ in execution $s_{2}$.

Definition 5.2 For a given $C$, let $X$ be the set of executions as defined in Section 3.7.
Execution $s=<C, P B>$ is view correct, if it is view equivalent to an execution in $X$.
Definition 5.3 An execution is view consistent, if it is view equivalent to a sequential execution.
View consistency should not be confused with view serializability. An execution is view serializable, if it is view equivalent to a serial execution.

### 5.2 B Correctness

We will prove in Section 7.1 that testing for view correctness is an NP-complete problem. Therefore, we must search for other classes of correct executions that restrict the access order more than view correct executions. For this purpose, we define a new equivalence notion- B equivalence. In Section 7.2 , we will present a polynomial time testing algorithm for B correctness.

Definition 5.4 Two executions $s_{1}=<C_{1}, P B_{1}>$ and $s_{2}=<C_{2}, P B_{2}>$ are $B$ equivalent, if

1. $s_{1}$ and $s_{2}$ are view equivalent, and
2. for each entity $x$, if $W_{i}^{j}(x)$ is performed before $W_{k}^{l}(x)$, and if $R_{p}^{q}(x)$ returns the value written by $W_{k}^{l}(x)$ in execution $s_{1}$, then $W_{i}^{j}(x)$ must be performed before $W_{k}^{l}(x)$ in $s_{2}$, and
3. for each entity $x$, if $W_{i}^{j}(x)$ is performed after $R_{p}^{q}(x)$ in execution $s_{1}$, then $W_{i}^{j}(x)$ must be performed after $R_{p}^{q}(x)$ in $s_{2}$.

Definition 5.5 For a given $C$, let $X$ be the set of executions as defined in Section 3.7.
Execution $s=<C, P B>$ is $B$ correct, if $s$ is $B$ equivalent to an execution in $X$.
Definition 5.6 An execution s is $B$ consistent, if s is $B$ equivalent to a sequential execution.

Definition 5.7 An execution s is $B$ serializable, if $s$ is $B$ equivalent to a serial execution.
Theorem 5.1 If an execution $s$ is $B$ correct, then $s$ is view correct.
The converse of Theorem 5.1 is not true. Similarly, if an execution $s$ is B consistent, then $s$ is view consistent, and if an execution $s$ is B serializable, then $s$ is view serializable, and the converses of these statements are not correct.

### 5.3 Conflict Correctness

Although B correctness yields a polynomial time testing algorithm, for completeness, we introduce another correctness class-conflict correctness, which also yields a polynomial time testing algorithm, but restricts the access order more than B correctness. Conflict equivalence is widely used in serializability theory [2] and in optimization techniques for parallel programs [6].

Two operations are said to be conflicting, if they access the same entity and at least one of them is a write. We extend the definition of conflict equivalence defined for databases [2].

Definition 5.8 Two executions $s_{1}=<C_{1}, P B_{1}>$ and $s_{2}=<C_{2}, P B_{2}>$ are conflict equivalent, if 1. $s_{1}$ and $s_{2}$ are $B$ equivalent, and
2. for each entity $x$, if $W_{k}^{l}(x)$ overwrites the value returned by $R_{i}^{j}(x)$ in execution $s_{1}$, then $W_{k}^{l}(x)$ must overwrite the value returned by $R_{i}^{j}(x)$ in execution $s_{2}$.
3. for each entity $x$, if $W_{k}^{l}(x)$ overwrites the value written by $W_{i}^{j}(x)$ in execution $s_{1}$, then $W_{k}^{l}(x)$ must overwrite the value written by $W_{i}^{j}(x)$ in execution $s_{2}$.

Definition 5.9 For a given $C$, let $X$ be the set of executions as defined in Section 3.\%.
Execution $s=<C, P B>$ is conflict correct, if it is conflict equivalent to an execution in $X$.
Definition 5.10 An executions is conflict consistent, if $s$ is conflict equivalent to a sequential execution.
Conflict consistency should not be confused with conflict serializability. An execution is conflict serializable, if it is conflict equivalent to a serial execution.

Theorem 5.2 If an execution is conflict correct, then it is $B$ correct.
The converse of Theorem 5.2 is not true. Similarly, if an execution $s$ is conflict consistent, then $s$ is B consistent, and if an execution $s$ is conflict serializable, then $s$ is B serializable, and converses of these statements are not correct.

## 6 Hierarchical Graphs and Polygraphs

In order to reduce the complexity of testing for correctness, we define the concepts of hierarchical graph and hierarchical polygraphs. Informally, a hierarchical graph is a graph in which some nodes are themselves graphs. A hierarchical polygraph is a polygraph (polygraphs are defined in [7]), in which some nodes are themselves polygraphs. Each hierarchical graph represent a graph, which we refer as the underlying graph.

Definition 6.1 A hierarchical graph (or higraph) is a tuple $\mathcal{G}=\left(V_{1}, V_{0}, E\right) . V_{1}$ is the set of supernodes. Each supernode is a graph. Hence,

$$
V_{1}=\left\{\left(V_{1 i}, E_{1 i}\right)\right\}
$$

$V_{0}$ is the set of nodes, and $E$ is the set of edges defined by a binary relation on $V_{1} \cup V_{0}$.
In figures, we depict a higraph $\mathcal{G}=\left(V_{1}, V_{0}, E\right)$ with a set of graphs. The set includes graph $G^{\prime}=\left(V_{1} \cup\right.$ $\left.V_{0}, E\right)$, in which nodes corresponding to supernodes are drawn in black, and a graph per supernode, which is drawn in a circle. Figure 22 illustrates a higraph. In this higraph, the set of supernodes is $V_{1}=\left\{s n_{1}, s n_{2}\right\}$, where $s n_{1}=\left(V_{11}, E_{11}\right), V_{11}=\left\{n_{1}, n_{2}\right\}, E_{11}=\emptyset . s n_{2}=\left(V_{12}, E_{12}\right), V_{12}=\left\{n_{3}, n_{4}, n_{5}, n_{6}\right\}$, and $E_{12}=$ $\left.\left(n_{3}, n_{5}\right),\left(n_{4}, n_{5}\right),\left(n_{5}, n_{6}\right)\right\}$. The set of nodes $V_{0}$ is empty. The set of edges is $E=\left\{\left(s n_{1}, s n_{2}\right)\right\}$.

To simplify the definition of the underlying graph of a higraph, we define the function parent for a higraph from the set $\bigcup_{i} V_{1 i} \cup V_{0}$ to the set $V_{1} \cup V_{0}$ as follows:

$$
\operatorname{parent}(n)= \begin{cases}s n & \text { if }\left(\exists s n: s n \in V_{1}: n \in s n\right) \\ n & \text { otherwise }\end{cases}
$$



Figure 22: Higraph $\mathcal{G}_{1}$.

$n_{6}$

Figure 23: $G$ is the underlying graph of higraph $\mathcal{G}_{1}$.

Definition 6.2 A higraph $\mathcal{G}=\left(V_{1}, V_{0}, E\right)$ represents an underlying graph $G=\left(V, E^{\prime}\right)$, where

$$
\begin{gathered}
V=\bigcup_{i} V_{1 i} \cup V_{0}, \\
E^{\prime}=\bigcup_{i} E_{1 i} \cup E^{\prime \prime}, \\
E^{\prime \prime}=\left\{\left(n_{i}, n_{j}\right) \mid n_{i} \in V \wedge n_{j} \in V \wedge\left(\operatorname{parent}\left(n_{i}\right) \neq \operatorname{parent}\left(n_{j}\right)\right) \wedge\left(\left(\operatorname{parent}\left(n_{i}\right), \operatorname{parent}\left(n_{j}\right)\right) \in E\right)\right\} .
\end{gathered}
$$

Figure 23 illustrates the underlying graph of higraph $\mathcal{G}_{1}$ depicted in Figure 22. For higraph $\mathcal{G}_{1}, \operatorname{parent}\left(n_{1}\right)=$ $\operatorname{parent}\left(n_{2}\right)=s n_{1}$ and parent $\left(n_{3}\right)=\operatorname{parent}\left(n_{4}\right)=\operatorname{parent}\left(n_{5}\right)=\operatorname{parent}\left(n_{6}\right)=s n_{2}$. For the underlying graph $G, E^{\prime \prime}=\left\{\left(n_{1}, n_{3}\right),\left(n_{1}, n_{4}\right),\left(n_{1}, n_{5}\right),\left(n_{1}, n_{6}\right),\left(n_{2}, n_{3}\right),\left(n_{2}, n_{4}\right),\left(n_{2}, n_{5}\right),\left(n_{2}, n_{6}\right)\right\}$.

Definition 6.3 A hierarchical polygraph (or hipolygraph) is a tuple $\mathcal{P}=\left(V_{1}, V_{0}, E, C\right) . V_{1}$ is the set of supernodes. Each supernode is a polygraph. Hence,

$$
V_{1}=\left\{\left(V_{1 i}, E_{1 i}, C_{1 i}\right)\right\}
$$



Figure 24: A hipolygraph $\mathcal{P}_{1}$.


Figure 25: A higraph that is compatible with hipolygraph $\mathcal{P}_{1}$.
$V_{0}$ is the set of nodes, and $E$ is the set of edges defined by a binary relation on $V_{1} \cup V_{0} . C$ is the set of choices defined on $V_{1} \cup V_{0}$.

In figures, we depict a hipolygraph $\mathcal{P}=\left(V_{1}, V_{0}, E, C\right)$ with a set of polygraphs. The set includes polygraph $P^{\prime}=\left(V_{1} \cup V_{0}, E, C\right)$, in which nodes corresponding to supernodes are drawn in black, and a polygraph per supernode, which is drawn in a circle. Figure 24 illustrates a hipolygraph. In this hipolygraph, the set of supernodes is $V_{1}=\left\{s n_{1}, s n_{2}\right\}$, where $s n_{1}=\left(V_{11}, E_{11}, C_{11}\right), V_{11}=\left\{n_{1}, n_{2}\right\}, E_{11}=\left\{\left(n_{1}, n_{2}\right)\right\}, C_{11}=\emptyset$, $s n_{2}=\left(V_{12}, E_{12}, C_{12}\right), V_{12}=\left\{n_{4}, n_{5}, n_{6}\right\}, E_{12}=\left\{\left(n_{4}, n_{5}\right)\right\}$, and $C_{12}=\left\{\left(n_{4}, n_{6}, n_{5}\right)\right\}$. The set of nodes is $V_{0}=\left\{n_{3}\right\}$, the set of edges $E=\left\{\left(s n_{1}, s n_{2}\right)\right\}$, and the set of choices $C=\left\{\left(n_{3}, s n_{1}, s n_{2}\right)\right\}$. The function parent is defined the same way for a hipolygraph. For hipolygraph $\mathcal{P}_{1}, \operatorname{parent}\left(n_{1}\right)=\operatorname{parent}\left(n_{2}\right)=s n_{1}$, $\operatorname{parent}\left(n_{3}\right)=n_{3}$, and parent $\left(n_{4}\right)=\operatorname{parent}\left(n_{5}\right)=\operatorname{parent}\left(n_{6}\right)=s n_{2}$.

Definition 6.4 Higraph $\mathcal{G}=\left(V_{1}, V_{0}, E\right)$ is said to be compatible with hipolygraph $\mathcal{P}=\left(V^{\prime}{ }_{1}, V_{0}, E^{\prime}, C^{\prime}\right)$, if

1. $E^{\prime} \subseteq E$, and for each choice $\left(c_{1}, c_{2}, c_{3}\right)$ in $C^{\prime}$, either edge $\left(c_{1}, c_{2}\right)$ or edge $\left(c_{2}, c_{3}\right)$ is in $E$, and
2. $E_{1 i}^{\prime} \subseteq E_{1 i}$, and for each choice $\left(c_{1}, c_{2}, c_{3}\right)$ in $C^{\prime}{ }_{1 i}$, either edge $\left(c_{1}, c_{2}\right)$ or edge $\left(c_{2}, c_{3}\right)$ is in $E_{1 i}$.

Figure 25 illustrates a higraph that is compatible with hipolygraph $\mathcal{P}_{1}$ depicted in Figure 24.

## Definition 6.5

1. A supernode is said to be acyclic, if the graph that the supernode contains is acyclic.
2. A higraph is said to be acyclic, if the underlying graph is acyclic.
3. A hipolygraph is said to be acyclic, if there is a compatible higraph which is acyclic.
4. A supernode is said to be a total order, if the graph that the supernode contains is a total order.
5. A higraph is said to be a total order, if the underlying graph is a total order.

Theorem 6.1 A higraph $\mathcal{G}=\left(V_{1}, V_{0}, E\right)$ is acyclic, if and only if each supernode in $V_{1}$ is acyclic, and graph $G^{\prime}=\left(V_{1} \cup V_{0}, E\right)$ is acyclic.

## 7 Testing Algorithms

In this section, we develop algorithms for testing view correctness, B correctness and conflict correctness. These algorithms can also be used to test whether an execution meets a given execution model by replacing the atomic constraints and program order of the specification of correct execution by the ones of the execution model.

### 7.1 View Correctness

In order to test whether an execution $s$ is view correct, we define a directed hipolygraph $\mathcal{P}(s)$. An augmented execution $\hat{s}$ of an execution $s$ contains two new processes $P_{b}$ and $P_{f}$, besides those in $s$. $P_{b}$ consists of only write steps, one for each entity read or written in $s . P_{f}$ consists of only read operations one for each entity read or written in $s$. Execution $\hat{s}$ starts with $P_{b}$ and ends with $P_{f}$. Given an execution $s$, hipolygraph $\mathcal{P}(s)=\left(V_{1}, V_{0}, E, C\right)$ is constructed as follows. In $\mathcal{P}(s)$, there are two nodes that represent $P_{b}$ and $P_{f}$ respectively, and one node for each operation in $s$ which is not in an atomic action. Hence, $V_{0}=\left\{P_{b}, P_{f}\right\} \cup O-\bigcup_{S \in A A} S$. Each supernode corresponds to an atomic action. The set of supernodes is $V_{1}=\left\{\left(O_{1}^{1}, E_{1}^{1}, C_{1}^{1}\right),\left(O_{1}^{2}, E_{1}^{2}, C_{1}^{2}\right), \ldots,\left(O_{2}^{1}, E_{2}^{1}, C_{2}^{1}\right), \ldots,\right\}$, where $O_{i}^{j}$ is the operations in atomic action $j$ of process $P_{i}$. There are six types of directed edges in $E_{i}^{j}$.

1. For each pair of operations $a$ and $b$ in atomic action $j$ in process $P_{i}$, if $a$ is immediately before $b$ in program order $I B$, then the $\operatorname{arc}(a, b)$ is added to $E_{i}^{j}$.
2. For each entity $x$, if $R_{i}^{k}(x)$ and $W_{i}^{l}(x)$ are operations in atomic action $j$ in process $P_{i}$, and $R_{i}^{k}(x)$ returns the value written by $W_{i}^{l}(x)$, then $\left(W_{i}^{l}(x), R_{i}^{k}(x)\right)$ is added to $E_{i}^{j}$.
3. For each entity $x$, if $R_{i}^{k}(x)$ and $W_{i}^{m}(x)$, are operations in atomic action $j$ in process $P_{i}, R_{i}^{k}(x)$ returns the value written by $W_{a}^{l}(x)$ that is not in the same atomic action, then edge ( $\left.R_{i}^{k}(x), W_{i}^{m}(x)\right)$ is added to $E_{i}^{j}$.
4. For each entity $x$, if $W_{i}^{l}(x)$ and $W_{i}^{m}(x)$, are operations in atomic action $j$ in process $P_{i}, R_{a}^{k}(x)$ that is not in the same atomic action returns the value written by $W_{i}^{l}(x)$, then edge ( $\left.W_{i}^{m}(x), W_{i}^{l}(x)\right)$ is added to $E_{i}^{j}$.
5. For each entity $x$, if there is $R_{i}^{k}(x)$ in atomic action $j$ in process $P_{i}$ that returns the initial value of $x$, and there is a $W_{i}^{l}(x)$ in the same atomic action, then edge $\left(\left(R_{i}^{k}(x), W_{i}^{l}(x)\right)\right.$ is added to $E_{i}^{j}$.
6. For each entity $x$, if there is $W_{i}^{k}(x)$ in atomic action $j$ in process $P_{i}$ that writes the final value of $x$, and there is a $W_{i}^{l}(x)$ in the same atomic action, then edge $\left.\left(W_{i}^{l}(x)\right), W_{i}^{k}(x)\right)$ ) is added to $E_{i}^{j}$.

For each entity $x$, if $R_{i}^{k}(x), W_{i}^{l}(x)$, and $W_{i}^{m}(x)$, are operations in atomic action $j$ in process $P_{i}$, and $R_{i}^{k}(x)$ returns the value written by $W_{i}^{l}(x)$, then choice $\left(R_{i}^{k}(x), W_{i}^{m}(x), W_{i}^{l}(x)\right)$ is added to $C_{i}^{j}$.

There are six types of directed edges in $E$.

1. For each operation $a$ in any process in $s$, the $\operatorname{arc}\left(P_{b}, \operatorname{parent}(a)\right)$ is added to $E$.
2. For each operation $a$ in in $s$, the arc $\left(\operatorname{parent}(a), P_{f}\right)$ is added to $E$.
3. For each pair of operations $a$ and $b$ in any processes in $s$, if parent $(a) \neq \operatorname{parent}(b)$, and $a$ is before $b$ in program order $I B$, and there is no other operation $c$ in any process in $s$, such that parent $(a) \neq$ $\operatorname{parent}(c) \neq \operatorname{parent}(b)$, and $a$ is before $c$ and $c$ is before $b$ in $I B$, then the arc (parent $(a)$, parent $(b))$ is added to $E$.
4. For each entity $x$, if $R_{i}^{j}(x)$ in any process in $s$ returns the value written by $W_{k}^{l}(x)$ in any process in $s$, and $\operatorname{parent}\left(R_{i}^{j}(x)\right) \neq \operatorname{parent}\left(W_{k}^{l}(x)\right)$, then edge $\left(\operatorname{parent}\left(W_{k}^{l}(x), \operatorname{parent}\left(R_{i}^{j}(x)\right)\right.\right.$ is added to $E$.
5. For each entity $x$, if in any process in $s$, there is a read operation $R_{i}^{j}(x)$ that returns the initial value of $x$, and there is a write operation $W_{k}^{l}(x)$ in any process in $s$, and $\operatorname{parent}\left(R_{i}^{j}(x)\right) \neq \operatorname{parent}\left(W_{k}^{l}(x)\right)$, then edge $\left(\operatorname{parent}\left(R_{i}^{j}(x)\right)\right.$, parent $\left.\left(W_{k}^{l}(x)\right)\right)$ is added to $E$.
6. For each entity $x$, if in any process in $s$, there is a write operation $W_{i}^{j}(x)$ that writes the final value of $x$, and there is another write operation $W_{k}^{l}(x)$ in any process in $s$, and parent $\left(W_{i}^{j}(x)\right) \neq \operatorname{parent}\left(W_{k}^{l}(x)\right)$, then edge $\left(\operatorname{parent}\left(W_{k}^{l}(x)\right), \operatorname{parent}\left(W_{i}^{j}(x)\right)\right)$ is added to $E$.

The set of directed choices $C$ is constructed as follows: For each entity $x$ and operations $R_{i}^{j}(x), W_{k}^{l}(x)$, and $W_{c}^{a}(x)$ in any processes in $s$, such that $R_{i}^{j}(x)$ returns the value written by $W_{k}^{l}(x)$, then

$$
\left(\operatorname{parent}\left(R_{i}^{j}(x)\right), \text { parent }\left(W_{c}^{a}(x)\right), \operatorname{parent}\left(W_{k}^{l}(x)\right)\right)
$$

is added to the set of choices $C$, if

1. $\operatorname{parent}\left(R_{i}^{j}(x)\right)=\operatorname{parent}\left(W_{k}^{l}(x)\right) \neq \operatorname{parent}\left(W_{c}^{a}(x)\right)$, or
2. $\operatorname{parent}\left(R_{i}^{j}(x)\right) \neq \operatorname{parent}\left(W_{c}^{a}(x)\right) \neq \operatorname{parent}\left(W_{k}^{l}(x)\right) \neq \operatorname{parent}\left(R_{i}^{j}(x)\right)$.

Theorem 7.1 An execution $s$ is view correct if and only if $\mathcal{P}(s)$ is acyclic.

Theorem 7.2 The problem of deciding whether an execution is view correct is NP-complete.

| $P_{1}$ |  | $P_{2}$ |
| :--- | :--- | :--- |
| entities $x 1, x 2, y 1:$ integer; <br> local $t:$ integer; |  | entities $x, y:$ integer; <br> local $t:$ integer; |
| $t:=y 1 ;$ |  | $t:=y 2 ;$ |
| $x 1:=x 1-t ;$ |  | $x 1:=x 1+t ;$ |
| barrier synchronization; |  | barrier syncronization; |
| $x 2:=x 1-x 2 ;$ |  | $x 2:=x 1+x 2 ;$ |

Figure 26: Program $K 12$ that is written with the assumption that each increment statement is atomic.


Figure 27: $s_{12}$ : An execution of program $K 12$.

In Figure 26, we define a parallel program $K 12$ in which barrier synchronization is used to order operations of two processes, and which is written with the assumption that increment statements are atomic. Figure 27 illustrates an execution $s_{12}$ of program $K 12$, and in Figure 28, we show the hipolygraph $\mathcal{P}\left(s_{12}\right)$ corresponding to execution $s_{12}$. The hipolygraph is cyclic, therefore the execution is not view correct. Figure 29 illustrates execution $s_{13}$, which is view correct. Note that if an execution $s=<C, P B>$ does not contain any atomic actions, testing hipolygraph $\mathcal{P}(s)=\left(V_{1}, V_{0}, E, C\right)$ becomes a regular polygraph $P(s)=\left(V_{0}, E, C\right)$. In this case, execution $s$ is both view correct and view consistent.

### 7.2 B Correctness

We define the directed higraph $\mathcal{H}(s)$ to test whether execution $s$ is B correct. Given an execution $s$, higraph $\mathcal{H}(s)=\left(V_{1}, V_{0}, E\right)$, where $V_{1}=\left\{\left(O_{1}^{1}, E_{1}^{1}\right),\left(O_{1}^{2}, E_{1}^{2}\right), \ldots,\left(O_{2}^{1}, E_{2}^{1}\right), \ldots,\right\}$, is constructed from hipolygraph


Figure 28: $\mathcal{P}\left(s_{12}\right)$


Figure 29: Execution $s_{13}$.
$\mathcal{P}(s)=\left(V_{1}^{\prime}, V_{0}, E^{\prime}, C\right)$, where $V_{1}^{\prime}=\left\{\left(O_{1}^{1}, E_{1}^{\prime 1}, C_{1}^{1}\right),\left(O_{1}^{2}, E_{1}^{\prime 2}, C_{1}^{2}\right), \ldots,\left(O_{2}^{1}, E_{2}^{\prime 1}, C_{2}^{1}\right), \ldots,\right\}$, as follows.

1. For each supernode, $E_{i}^{\prime j} \subseteq E_{i}^{j}$.
2. For each entity $x$ and operations $R_{i}^{k}(x), W_{i}^{l}(x)$ and $W_{i}^{m}(x)$ in atomic action $j$ in process $P_{i}$, if $R_{i}^{k}(x)$ returns the value written by $W_{i}^{l}(x)$,
(a) $\operatorname{arc}\left(R_{i}^{k}(x), W_{i}^{m}(x)\right)$ is added to $E_{i}^{j}$, if $W_{i}^{m}(x)$ is performed after $R_{i}^{k}(x)$,
(b) $\operatorname{arc}\left(W_{i}^{m}(x), W_{i}^{l}(x)\right)$ is added to $E_{i}^{j}$, if $W_{i}^{m}(x)$ is performed before $W_{i}^{l}(x)$.
3. $E^{\prime} \subseteq E$.
4. For each entity $x$ and operations $R_{i}^{j}(x), W_{k}^{l}(x)$ and $W_{c}^{a}(x)$ in $s$, such that $R_{i}^{j}(x)$ returns the value written by $W_{k}^{l}(x)$,
(a) arc $\left(\operatorname{parent}\left(R_{i}^{j}(x)\right)\right.$, $\left.\operatorname{parent}\left(W_{c}^{a}(x)\right)\right)$ is added to $E$, if $W_{c}^{a}(x)$ is performed after $R_{i}^{j}(x)$, and either $\operatorname{parent}\left(R_{i}^{j}(x)\right)=\operatorname{parent}\left(W_{k}^{l}(x)\right) \neq \operatorname{parent}\left(W_{c}^{a}(x)\right)$ or $\operatorname{parent}\left(R_{i}^{j}(x)\right) \neq \operatorname{parent}\left(W_{c}^{a}(x)\right) \neq$ $\operatorname{parent}\left(W_{k}^{l}(x)\right) \neq \operatorname{parent}\left(R_{i}^{j}(x)\right)$.
(b) $\operatorname{arc}\left(\operatorname{parent}\left(W_{c}^{a}(x)\right)\right.$, $\left.\operatorname{parent}\left(W_{k}^{l}(x)\right)\right)$ is added to $E$, if $W_{k}^{l}(x)$ is performed after $W_{c}^{a}(x)$, and either $\operatorname{parent}\left(R_{i}^{j}(x)\right)=\operatorname{parent}\left(W_{k}^{l}(x)\right) \neq \operatorname{parent}\left(W_{c}^{a}(x)\right)$ or $\operatorname{parent}\left(R_{i}^{j}(x)\right) \neq \operatorname{parent}\left(W_{c}^{a}(x)\right) \neq$ $\operatorname{parent}\left(W_{k}^{l}(x)\right) \neq \operatorname{parent}\left(R_{i}^{j}(x)\right)$.

Theorem 7.3 An execution $s$ is $B$ correct, if and only if $\mathcal{H}(s)$ is acyclic.
Theorem 7.4 We can test whether an execution is $B$ correct in $O\left(n^{2}\right)$ time, where $n$ is the total number operations in all processes.

In Figure 30, we show the higraph $\mathcal{H}\left(s_{13}\right)$ corresponding to execution $s_{13}$ in Figure 29. The higraph is cyclic, and therefore the execution is not B correct. Figure 31 illustrates execution $s_{14}$, which is B correct. Note that if an execution $s=<C, P B>$ does not contain any atomic actions, testing higraph $\mathcal{H}(s)=\left(V_{1}, V_{0}, E\right)$ becomes a regular graph $H(s)=\left(V_{0}, E\right)$. In this case, execution $s$ is both B correct and B consistent.

### 7.3 Conflict Correctness

We define the directed higraph $\mathcal{G}(s)=\left(V_{1}, V_{0}, E\right)$ to test whether $s$ is conflict correct. The set of vertices $V_{0}$ is the set of all operations of all processes in $s$, which are not in an atomic action. Hence, $V_{0}=$ $O-\bigcup_{S \in A A} S$. Each supernode corresponds to an atomic action, hence, the set of supernodes is $V_{1}=$ $\left\{\left(O_{1}^{1}, E_{1}^{1}\right),\left(O_{1}^{2}, E_{1}^{2}\right), \ldots,\left(O_{2}^{1}, E_{2}^{1}\right), \ldots\right\}$. There are four types of directed edges in $E_{i}^{j}$.

1. For each pair of operations $a$ and $b$ in atomic action $j$ in process $P_{i}$, if $a$ is immediately before $b$ in the program order, then the $\operatorname{arc}(a, b)$ is added to $E_{i}^{j}$.


Figure 30: $\mathcal{H}\left(s_{13}\right)$


Figure 31: Execution $s_{14}$.
2. For each entity $x$ and pair of operations $R_{i}^{j}(x)$ and $W_{i}^{l}(x)$ in atomic action $j$ of process $P_{i}$, if $R_{i}^{j}(x)$ returns the value written by $W_{i}^{l}(x)$, then edge $\left(W_{i}^{l}(x), R_{i}^{j}(x)\right)$ is added to $E_{i}^{j}$.
3. For each entity $x$ and pair of operations $R_{i}^{j}(x)$ and $W_{i}^{l}(x)$ in atomic action $j$ of process $P_{i}$, if $W_{i}^{l}(x)$ overwrites the value read by $R_{i}^{j}(x)$, then edge $\left(R_{i}^{j}(x), W_{k}^{l}(x)\right)$ is added to $E_{i}^{j}$.
4. For each entity $x$ and pair of operations $W_{i}^{j}(x)$ and $W_{i}^{l}(x)$, in atomic action $j$ of process $P_{i}$, if $W_{i}^{l}(x)$ overwrites the value written by $W_{i}^{j}(x)$, then edge $\left(W_{i}^{j}(x), W_{k}^{l}(x)\right)$ is added to $E_{i}^{j}$.

There are four types of arcs in $E$ :

1. For each pair of operations $a$ and $b$, if parent $(a) \neq \operatorname{parent}(b)$, and $a$ is before $b$ in program order $I B$, and there is no other operation $c$ such that parent $(a) \neq \operatorname{parent}(c) \neq \operatorname{parent}(b), a$ is before $c$ and $c$ is before $b$ in $I B$, then the $\operatorname{arc}(\operatorname{parent}(a)$, parent $(b))$ is added to $E$.
2. For each entity $x$ and pair of operations $R_{i}^{j}(x)$ and $W_{k}^{l}(x)$, if $R_{i}^{j}(x)$ returns the value written by $W_{k}^{l}(x)$, and $\operatorname{parent}\left(R_{i}^{j}(x)\right) \neq \operatorname{parent}\left(W_{k}^{l}(x)\right)$, then edge $\left(\operatorname{parent}\left(W_{k}^{l}(x)\right), \operatorname{parent}\left(R_{i}^{j}(x)\right)\right)$ is added to $E$.
3. For each entity $x$ and pair of operations $R_{i}^{j}(x)$ and $W_{k}^{l}(x)$, if $W_{k}^{l}(x)$ overwrites the value read by $R_{i}^{j}(x)$, and $\operatorname{parent}\left(R_{i}^{j}(x)\right) \neq \operatorname{parent}\left(W_{k}^{l}(x)\right)$, then edge $\left(\operatorname{parent}\left(R_{i}^{j}(x)\right)\right.$, $\left.\operatorname{parent}\left(W_{k}^{l}(x)\right)\right)$ is added to $E$.
4. For each entity $x$ and pair of operations $W_{i}^{j}(x)$ and $W_{k}^{l}(x)$, if $W_{k}^{l}(x)$ overwrites the value written by $W_{i}^{j}(x)$, and $\operatorname{parent}\left(W_{i}^{j}(x)\right) \neq \operatorname{parent}\left(W_{k}^{l}(x)\right)$, then edge $\left(\operatorname{parent}\left(W_{i}^{j}(x)\right), \operatorname{parent}\left(W_{k}^{l}(x)\right)\right)$ is added to $E$.

Theorem 7.5 An execution $s$ is conflict correct if and only if $\mathcal{G}(s)$ is acyclic.
In Figure 32, we show the higraph $\mathcal{G}\left(s_{14}\right)$ corresponding to execution $s_{14}$ in Figure 31. The higraph is cyclic, and therefore, the execution is not conflict correct. The problem of deciding whether an execution is conflict correct can be solved in polynomial time. Note that if an execution $s=<C, P B>$ does not contain any atomic actions, testing higraph $\mathcal{G}(s)=\left(V_{1}, V_{0}, E\right)$ becomes a regular graph $G(s)=\left(V_{0}, E\right)$. In this case, execution $s$ is both conflict correct and conflict consistent.

## 8 Conclusions

An ideal system is one that performs program operations in the order specified by the program and executes atomic program segments exclusively. Although this system model simplifies the task of reasoning about both sequential and concurrent programs, its straightforward implementation yields poor performance. To enhance performance, concurrency and pipelining techniques can be used, which may result in data accesses that are performed in an order which is different from the order specified by the program, which may result in incorrect executions. An execution is correct if its result is equivalent to the result that could have been obtained had the execution taken place on the ideal system. In this paper, we have developed a unified general theory of


Figure 32: $\mathcal{G}\left(s_{14}\right)$
correct executions where the access orders differ from the access order on the ideal system. Our unifying theory is applicable to a variety of programming paradigms, application domains, and architectures. It provides a verification tool to test the correctness of an execution, and allows us to devise more efficient protocols for various systems.

## References

[1] L. Lamport, "How to make a multiprocessor computer that correctly executes multiprocess programs," IEEE Transactions on Computers, September 1979, pp. 690-691.
[2] H. F. Korth, and A. Silberschatz, Database System Concepts, McGraw-Hill, 1991.
[3] A. Silberschatz, J. Peterson, and P. Galvin, Operating System Concepts, Addison-Wesley Publishing Company, Inc., 1991.
[4] G. R. Andrews, and F. B. Schneider, "Concepts and Notations for Concurrent Programming," ACM Computing Surveys, June 1983, pp. 3-69.
[5] G. S. Almasi, and A. Gottlieb, Highly Parallel Computing, The Benjamin/Cummings Publishing Company, 1989.
[6] D. Shasha, and M. Snir, "Efficient and correct execution of parallel programs that share memory," ACM Transactions on Programming Languages and Systems, April 1988, pp. 282-312.
[7] C. Papadimitriou, The Theory of Database Concurrency Control, Computer Science Press, 1986.
[8] M. Dubois and C. Scheurich, "Memory access dependencies in shared-memory multiprocessors," IEEE Transactions on the Software Engineering, June 1990, pp. 660-673.
[9] K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy, "Memory consistency and event ordering in scalable shared-memory multiprocessors," Computer Architecture News, June 1990, pp. 15-26.
[10] S. V. Adve, and M. D. Hill, "Weak Ordering- A New Definition," Proceedings of the 17 th Annual International Symposium on Computer Architecture, May 1990, pp. 2-14.
[11] P. Bitar, "The weakest memory-access order," Journal of Parallel and Distributed Computing, 1992, pp. 305-331.
[12] J. L. Hennessy, and D. A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, 1990.
[13] W. C. Brantley, K. P. McAuliffe, and J. Weiss, "RP3 processor-memory element," Proceedings of the International Conference on Parallel Processing, 1985, pp. 782-789.

## A Appendix

## A. 1 Proofs of Theorems in Section 5

## Proof of Theorem 5.1:

$s$ is B correct.
$\equiv\{$ Definition 5.5$\}$
$s$ is B equivalent to an execution $s_{x}$ in $X$.
$\Rightarrow$ \{Definition 5.4 \}
$s$ is view equivalent to $s_{x}$.
$\Rightarrow$ \{Definition 5.2$\}$
$s$ is view correct.

## Proof of Theorem 5.2:

$s$ is conflict correct.
$\equiv\{$ Definition 5.9$\}$
$s$ is conflict equivalent to an execution $s_{x}$ in $X$.
$\Rightarrow$ \{Definition 5.8 \}
$s$ is B equivalent to $s_{x}$.
$\Rightarrow$ \{Definition 5.5$\}$
$s$ is B correct.

## A. 2 Proofs of Theorems in Section 6

The following axioms directly follow from the definitions of higraphs and their underlying graphs Let $G=$ ( $V, E^{\prime}$ ) be the underlying graph of higraph $\mathcal{G}=\left(V_{1}, V_{0}, E\right)$, and $n_{1}, n_{2}$ and $n_{3}$ three nodes in $G$, such that $\operatorname{parent}\left(n_{1}\right) \neq \operatorname{parent}\left(n_{2}\right) \neq \operatorname{parent}\left(n_{3}\right) \neq \operatorname{parent}\left(n_{1}\right)$ in $\mathcal{G}$.

Axiom A. 1 There is an edge $\left(n_{1}, n_{2}\right)$ in $G$, if and only if there is an edge (parent $\left(n_{1}\right)$, parent $\left.\left(n_{2}\right)\right)$ in $\mathcal{G}$.
Axiom A. 2 There is a path between $n_{1}$ and $n_{2}$ in $G$, if and only if there is a path between parent $\left(n_{1}\right)$ and parent $\left(n_{2}\right)$ in $\mathcal{G}$.

Axiom A. 3 There is a cycle through $n_{1}$ and $n_{2}$ in $G$, if and only if there is a cycle through parent $\left(n_{1}\right)$ and parent $\left(n_{2}\right)$ in $\mathcal{G}$.

Axiom A. 4 If there is a cyclic supernode in $\mathcal{G}$, then $G$ is cyclic.
Axiom A. 5 If there is a supernode in $\mathcal{G}$ which is not totally ordered, then $G$ is not totally ordered.

## Proof of Theorem 6.1:

For the only if direction:
$\mathcal{G}$ is acyclic.
$\equiv\{$ Definition 6.5$\}$
$G$ is acyclic.
$\Rightarrow$ \{Axioms A. 3 and A. 4$\}$
$G^{\prime}$ is acyclic, and all supernodes are acyclic.

For the other direction:
Each supernode is acyclic, and $G^{\prime}$ is acyclic.
$\Rightarrow$ \{ Axiom A. 3 \}
If $G$ has a cycle, it must be only through the nodes of which parent is the same in $\mathcal{G}$.
$\Rightarrow\{$ Premise $\}$
$G$ is acyclic.
$\equiv\{$ Definition 6.5$\}$
$\mathcal{G}$ is acyclic.

## A. 3 Proofs of Theorems in Section 7

We introduce the following lemmas to simplify the proofs of theorems.
Lemma A. 1 If an execution $s$ is in $X$, then $\mathcal{P}(s)$ is acyclic.
Proof: Suppose $s$ is in $X$. For $s$, we know that $P B \supseteq I B$ and $P B$ is compatible with $I B$, and therefore the graph representation of $P B$ is acyclic. We can build an acylic graph $G$ as follows. The nodes of $G$ are the nodes in $O$ and two nodes $P_{b}$ and $P_{f}$. $G$ contains the graph representation of $P B$ and has two additional directed edges $\left(P_{b}, a\right)$ and $\left(a, P_{f}\right)$ for each node $a$ in $O$. Furthermore, if $s n_{i}$ and $s n_{j}$ are elements of $O / A$, such that $s n_{i} P B / A s n_{j}$, then for all nodes $a$ and $b$ such that $a \in s n_{i}$ and $b \in s n_{j}$, edge $(a, b)$ is added to $G$. These edges do not generate cycles in $G$, since $P B / A \supseteq I B / A$ and $P B / A$ is compatible with $I B / A$, and therefore the graph representation of $P B / A$ is acyclic for $s . G$ is the underlying graph of a higraph $\mathcal{G}=\left(V_{1}, V_{0}, E\right)$, such that $V_{1}=A A$ and $V_{0}=\left\{P_{b}, P_{f}\right\} \cup O-\bigcup_{S \in A A} S$. Then, $E$ is the set $G / A$. $\mathcal{G}$ is acyclic. We claim now that $\mathcal{G}$ is compatible with hipolygraph $\mathcal{P}(s)$ : Any arc in $\mathcal{P}(s)$ is certainly an arc in $\mathcal{G}$, and any arc in a supernode in $\mathcal{P}(s)$ is an arc in the same supernode in in $\mathcal{G}$, and for any choice $(a, b, c)$ in $\mathcal{P}(s)$, either $\operatorname{arc}(a, b)$ or $(b, c)$ is in $\mathcal{G}$, and for any choice $(a, b, c)$ in a supernode in $\mathcal{P}(s)$, either $\operatorname{arc}(a, b)$ or $(b, c)$ is in the same supernode in $\mathcal{G}$. $\mathcal{P}(s)$ is acyclic, since there is a compatible higraph $\mathcal{G}$ that is acyclic.

Lemma A. 2 If $s$ is view correct, then for any execution $s_{x}$ in $X, \mathcal{P}(s)=\mathcal{P}\left(s_{x}\right)$.

Proof: Suppose that $s=<C, P B>$ is view correct. This means that $s$ is view equivalent to an execution $s_{x}=<C, P B_{c}>$ in $X$. Due to the definition of $X$ (Definition 3.1), $s_{x}$ has the same $C$ as $s$. Hence, both executions $s$ and $s_{x}$ correspond to the same operations, program order, and atomic actions. Since $s$ and $s_{x}$ are view equivalent, in both executions, the read operations return the value written by the same write operations, initial values are returned by the same read operations, and final values are written by the same write operations (Definition 5.1). Thus, if all the steps for the construction of polygraphs $\mathcal{P}(s)$ and $\mathcal{P}\left(s_{x}\right)$ are followed, the same supernodes, nodes,edges and choices will be generated for both hipolygraphs.

Lemma A. 3 A higraph $\mathcal{G}=\left(V_{1}, V_{0}, E\right)$ is a total order, if and only if each supernode in $V_{1}$ is a total order, and graph $G^{\prime}=\left(V_{1} \cup V_{0}, E\right)$ is a total order.

## Proof:

For the only if direction:
$\mathcal{G}$ is a total order.
$\equiv\{$ Definition 6.5$\}$
$G$ is a total order.
$\Rightarrow\{$ Axioms A. 2 and A. 5$\}$
$G^{\prime}$ is a total order, and all supernodes are a total order.

For the other direction:
Each supernode is a total order, and $G^{\prime}$ is a total order.
$\Rightarrow\{$ Axiom A. 2 \}
If $G$ is not totally ordered a cycle, it must be only because there is no edge among some pairs of nodes of which parent is the same in $\mathcal{G}$.
$\Rightarrow$ \{Premise $\}$
$G$ is a total order.
$\equiv\{$ Definition 6.5$\}$
$\mathcal{G}$ is a total order.

## Proof of Theorem 7.1:

For the only if direction:
$s=\langle C, P B\rangle$ is view correct.
$\Rightarrow\{$ Lemma A. 2$\}$
$\mathcal{P}(s)=\mathcal{P}\left(s_{x}\right)$.
$\Rightarrow$ \{ Lemma A. 1 \}
$\mathcal{P}(s)$ is acyclic.

For the other direction:
Suppose that $\mathcal{P}(s)$ is acyclic. Then, there is an acyclic directed higraph $\mathcal{G}$ which is compatible with $\mathcal{P}(s)$ (Definition 6.5). Higraph $\mathcal{G}$ can be completed to a total order $\mathcal{G}^{\prime}$ in which $P_{b}$ precedes all other operations and atomic actions in $s$, and $P_{f}$ follows all other operations and atomic actions in $s$. Let $G^{\prime}$ be the underlying graph of $\mathcal{G}^{\prime}$, and $G^{\prime \prime}$ be the subgraph of $G^{\prime}$ which excludes the nodes corresponding to $P_{b}$ and $P_{f}$. Obviously, execution $s_{x}=<C, G^{\prime \prime}>$ is in $X$. Now we will proof that $s_{x}$ is view equivalent to $s$.

1. Suppose that $R_{a}^{i}(x)$ reads the initial value of entity $x$ in $s$. Due to definition of $\mathcal{P}(s)$, for any write operation $W_{c}^{k}(x)$, there will be an edge $e=\left(R_{a}^{i}(x), W_{c}^{k}(x)\right)$ in $G^{\prime \prime}$. Since $G^{\prime \prime}$ is acyclic and is the performing order of $s_{x}, R_{a}^{i}(x)$ reads the initial value of $x$ in $s$, if and only if $R_{a}^{i}(x)$ reads the initial value of $x$ in $s_{x}$.
2. Suppose that $W_{a}^{i}(x)$ writes the final value of entity $x$ in $s$. Due to definition of $\mathcal{P}(s)$, for any other write operation $W_{c}^{k}(x)$, there will be an edge $e=\left(W_{c}^{k}(x), W_{a}^{i}(x)\right)$ in $G^{\prime \prime}$. Since $G^{\prime \prime}$ is acyclic and the performing order of $s_{x}, W_{a}^{i}(x)$ writes the final value of $x$ in $s$, if and only if $W_{a}^{i}(x)$ writes the final value of $x$ in $s_{x}$.
3. Suppose that $R_{d}^{j}(x)$ returns the value written by $W_{a}^{i}(x)$ in $s$. Due to the definition of $\mathcal{P}(s)$, edge $e=$ $\left(W_{c}^{k}(x), R_{d}^{j}(x)\right)$ in $G^{\prime}$. Furthermore, we claim that if $G^{\prime}$ is acyclic, there is no node $W_{c}^{k}(x)$, such that there is an edge $\epsilon_{1}=\left(W_{a}^{i}(x), W_{c}^{k}(x)\right)$ and an edge $e_{2}=\left(W_{c}^{k}(x), R_{d}^{j}(x)\right)$ in $G^{\prime}$. Suppose $G^{\prime}$ is acyclic and has the edges $\epsilon_{1}$ and $e_{2}$. Since $G^{\prime}$ is the underlying graph of higraph $\mathcal{G}^{\prime}$ that is compatible with $\mathcal{P}(s), G^{\prime}$ must contain either the edge $\left(W_{c}^{k}(x), W_{a}^{i}(x)\right)$ or the edge $\left(R_{d}^{j}(x), W_{c}^{k}(x)\right)$. Then, there is a cycle in $G^{\prime}$. Since $G^{\prime}$ is acyclic, $G^{\prime}$ cannot contain $\epsilon_{1}$ and $\epsilon_{2}$. Since $G^{\prime}$ is the performing order of $s_{x}, R_{d}^{j}(x)$ returns the value written by $W_{a}^{i}(x)$ in $s$, if and only if $R_{d}^{j}(x)$ returns the value written by $W_{a}^{i}(x)$ in $s_{x}$.
4. Since $s$ and $s_{x}$ consist of the same processes, write operations are dependent on the same entities. We assumed an architecture where a write operation is performed after the read operations on which the write operation is dependent. Hence, for each entity $x$ and $W_{i}^{j}(x)$ that is dependent on entity $y, R_{i}^{k}(y)$ is the first read performed on $y$ before $W_{i}^{j}(x)$ in execution $s$, if and only if $R_{i}^{k}(y)$ is the first read performed on $y$ before $W_{i}^{j}(x)$ in execution $s_{x}$.

Hence, $s$ is view equivalent to $s_{x}$, and therefore view correct.
Lemma A. 4 A hipolygraph $\mathcal{P}=\left(V_{1}, V_{0}, E, C\right)$ is acyclic, if and only if each supernode snp $=\left(V_{1 i}, E_{1 i}, C_{1 i}\right)$ in $V_{1}$ is acyclic, and polygraph $P^{\prime}=\left(V_{1} \cup V_{0}, E, C\right)$ is acyclic.

## Proof:

Polygraph $P^{\prime}=\left(V_{1} \cup V_{0}, E, C\right)$ is acyclic, and each supernode in $V_{1}$ is acyclic.
$\equiv$
There is an acylic graph $G^{\prime}=\left(V_{1} \cup V_{0}, E^{\prime}\right)$ that is compatible with polygraph $P^{\prime}$, and for each supernode in $V_{1}$, there is an acylic graph $s n_{i}=\left(V_{1 i}, E_{1 i}^{\prime}\right)$ that is compatible with polygraph $s n p_{i}$.
$\equiv\{$ Theorem 6.1$\}$
Higraph $\mathcal{G}=\left(V_{1}, V_{0}, E^{\prime}\right)$ is acyclic.
$\equiv$
Hipolygraph $\mathcal{P}=\left(V_{1}, V_{0}, E, C\right)$ is acyclic.
Theorem A. 1 The problem of deciding whether an execution is view consistent is NP-complete.

The proof of this theorem relies on the fact that determining whether a polygraph is acyclic, which is an NP-complete problem [7].

## Proof of Theorem 7.2:

It follows from Theorem A. 1 and Lemma A. 4 that deciding whether a hipolygraph is acyclic is an NP-complete problem. Hence, the problem of deciding whether an execution is view correct is NP-complete.

Lemma A.5 If an execution $s$ is in $X$, then $\mathcal{H}(s)$ is acyclic.
Proof: Higraph $\mathcal{G}$ which is constructed in the proof of Lemma A. 1 contains $\mathcal{H}(s)$. Thus, $\mathcal{H}(s)$ is acyclic.
Lemma A. 6 If $s$ is $B$ correct, then for any execution $s_{x}$ in $X, \mathcal{H}(s)=\mathcal{H}\left(s_{x}\right)$.
Proof: Suppose that $s=<C, P B>$ is B correct. This means that $s$ is B equivalent to an execution $s_{x}=<C, P B_{c}>$ in $X$. Due to the definition of $X$ (Definition 3.1), $s_{x}$ has the same $C$ as $s$. Hence, both executions $s$ and $s_{x}$ correspond to the same operations, program order, and atomic actions. Since $s$ and $s_{x}$ are B equivalent, in both executions, the read operations return the value written by the same write operations, initial values are returned by the same read operations, final values are written by the same write operations, a write operation is performed before another write operation, if the value written by the second write operation is returned by a read operation, and if a write operation is performed after a read operation in $s$, this write operation is performed after the read operation in $s_{x}$ (Definition 5.4). Thus, if all the steps for the construction of higraphs $\mathcal{H}(s)$ and $\mathcal{H}\left(s_{x}\right)$ are followed, the same supernodes, nodes and edges will be generated for both higraphs.

## Proof of Theorem 7.3:

For the only if direction:
$s=\langle C, P B\rangle$ is B correct.
$\Rightarrow$ \{ Lemma A. 6 \}
$\mathcal{H}(s)=\mathcal{H}\left(s_{x}\right)$.
$\Rightarrow\{$ Lemma A. 5$\}$
$\mathcal{H}(s)$ is acyclic.

For the other direction:
Suppose that $\mathcal{H}(s)$ is acyclic. Higraph $\mathcal{H}(s)$ can be completed to a total order $\mathcal{G}$ in which $P_{b}$ precedes all other operations and atomic actions in $s$, and $P_{f}$ follows all other operations and atomic actions in $s$. Let $G$ be the underlying graph of $\mathcal{G}$, and $G^{\prime}$ be the subgraph of $G$ which excludes the nodes corresponding to $P_{b}$ and $P_{f}$. Obviously, execution $s_{x}=<C, G^{\prime}>$ is in $X$. Similar to proof of Theorem 7.1, one can go through each item in the definition of B equivalence (Definition 5.4), and prove that $s_{x}$ is B equivalent to $s$, hence $s$ is B correct.

## Proof of Theorem 7.4:

The problem of deciding whether a graph is acyclic can be performed in $O\left(k^{2}\right)$ time, where $k$ is the number of nodes in the graph. Thus, it follows from the Theorem 6.1 that we can test whether an execution is B correct in $O\left(n^{2}\right)$ time, where $n$ is the total number operations in all processes.

Lemma A. 7 If an execution $s$ is in $X$, then $\mathcal{G}(s)$ is acyclic.
Proof: Suppose $s$ is in $X$. For $s$, we know that $P B \supseteq I B$ and $P B$ is compatible with $I B$, and therefore the graph representation of $P B$ is acyclic. We can build an acylic graph $G$ as follows. The nodes of $G$ are the nodes in $O . G$ contains the graph representation of $P B$ and furthermore, if $s n_{i}$ and $s n_{j}$ are elements of $O / A$, such that $s n_{i} P B / A s n_{j}$, then for all nodes $a$ and $b$ such that $a \in s n_{i}$ and $b \in s n_{j}$, edge $(a, b)$ is added to $G$. These edges do not generate cycles in $G$, since $P B / A \supseteq I B / A$ and $P B / A$ is compatible with $I B / A$, and therefore the graph representation of $P B / A$ is acyclic for $s . G$ is the underlying graph of a higraph $\mathcal{G}=\left(V_{1}, V_{0}, E\right)$, such that $V_{1}=A A$ and $V_{0}=O-\bigcup_{S \in A A} S$. Then, $E$ is the set $G / A . \mathcal{G}$ is acyclic. Obviously, $\mathcal{G}(s)=\mathcal{G}$.

Lemma A. 8 If $s$ is conflict correct, then for any execution $s_{x}$ in $X, \mathcal{G}(s)=\mathcal{G}\left(s_{x}\right)$.

Proof: Suppose that $s=<C, P B>$ is conflict correct. This means that $s$ is conflict equivalent to an execution $s_{x}=<C, P B_{c}>$ in $X$. Due to the definition of $X$ (Definition 3.1), $s_{x}$ has the same $C$ as $s$. Hence, both executions $s$ and $s_{x}$ correspond to the same operations, program order, and atomic actions. Since $s$ and $s_{x}$ are conflict equivalent, in both executions, the conflicting executions are performed in the same order (Definition 5.8). Thus, higraphs $\mathcal{G}(s)$ and $\mathcal{G}\left(s_{x}\right)$ are the same.

## Proof of Theorem 7.5:

For the only if direction:
$s=\langle C, P B\rangle$ is conflict correct.
$\Rightarrow\{$ Lemma A. 8$\}$
$\mathcal{G}(s)=\mathcal{G}\left(s_{x}\right)$.
$\Rightarrow$ \{ Lemma A. 7 \}
$\mathcal{G}(s)$ is acyclic.

For the other direction:
Suppose that $\mathcal{G}(s)$ is acyclic. Higraph $\mathcal{G}(s)$ can be completed to a total order $\mathcal{G}^{\prime}$. Let $G^{\prime}$ be the underlying graph of $\mathcal{G}^{\prime}$, Obviously, execution $s_{x}=<C, G^{\prime}>$ is in $X$. Similar to proof of Theorem 7.1, one can go through each item in the definition of conflict equivalence (Definition 5.8), and prove that $s_{x}$ is conflict equivalent to $s$, hence $s$ is conflict correct.


[^0]:    *This material is based in part upon work supported by the Texas Advanced Technology Program under Grant No. ATP-024, the National Science Foundation under Grant Nos. IRI-9003341 and IRI-9106450, and grants from the IBM and Hewlett-Packard corporations.

[^1]:    ${ }^{*} n \geq 1$. We allow that $n$ to be one to be able to apply the theory also to sequential programs.

