Lorenzo's Scheduling Algorithm

Example where the new algorithm beats the old one: graph, trace for the old algorithm, trace for the new algorithm

The graphs compare the amount of concurrency exhibited by the two scheduling algorithms. For these tests, the new scheduling algorithm was modified so that an action piggybacks a request on every token that it sends out. In addition, an action sends a request during every step for all the tokens that it doesn't have (if it hasn't already made the request). A note about the graphs: The old algorithm's plot is represented by the 'o' characters, and the new algorithm's plot is represented by the 'x' characters. The x-axis represents the number of steps, while the y-axis represents the number of actions running concurrently.

Below are 9 sets of PostScript files. Each file contains 4 graphs. Each graph measures one of 4 things: **Messages sent**, **concurrency**, **fairness**, and **step fairness**. **Messages sent** corresponds to the number of messages sent by the system. **Concurrency** corresponds to the total number of steps that were executed divided by the total number of possible steps that could have been executed if there were no incompatibilities. Let a0 denote the action which executed to completion the fewest number times. Let a1 denote the action which executed to completion the greatest number of times. **Fairness** is the ratio of the number of times a0 executed to completion to the number of times a1 executed to completion. **Step fairness** is like fairness except that it compares the action with fewest steps executed to the action with the most steps executed. For both of these fairness indicators, the greater the ratio the more fairness the algorithm exhibits.

7-25-98

d3p30.ps Duration: 1-3 steps per action, Chance of Incompatibility: 30%.

d3p65.ps Duration: 1-3 steps per action, Chance of Incompatibility: 65%.

d3p100.ps Duration: 1-3 steps per action, Chance of Incompatibility: 100%.

d8p30.ps Duration: 1-8 steps per action, Chance of Incompatibility: 30%.

d8p65.ps Duration: 1-8 steps per action, Chance of Incompatibility: 65%.

d8p100.ps Duration: 1-8 steps per action, Chance of Incompatibility: 100%.

d13p30.ps Duration: 1-13 steps per action, Chance of Incompatibility: 30%.

d13p65.ps Duration: 1-13 steps per action, Chance of Incompatibility: 65%.

d13p100.ps Duration: 1-13 steps per action, Chance of Incompatibility: 100%.

7-27-98

d3p10.ps Duration: 1-3 steps per action, Chance of Incompatibility: 10%.

d8p10.ps Duration: 1-8 steps per action, Chance of Incompatibility: 10%.

d13p10.ps Duration: 1-13 steps per action, Chance of Incompatibility: 10%.

7-27-98 (Number of Actions: 3-43, Number of Steps: 500)

d5-15p10.ps Duration: 5-15 steps per action, Chance of Incompatibility: 10%.

d5-15p30.ps Duration: 5-15 steps per action, Chance of Incompatibility: 30%.

d15-25p10.ps Duration: 15-25 steps per action, Chance of Incompatibility: 10%.

d15-25p30.ps Duration: 15-25 steps per action, Chance of Incompatibility: 30%.

d25-35p10.ps Duration: 25-35 steps per action, Chance of Incompatibility: 10%.

d25-35p30.ps Duration: 25-35 steps per action, Chance of Incompatibility: 30%.

7-27-98, Updated on 7-31-98 (Sequential Acquisition of Tokens, Number of Actions: 3-43, Number of Steps: 500)

d5-15p10.ps Duration: 5-15 steps per action, Chance of Incompatibility: 10%.

d5-15p30.ps Duration: 5-15 steps per action, Chance of Incompatibility: 30%.

d15-25p10.ps Duration: 15-25 steps per action, Chance of Incompatibility: 10%.

d15-25p30.ps Duration: 15-25 steps per action, Chance of Incompatibility: 30%.

d25-35p10.ps Duration: 25-35 steps per action, Chance of Incompatibility: 10%.

d25-35p30.ps Duration: 25-35 steps per action, Chance of Incompatibility: 30%.

7-29-98, Updated on 8-1-98 (Number of Actions: 3-53, Number of Steps: 10000)

trace file

d5-15p10.ps Duration: 5-15 steps per action, Chance of Incompatibility: 10%.

d5-15p30.ps Duration: 5-15 steps per action, Chance of Incompatibility: 30%.

d15-25p10.ps Duration: 15-25 steps per action, Chance of Incompatibility: 10%.

d15-25p30.ps Duration: 15-25 steps per action, Chance of Incompatibility: 30%.

d25-35p10.ps Duration: 25-35 steps per action, Chance of Incompatibility: 10%.

d25-35p30.ps Duration: 25-35 steps per action, Chance of Incompatibility: 30%.

7-30-98

(Number of Actions: 3-43, Duration: 5-15 steps per action, Chance of Incompatibility: 30%, Number of Steps: 10,000)

trace file,
graph

(Number of Actions: 3-43, Duration: 5-15 steps per action, Chance of Incompatibility: 30%, Number of Steps: 100,000)

trace file,
graph

7-30-98

Example of rapid decline in fairness as demonstrated in 3 simulations.

In the first simulation, we have the following configuration:

action 0 takes 1 units of time to execute

action 1 takes 1 units of time to execute

action 2 takes 1 units of time to execute

action 3 takes 1 units of time to execute

action 0 and action 2 share token 0

action 1 and action 2 share token 1

action 1 and action 3 share token 2

action 2 and action 3 share token 3

In the second simulation, we add the following to the first configuration:

action 4 takes 1 units of time to execute

And have the following token assignments:

action 0 and action 2 share token 0

action 1 and action 2 share token 1

action 1 and action 3 share token 2

action 1 and action 4 share token 3

action 2 and action 3 share token 4

action 3 and action 4 share token 5

In the third simulation, we add the following to the second configuration:

action 5 takes 1 units of time to execute

And have the following token assignments:

action 0 and action 2 share token 0

action 1 and action 2 share token 1

action 1 and action 3 share token 2

action 1 and action 4 share token 3

action 2 and action 3 share token 4

action 3 and action 4 share token 5

action 3 and action 5 share token 6

action 4 and action 5 share token 7

The trace files show that the first configuration takes 8 steps to execute every action at least once, the second configuration takes 13 steps to execute every action at least once, and the third configuration takes 20 steps to execute every action at least once. This increase in the number of steps continues as each additional action is added to the system according to the pattern described above. After running a few more trials, I've discovered that for each action added, the number of steps increases by at least 50%. Hence 10,000 steps can be reached in under 20 actions.
Below are the trace files.

trace file 1,
trace file 2,
trace file 3

8-2-98 (Number of Actions: 3-53, Number of Steps: 2000)

Lorenzo's algorithm was modified so that it would exhibit more "fairness." In the original version of Lorenzo's algorithm, a non-executing process that receives a token request would only give away the token if it is lacking one of its shared tokens whose ID is smaller than the one requested. I've relaxed this constraint a bit so that even if the above condition doesn't hold, the action receiving the request would give away the token if it has executed more times than the requestor (and a request would be piggybacked on this token). Tests show that the modified version of Lorenzo's algorithm provides more concurrency and fairness than the original version. The downside is that the modified version of Lorenzo's algorithm sometimes sends more messages than Jay's algorithm.

d5-15p10.ps Duration: 5-15 steps per action, Chance of Incompatibility: 10%, Lorenzo's Original Algorithm.

d5-15p10.ps Duration: 5-15 steps per action, Chance of Incompatibility: 10%, Modified Lorenzo's Algorithm.

d5-15p30.ps Duration: 5-15 steps per action, Chance of Incompatibility: 30%, Lorenzo's Original Algorithm.

d5-15p30.ps Duration: 5-15 steps per action, Chance of Incompatibility: 30%, Modified Lorenzo's Algorithm.

d15-25p10.ps Duration: 15-25 steps per action, Chance of Incompatibility: 10%, Lorenzo's Original Algorithm.

d15-25p10.ps Duration: 15-25 steps per action, Chance of Incompatibility: 10%, Modified Lorenzo's Algorithm.

d15-25p30.ps Duration: 15-25 steps per action, Chance of Incompatibility: 30%, Lorenzo's Original Algorithm.

d15-25p30.ps Duration: 15-25 steps per action, Chance of Incompatibility: 30%, Modified Lorenzo's Algorithm.

8-7-98 (Number of Actions: 10-16)

The following graphs illustrate the amount of concurrency that can theoretically be gained by using token swapping. The trend is that the gain increases as 1) the number of actions in the system increases; and 2) the range of action durations increases.

d10-20p5.ps Duration: 10-20 steps per action, Chance of Incompatibility: 5%, Number of Steps: 2000.

d10-90p5.ps Duration: 10-90 steps per action, Chance of Incompatibility: 5%, Number of Steps: 2000.

d10-300p5.ps Duration: 10-300 steps per action, Chance of Incompatibility: 5%, Number of Steps: 4000.

It takes about 30 ms to pass a token using sockets. This was calculated by making the following modifications to the scheduler: 1) right before a token is sent out, the current time is recorded; 2) once a token is received, a reply message is sent back to the token's sender; 3) once a reply message is received, the current time is recorded. Hence the round-trip time for a token is calculated by subtracting the first recorded time from the second recorded time. The time it takes to pass a token is just half of that round-trip time. The two actions sharing the token reside on separate machines on the same LAN.

It has recently been discovered that the token passing time can be reduced to 5-6 ms by using RMI rather than sockets. [The scheduler has not yet been converted to using RMI to pass tokens and is currently still using sockets.]

A matrix multiplication program has been implemented in Seuss. It multiplies an mXn matrix A with an nXm matrix B to get a matrix C. The variable m is the number of actions in the system, while the variable n is an arbitrary whole number. Each action in the system has a copy of matrix B and is assigned a row in matrix A. Every action multiplies its assigned row with matrix B and submits the result to a special action. After the special action receives the results from all of the other actions, it displays the resulting matrix C.

8-24-98

perf.psGraph illustrating the performance of Jay's algorithm when running the matrix multiplication program.

The matrix multiplication program consists of 4 actions (one of them is the special action which doesn't perform any useful computation). All of the actions are compatible with each other.

The graph reveals a couple of anomalies. The first anomaly is that it takes longer to multiply matrices of size 100,000 X 3 than 150,000 X 3 on a single processor. The second is that there seems to be no speedup when 2 processors are used rather than 1. I believe that these anomalies occur because of the way Java handles RMI calls. RMI calls in Java are handled by a thread created by the Java run-time. This thread currently has a higher priority than the threads that execute the actions. When I increased the priorities of the threads executing the actions so that the RMI thread has equal or lower priority, the time it took to multiply matrices of size 100,000 X 3 decreased, but the time it took to multiply matrixes of size 150,000 X 3 was greater than that for matrices of size 200,000 X 3. Hence the anomaly was still present, though in a slightly different form. I believe that such anomalies are unnavoidable if we wish to use Java RMI, but that they can be minimized if the computations performed by each action required more time. As a side note, the applications that were tested on the Orca system typically required 10 to 30 seconds to execute [ACM Transactions on Computer Systems, February 1998].

8-25-98

Three matrix multiplication problems were tested on a single-processor machine. It took about 1.5 seconds to multiply matrices of size 100,000 X 3, 5.4 seconds to multiply matrices of size 150,000 X 3, and 2.8 seconds to multiply matrices of size 200,000 X 3. One would expect that it should take longer to multiply matrices of size 200,000 X 3 than 150,000 X 3, but that wasn't the case. Upon examination of the execution profiles (generated by running Java with the "-prof" option), the execution which multiplied matrices of size 150,000 X 3 made 2 peculiar method calls that took up significant amounts of time. The reason these method calls are peculiar is that neither of these calls appeared in the 100,000 X 3 execution or the 200,000 X 3 execution. Here are the two calls as they appear in the profile for the 150,000 X 3 execution:

`
4 java/lang/System.gc()V java/io/ObjectOutputStream.resetStream()V 1826
3 java/lang/System.gc()V java/lang/StringBuffer.(init)(I)V 949
`

This means that the method java/io/ObjectOutputStream.resetStream() called the garbage collector 4 times, taking up a total of 1826 ms. The constructor of java/lang/StringBuffer called the garbage collector 3 times, taking up a total of 949 ms. None of the code in my scheduler directly calls java/io/ObjectOutputStream.resetStream() or directly instantiates java/lang/StringBuffer. I suspect that Java RMI is what calls java/io/ObjectOutputStream.resetStream(). I don't know which methods in the Java API instantiate objects of the java/lang/StringBuffer class.

8-28-98

The most time consuming (greater than 100 ms) entries in the 150,000 X 3 execution's profile are:

`
11 java/lang/System.gc()V util/Comp. ()V 2298
`

4 java/lang/System.gc()V admin/Action.

4 java/lang/System.gc()V java/io/ObjectOutputStream.resetStream()V 1826

3 java/lang/System.gc()V java/lang/StringBuffer.

1 java/net/PlainSocketImpl.socketAccept(Ljava/net/SocketImpl;)V java/net/PlainSocketImpl.accept(Ljava/net/SocketImpl;)V 3556

1 java/lang/Thread.sleep(J)V sun/rmi/transport/Reaper.run()V 3184

After adding the command "System.gc();" right before the timing code, the execution time for the 150,000 X 3 test went down from 5.4 seconds to 2.1 seconds. The most time consuming entries in the profile then became:

`
11 java/lang/System.gc()V util/Comp. ()V 2331
`

4 java/lang/System.gc()V admin/Action.

1 java/lang/Thread.sleep(J)V sun/rmi/transport/Reaper.run()V 3096

1 java/lang/System.gc()V java/lang/System.gc()V 882

1 java/net/PlainSocketImpl.socketAccept(Ljava/net/SocketImpl;)V java/net/PlainSocketImpl.accept(Ljava/net/SocketImpl;)V 3619

The matrix multiplication timing results with the "System.gc();" command right before the timing code are found here

9-12-98

The Fox et. al. matrix multiplication algorithm (Parallel Computing 1987) was tested on 30 X 30 matrices using 1-4 processors. The results are here. I increased the computation time required by repeatedly multiplying the matrix elements. Hence an iteration value of 100 means that the matrices were multiplied 100 times.