CS 372 Operating Systems

Lab 2 Programming Assignment:
Synchronization

Clarifications and corrections are in yellow.


Due: Friday October 2, 11:59pm


Lab Purpose

In this lab you will learn about multithreaded programming.  You will program with locks, monitors and transactions.

The Scenario

This being Texas, imagine a shooting range.  There are a fixed number of lanes at the range, lets say 16.  There are two people (threads) at the range.  One of them shoots a blue color, the other a red color.  Each thread chooses a random lane, call it L.  The thread checks if the color of L is white (meaning no one has used the lane).  If it is, the thread shoots in the lane.  If there is a race condition, then something happens in between a thread checking that the lane color is white and shooting.  If both the red and the blue Rogue shoot in the same lane, the lane color will turn purple, indicating a race condition.  Race conditions are bad, and your task is to add enough synchronization to eliminate race conditions.  After a thread shoots, it cleans the lanes back to their original state, but ONLY if all of the lanes are colored, and all lanes must be cleaned. Do not change the color of purple lanes when they are cleaned, so we can see the mistakes.  We call the threads Rogues, because who else belongs in a shooting gallery than a Rogue?

Here is what it looks like.

Here is tour of the controls

Major Classes

Classes we provide

GalleryRogue. This is the class you will extend to write your various Rogues. GalleryRogue provides a getLane(int) method and a nlanes variable to access the GalleryLanes in the shooting gallery. rand is an instance of java.util.Random from which you may pull random numbers

GalleryLane. This is the main container for Lane-related activity, and the interface you will use to access the shooting gallery.  You are coordinating actions across different lanes, and also the data within a single lane.  The possible colors of the lane are declared as static final ints.  GalleryLane provides a getColor method to read the current color of the lane and shoot and clean methods to modify the color according to the above rules. The obj object is a programmer client convenience to store an object with the lane, and is accessed through the getObject and setObject methods.

Note the algorithms in GalleryLane.shoot and GalleryLane.clean.

GalleryLaneColor.  This is used by GalleryLane to hold the current color of the lane.

GallerySharedState.  This class holds data shared between all of the lanes in the Gallery, such as the number of shots fired (for red and blue), and the number of times lanes have been cleaned.

TMInt. This class contains an integer that can be safely modified within a transaction (i.e. conflicts are detected and value is rolled back on abort). Construct a new TMInt() and use the getValue() and setValue(x) to access the integer variable.

Classes you must write

RogueCoarse. This rogue should protect its action with a single, coarse-grained lock.

RogueFine. This rogue should protect its action with multiple, fine-grained locks. (This should run faster than RogueCoarse, but it is not a requirement.  Our version runs a bit faster, but not by much).

RogueTM. This rogue should protect its action with a transaction (syntax and usage explained below).

RogueCoarse2, RogueFine2, RogueTM2. These rogues should choose two random lanes that are NOT equal to each other.  It then checks both lanes and if both are white, it shoots both lanes. Important: your Rogue must color either both lanes or neither lane.

RogueCoarseCleaner, RogueFineCleaner, RogueTMCleaner. This rogue chooses a single random lane, just like the original scenario.  But now the cleaning operation is done by an independent thread, and is never done by a shooting thread.  A cleaning thread must clean all lanes or no lanes.  As the distributed code makes clear, threads either shoot or clean.  However they all must synchronize properly, which is your task.

Rules to live by

Rogues. For the single-lane Rogues, once a Rogue selects a lane it must determine the color of that lane. If the lane is white it must shoot. For two-lane shooters, once the Rogue selects two lanes it must determine their colors. If they are both white it must shoot. If one (or both) is colored it must not shoot. Do not use tryLock or isLocked to make decisions once you have chosen one or more lanes.

Cleaners. When the Rogue has a separate cleaning thread, use a condition variable to synchronize with the cleaner.  Most of the time, the cleaner has no work to do.

The distributed code uses explicit locking and condition variables via java.util.concurrent.locks.ReentrantLock and java.util.concurrent.locks.Condition.  Please use these interfaces rather than the synchronized keyword (and its associated notify and notifyAll).  Making the locking and conditional waiting explicit makes it more clear what is going on.

What is a coarse-grained lock and what is a fine-grained lock?  For the purposes of this assignment, a fine-grained lock only locks part of a data structure.  A coarse-grained lock locks an entire data structure.  A fine-grained lock allows more concurency.

Batch testing

We have provided a batch mode for the Gallery that has no GUI.  It is used for timing. The purpose of the batch testing is to let you run the following on a multiprocessor.  You need to compile and run your code as shown at the start of this document.  When you run with RogueCoarseCleanerBatch and RogueFineCleanerBatch as the values for rogue_class in the system.properties file, the program will output a line like this, indicating the time spent for each thread to do a fixed number of shots.
TIME: 27.52 seconds
Note that this will only work if you have implemented RogueCoarseCleaner and RogueFineCleaner.  The Batch version calls your implementation. If you have implemented RogueFineCleaner and RogueCoarseCleaner in the right way, then your RogueFineCleanerBatch should run faster than your RogueCoarseCleanerBatch.  Of course, if it does, that is not a guarantee that your code is correct.  And if it doesn't, your code might still be correct. 

However, you should implement RogueFineCleaner in such a way that two threads can concurrently shoot different lanes.  If you do that, the concurrency will decrease the latency of the batch computation, and your Fine will beat your Coarse in performance.  This tradeoff is far more delicate when the shooting threads are also doing the cleaning.  Again, we recommend that you use a condition variable to synchronize between the shooters and the cleaner.

Background on deadlock, and deadlock avoidance

Deadlock is a no-progress condition where some set of threads can't do any work because they are all stuck on locks (or some form of mutual exclusion).  This arises due to the confluence of 4 factors.

1. Mutual exclusion of resources
2. A process holds some resource and is waiting for another
3. Resources, once obtained, cannot be taken away
4. Circular waiting exists

The weak link in the chain is condition (4).  Circular wait doesn't arise if everyone grabs resources in the same order.  Remember the Chefs example?  One chef grabbed butter than salt, the other salt then butter.  If all chefs grabbed ingredients in alphabetical order, there would not be deadlock.  Grabbing locks in order is called lock ordering.

Order of implementation

We are interested in how long it takes you to implement the assignment, and we all know that people learn from doing related tasks.  So we are going to ask everyone to be in group 1 or group 2 (the next paragraph describes how you know what group you are in).  Both groups should do the coarse version first.  If you are in group 1, please do the Fine version second and TM version third.  If you are in group 2, please do the TM version second and the Fine version third.

If you are in group 1, you can do RogueCoarse, RogueFine, RogueTM, RogueCoase2, RogueFine2, RogueTM2, RogueCoarseCleaner, RogueFineCleaner, RogueTMCleaner, or you can do RogueCoarse, RogueCoarse2, RogueCoarseCleaner, RogueFine, RogueFine2, RogueFineCleaner, RogueTM, RogueTM2, RogueTMCleaner.  We don't care about the ordering across tasks, only within tasks.  We do suggest that you do Coarse first because its the simplest to think about.

What group am I in?  Here's how to tell: Take your last name (all lowercase), or if you are working with your partner, the concatenation of your names in alphabetical order. Use the md5sum command to create a hash of this string (e.g. `echo "jonessmith" | md5sum`). You will get a large hexadecimal number. If this number is even (ends in 0,2,4,6,8,a,c,e) you are in group 1. Otherwise, you are in group 2. Wasn't that fun?

Coding guidelines

We provide you a skeleton implementation of the Rogues you must write, including all of the necessary data structures. Please do not change the code outside of the four files you must write.  You may change the data structures, but we are telling you that what is there is sufficient to complete the lab correctly.

Running on multiprocessor machines

The CS department has multiprocessor linux machines.  These should be useful to help debug your code because they will expose more concurrency.  Running on these machines is not necessary for the code that we provide.  Our code is written with strategic calls to yield() that should create the important race conditions. It is possible for you to write race conditions that will be very unlikely to manifest on a uniprocessor. Testing on a multiprocessor will help you find these bugs. Eclipse is  is installed on the unix machines.

To find the names of multiprocessor hosts, type any of the following on a CS machine.

cshosts linuxsimpson
cshosts linuxwelding
cshosts linuxcamera

You can then log into a machine (e.g., apu) and verify that it is a multiprocessor by running this command.

cat /proc/cpuinfo

Note that CPUs that only have a single value for "physical id" only have a single physical core, but that core is hyperthreaded.  Hyperthreading allows some limited concurrency, but not as much as a true multi-core processor or symmetric multiprocessor (SMP).  Hmm, all of these machines appear to be hyperthreaded, so that is what we will use.

Synchronizing with transactions

Java does not have an atomic{} block for transactions. Instead, we will run code inside of a transaction using a Java library that we developed. In general, a programmer needs to declare what variables may be accessed within transactions and then use the following skeleton to execute transactions:

public void foo() {
...
(S1) Transaction tx=new Transaction(tx_id);
boolean done=false;
(S2) while(!done){
try{
(S3) tx.BeginTransaction();

/*access transactional objects*/
/*e.g. tx_var=tx_var.TM_getColor(tx);*/

(S4) done=tx.CommitTransaction();
}catch(AbortException e){
(S5) tx.AbortTransaction();
done=false;
}
}
...
}

Note that the constructor parameter (tx_id) for the new Transaction object is a transaction identifier. While you are free to use any numbers as identifiers that uniquely identify transactions, we strongly encourage you to use the lane's color parameter, for example:

    Transaction tx=new Transaction(color);

To ease the implementation, we have already defined the transactional versions of the variables. We have also provided methods to safely access them. These methods have the names TM_function(..), e.g. TM_getColor(), TM_shoot(). You have to use these versions of the functions while completing the transactional assignment. Otherwise you may end up with data races. Make sure you pass the transaction object as a parameter to these functions (e.g. TM_getColor(tx)).

What is the skeleton function foo() doing?

Here is a brief explanation of what is happening inside foo(). First of all we define a new transaction(S1). Since unlike locks transactions can restart we use a while loop to keep trying till the transaction successfully completes its work(S2). We initialize the transaction using the BeginTransaction function(S3). Once all the transactional work is done we attempt to end the transaction by calling CommitTransaction (S4). If there are exceptions while we are inside a transaction then we would need to explicitly abort the transaction (S5).

Adding your own shared variables

The transactional memory system we use, DATM, is a library-based transactional memory. That means that in order to get the benefits of atomic regions, you must access all shared data through functions provided by DATM. If the only shared data you use is contained in our classes, then all you need to do is to use our TM_function(...) variants to access the variables. If you would like to add your own bookkeeping (for example a numColored variable as in RogueCoarseCleaner), then you must be careful to use classes that work with DATM. We have provided TMInt, which can be used for a shared integer. If you need more shared state than some integers, you are probably working too hard. However, if you really really want to implement your own transaction-safe objects, the next section will explain a little bit about working with DATM (and may be helpful even if you do not).

Behind the scenes of DATM

DATM stands for Dependence Aware Transactional Memory. Transactional memory is a form of optimistic concurrency. When you use locking, threads acquire locks to exclude all other threads from accessing certain shared data. With transactions, threads just barge ahead and read and write data concurrenctly (because they are optimistic) until they detect a possible problem (a conflict). Once a conflict is detected, one of the offending threads must undo all of its changes and start again from the beginning of its transaction.

Say you are going to access variable X. A transactional memory system implements the following logic to ensure that your operations are safe:

read_X() {
foreach(active transaction T) {
if(T has written X) {
// conflict! choose one transaction to restart
...
}
}
}

write_X() {
foreach(active transaction T) {
if(T has read X or T has written X) {
// conflict! choose one transaction to restart
...
}
}
}

So each time you access a shared variable in a transaction the TM system must check if another thread with an active transaction has also accessed that shared variable. In DATM you will need to wrap an object using the SharedObject class and then use the open_RO(..) and open_RW(..) methods to access that shared variable in read-only or read-write modes. We provide a simple usage example for TMInt:

SharedObject sharedInt=new SharedObject(new TMInt(2));
...
public void bar(){
...
int newVal=new Random().nextInt(100);
Transaction tx=new Transaction(tx_id);
boolean done=false;
while(!done){
try{
tx.BeginTransaction();

((TMInt)sharedInt.open_RW(tx)).setValue(newVal);

done=tx.CommitTransaction();
}catch(AbortException e){
tx.AbortTransaction();
done=false;
}
}
...
}

In the previous example, each transaction attempts to write a random value to a shared integer variable. In summary, to create shared variables that are safe to use in transactions:

  1. Create a class that implements the TxObject interface (look at TMInt.java)
  2. Wrap the object using the SharedObject class
  3. Use the transactional skeleton and the functions open_RO(..) and open_RW(..) to mediate access.

If you look inside the implementations for GalleryLane see that they store all of their data (such as the color of the lane) this way.

Take home points for you, the programmer

Further information

If you would like to investigate DATM further, you can read this paper (hardware approach) or this one (software and proof).

The Survey

This assignment is a bit unusual in that we are asking you to do a survey along with the assignment.  The survey is part of our research into concurrent programming.  By accurately and honestly filling in the survey, you can help contribute to our research.  I must emphasize that we do need you to be accurate and honest.  If you find that you cannot answer a survey question, please do not make something up.  Also, please don't embellish to make yourself look "good."  The survey results are not part of your evaluation in the course, and will not affect your grade.

Please record the time you spend on the lab from the beginning in 15 minute (0.25 hr) increments.  Please keep notes saying you spend 1.25 hrs on general design with a friend, then 1.75 hrs on the design on the RogueCoarse, 2.25 hrs implementing RogueCoarse, and then 5.5 hrs debugging and testing.  Keeping notes as you do the lab is the only real way to get accurate results for the survey.  Note that the survey can be saved and resumed, so you can use it to record your progress through the lab.

The survey will be made available soon through Blackboard. The survey will allow you to fill out part of it and save your responses.  So we urge you to take a look early, and fill it out as you go.

Working In Pairs

You may work on this lab alone or in pairs.  There is not a lot of code to write, so if you learn well alone, and can manage your own frustration, I would recommend doing the lab without a partner.  If you like to talk out coding issues, or find having another perspective valuable in your own thought process, then you are welcome to work with a partner.  If you do work with a partner, please work together, DO NOT have one partner do TM and one partner do Fine (for example).

In pair programming, two students must sit at the same computer and both must participate actively. We require each student who is in a pair programming group to spend at least 80% of the total development time participating actively, and at least 40% of that time being the one typing on the keyboard. (This style does not mean two students can split up the program, work separately, and then reconvene to put the solution together.)

Here is a website on how to do pair programming and why.  Some more information from ye olde wikipedia.

If you pair program, then please do the following.

  • Only one person turns in the assignment.
  • Your README file also contains the name of both people in the pair.
  • What to Turn In

    Please turn in a single jar file which has your Rogue code along with your README, AUTHORS.txt, and ANSWERS.txt. The contents of these files are described below. This may be automated by running "make jar; make turnin;" in the directory with your Rogues. We will unpack your jar file, and use the ORIGINAL source files you were given for the rest of the project.  ANY MODIFICATIONS YOU MADE TO  FILES BESIDES the Rogues WILL BE LOST WHEN WE TEST YOUR PROGRAM, SO DON'T CHANGE THESE FILES.

    AUTHORS.txt includes a sequence of lines.  Each line has a name, followed by a colon (:) followed by your eid.

    In your README file, include the following:

    Lab Questions

    Please put these questions and your answers in a file called ANSWERS.txt.
    1. What group are you in (and what string did you md5sum to decide your group)?

    2. Your "timesheet" for the various tasks on this lab as described above

    Other Details and Submission Instructions