CS439 Project ADisk - An Atomic Disk

Due: 5:59:59 PM April 9 2012

Assignment Goals

Overview of Project

In projects ADisk and FS, you will construct a user-level library that presents the abstraction of a reliable file system called RFS. In order to manage the complexity, you will implement this system in several phases, each of which presents a successively higher-level abstraction. You will be given the abstraction of a raw disk interface. On top of this you will build
  1. An atomic disk
  2. A reliable multi-level tree
  3. A reliable flat file system
  4. A reliable directory-based file system (RFS)

This project, deals only with step 1 above. It is very important to do an excellent job on this part of the project, or you will have significant problems in the next part.

The Assignment

Before you begin the assignment, grab the following code: lab-FS.tar

This tar archive contains the source code to the raw disk interface as well as files defining the other interfaces listed below. To extract the files from the archive, use the following command.

tar -xvf lab-FS.tar
A directory called lab-FS/ will be created, and your files will be extracted into it.
Part 0: Understand the supplied low-level disk system

RFS will be implemented using an 8-Mbyte file with a "device driver" that treats this storage space as a contiguous array of raw disk sectors, each of size Disk.SECTOR_SIZE bytes. This driver is similar to Linux's block device interface. The disk sectors are numbered from 0 to Disk.NUM_OF_SECTORS- 1 inclusive. No partial read or write to a disk sector is possible. This important restriction has several implications on the design of the file system. For example, updating a single byte in a disk sector requires that the sector be read in memory, modified, and then written back to disk. Similarly, reading a single byte in a disk sector requires that the sector be read entirely from disk though only one byte is to be read.



A Disk provides an asynchronous interface: requests are issued and later a callback provides the results of the action. When you create a Disk, you register an object that will receive callbacks:

public Disk(DiskCallback callback) throws FileNotFoundException

The callback is an object that will receive notifications when operations complete (see below).

Once a Disk is created, the following method initiates a read or write request:

public void startRequest(int operation, int tag, int sectorNum, byte b[]) throws IllegalArgumentException, IOException

Warning: You must not re-use b[] until the operation is complete.

Warning: You are not guaranteed that reads or writes will complete in the order they are issued. In fact, the thread inside of Disk will process pending requests in a random-ish order.

You can constrain the order of operations to disk by inserting barriers with the following call:

public void addBarrier() throws IOException

All reads and writes issued before a barrier will be completed before any reads or writes issued after the barrier.

Note that although you could put a barrier before/after each read and write, for good performance in practice (and full credit for the lab), you should use barriers judiciously. Whenever you use a barrer, your code should include a comment explaining exactly why you need one. Be brief and precise; if the reader cannot easily figure out what your comment means, it is not useful.

For testing/debugging, you can also arrange for your disk to crash with some probability at some point in the future.

public setFailProb() See source file for details.


DiskCallback, DiskResult, and DiskUnit

A DiskCallback defines an interface that must be implemented by an object that receives completion notices in the form of a DiskResult when a read or write request finishes.

public void requestDone(DiskResult result)

So, when the operation you started with Disk::startRequest() completes, DiskCallback::requestDone() will be called. DiskUnit gives some examples of how this works.


Part 1: Build an atomic disk

Your task is to design, implement, and thoroughly test an atomic disk. The atomic disk allows a series of writes to different sectors to be made and then atomically applied -- either all the writes occur or none of them do. The atomic disk presents a similar interface to the disk, except (a) each read and write includes a transaction ID and (b) there are additional calls to begin, commit, abort, and apply transactions. You will implement this atomicity by providing an redo log on disk This redo log will consume ADisk.REDO_LOG_SECTORS sectors of your disk, so users of the ADisk will see a smaller disk (n o larger than Disk.NUM_OF_SECTORS - ADisk.REDO_LOG_SECTORS sectors; possibly a few sectors smaller than that if you reserve sectors for other disk metadata such as a start-of-log pointer.). 

In particular, an ADisk implements the following interface:

// Allocate an ADisk that stores its data using
// a Disk.
// If format is true, wipe the current disk
// and initialize data structures for an empty 
// disk.
// Otherwise, initialize internal state, read the log, 
// redo any committed transactions
public ADisk(boolean format)


// Return the total number of data sectors that
// can be used *not including space reseved for
// the log or other data sructures*. This
// number will be smaller than Disk.NUM_OF_SECTORS.
public int getNSectors()


// Begin a new transaction and return a transaction ID
public TransID beginTransaction()


// Store the update in an in-memory buffer associates with
// the specified transaction, but don't write it to the log yet.
public void writeSector(TransID tid, int sectorNum, byte buffer[]) 
    throws IllegalArgumentException, IndexOutOfBoundsException


// Read the disk sector numbered sectorNum and place
// the result in buffer. Note: the result of a read of a
// sector must reflect the results of all previously
// committed writes as well as any uncommitted writes
// from the transaction tid. The read must not
// reflect any writes from other active transactions
// or writes from aborted transactions.
void readSector(TransID tid, int sectorNum, byte buffer[]) 
     throws IOException, IllegalArgumentException, IndexOutOfBoundsException


// Write all of the transaction's updates to the log.
// Then, put a barrier in the write queue.
// Then, write "commit" to the log
// Then, wait until the "commit" is safely on disk.
// Ensure that eventually,
// the updates within the transaction are
// writen to their specified sectors, but don't wait for
// the writeback to finish before returning
// e.g., mark the transaction data structure as
// "committed" and move it to the writeback queue.
// Hint: Things will probably be easier if
// you make sure that commit i+1 cannot land
// on disk before commit i is on disk.
// Barriers may help here.
void commitTransaction(TransID tid) 
    throws IOException, IllegalArgumentException

// Free up the resources for this transaction without
// committing any of the writes.
void abortTransaction(TransID tid)  
    throws IllegalArgumentException


Asynchronous IO

Writes to the log within a transaction must be asynchronous -- (1) write() must return without waiting for its write to go to disk. (2) When a transaction commits, you must send the writes to the log, issue a barrier request, and then send the commit record to the log. (3) Only after the commit (and all the writes in the transaction it commits, of course) is (are) safely in the log, can the commit() call return. (4) Only after the commit has completed, you should arrange for the updates to get written to their final locations on disk. (5) The commit() call should return before these write-backs (to the final locations on disk) complete; it may return before these write-backs are even issued (e.g., another thread may issue those writes.)

Hint: If two transactions update the same sector, you need to make sure that the write from the transaction that commits later happens after the write from the earlier-committed transaction. A good way to do this might be to have a producer/consumer queue for committed transactions' writes. Then, commit() puts a set of writes into the queue, and a single write-back thread pulls a transaction's writes from the queue, asynchronously issues them all, waits for them all to complete, and then moves to do the write-back for the next transaction's writes.

Hint: The log will have a sequence of writes for a transaction followed by a commit record for that transaction (or no commit record if the system crashed before the commit made it to disk.) The problem is: How do you know if the sector you are reading is a "commit" or just a sector update that happens to look like a "commit"? You also need to know which sectors were updates by the transaction's writes. So, you probably want to preceed a transaction's writes in the log with a description of the transaction's writes: how many are there, what sector is updated by each write, etc. Thus, it may be useful if your log looks like: [transaction_metadata [sectors]* transaction_commit]*


Note that your implementation must use locks and condition variables to enforce concurrency control.

One shared data structure you are likely to have will be an object to represent each in-progress transaction. Multiple threads may concurrently attempt to read/write/commit/abort the same transaction. Your code must synchronize such concurrent access.

Multiple in-progress transactions may try to update the same sector. A read by transaction T of sector S must return the data from the latest write to sector S by transaction T, if any. If transaction T has not updated sector S, then a read must return the data written by last write of sector S by the last-committed transaction T' that wrote sector S.

Hint: If you have a per-transaction set of updates and a write-back queue of committed transactions' writes, then a read could first check it's transaction, then check the write-back queue, and then, if necessary, read from disk. When you read from the write-back queue, make sure you enforce the "last commit wins" rule.

Hint: It is logically possible to have two transactions committing "at the same time", so that, for example, both t1 and t2 issue their writes to the log at the same time (to different locations in the log, obviously). Then, t1 and t2 could both issue their commits. This "optimization" is not required. In fact, it is a probably a bad idea for at least two reasons. First, it will complicate recovery (after a crash transaction i in the log may not have a commit in the log, but transaction i+1 or i+7 may). Second, to the extent that the updates get performed in a way that takes advantage of this extra concurrency, performance would be worse not better (due to extra rotational delays between writing a transaction's data and its commit.) The point of this whole hint is: You probably want to ensure that only one thread/transaction is doing a commit at a time. Put a barrier in an appropriate place.


When your system starts, you need to make sure that all updates in the log from committed transactions (1) are immediately visible to reads and (2) are eventually written to their final locations on disk. So, your recovery code must read data from the logs, update in memory data structures, update disk, or both. Make sure you follow the ordering rule above (if two transactions update the same sector, the update from the transaction that committed last is the one that must "win".) Notice that transactions may not commit in the same order that they begin, so the tid you assign when a transaction starts cannot be used to order updates on recovery.

Hint: One option is to have special-purpose code that reads each transaction from the log, does that transaction's write-back, waits for the write-back to complete, and then goes on to the next committed transaction. But there may be a simpler way: you probably already have code that applies a list of transactions in a specified order. And that makes sure that reads "see" these writes while they are still pending. Can you re-use that code?

Circular log

Your on-disk log is fixed size, and you should treat it as a circular log with new updates added at the head and the tail pointing to the oldest update from a committed transaction that has not been (or may not have been) written back to its final location on disk. So, every write to a transaction moves the head forward. And, when a write-back completes, the tail moves forward. Obviously, you must make sure the head does not pass the tail.

If a transaction tries to commit when there is not room to send its writes to the log, the commit call should block until the transaction's updates and commit can be written to the log. (If a single transaction has too many writes to fit in the log, even if the log were otherwise empty, you should throw an exception either on the culprit write or on the commit.)

On recovery, ideally the recovery thread would read the log from the tail to the head so that it can commit exactly the writes that have not been written back yet. Note, however, that it is safe to start before the tail -- it is no problem to re-write-back a write that has already been written back (as long as ordering constraints are followed.) So, you don't need to have a perfect on-disk head or tail pointer. Still, it may simplify things if you know about where the tail is. One option is to store a "log-start" pointer in a well-known sector on disk. The log-start must always be at or before the tail and the head may not cross the log-start.

Unit tests

As your career progresses, you will find that writing simple sanity-check tests for each object you write will save you enormous amounts of time and make you a much more productive programmer. So, we should not have to require you to write any specific tests because you should already be planning to do so. But, thorough testing of this part of the project is so important to the rest of the project, that we're going to require it.

Write a program called ADiskUnit that executes a series of unit tests for ADisk. Designing such tests is a skill that deserves practice, so we do not provide a full list of what you should test. You might start with simple reads and writes within a transaction. Then look at reads and writes across transactions that commit or abort. Be sure to test garbage collection and crash recovery. You should have some tests with a single thread and some with more. Should probably have some tests for the simple case when the disk doesn't crash and some for the case where it does. Etc. We strongly recommend writing the tests as you add each piece of functionality rather than trying to write a bunch of tests as an afterthought once you have finished the project.

The TA will also have a set of unit tests (other than yours) to use for grading your code. So, passing your tests is not necessarily enough. But, if you design good tests, then passing your tests makes it much more likely that you will pass the TA's tests.

Additional comments

Internal Data Structures and Approach
Your ADisk should have the following internal structures and types:
Break this project into a bunch of small, managable, testable steps. Here is one sequence. At each step, write and run some unit tests to make sure things are working as expected. Your README should describe the steps you took and the unit tests at each step.
Other hints
What to Turn In

All of your implementations must adhere to (e.g., must not change) ADisk's public interfaces specified above. Also, you may not modify the Disk interface in any way. You may add additional public methods to ADisk; we don't think you will need to do so (except perhaps to add testing interfaces). You can change the interfaces to the other "internal" objects/classes. Electronically turn in (1) your well commented and elegant source code and (2) a file called README.

Your README file should include 4 sections:

The following guidelines should help smooth the process of delivering your project. You can help us a great deal by observing the following:

80% Code

Remember that your code must be clear and easy for a human to read. Also remember that the tests we provide are for your convenience as a starting point. You should test more thoroughly. Just passing those tests is not a guarantee that you will get a good grade.
20% Documentation, testing, and analysis
Discussions of design and testing strategy.

Start early, we mean it!!!