The goal of this assignment is to come up with a plan for Assignment 3.
Assignment 3 is a more open-ended assignment, where you have the flexibility to pursue an OS topic or subsystem that interests you. The goal of the first part of this assignment then, is to identify roughly what you will be doing for the rest of the assignment. Unless you have made other arrangements, you should focus on working within the field of Linux device drivers.
You must submit a proposal (~1 page long), which covers:
Provide a detailed timeline of how you plan to build the system. It is really important to have intermediate milestones where some subset of functionality is completely working by date X rather than working on all functionality in parallel and finding out what works on the deadline. Give a list of 4 key milestones.
What infrastructure will you have to build to run the experiments you want to run? What hardware will you need and where will you get it? (Talk to me early if you have an experiment that needs hardware support but you don't know where to get the hardware from.)
There are many resources concerning device drivers (in addition to the actual code/documentation that came with the kernel source). For example, there is a book "Linux Device Drivers" by Rubini & Corbet. (A slightly older 2nd edition is available online at this site.I will review your proposal, and I might request a revision.
A critical source of overheads for non-transactional code in TxOS are dynamic checks in the kernel whether the current thread is in a transaction or not. In the non-transactional case, these expensive dynamic checks could be eliminated if instead we could compile two versions of the code in question.
However, we do not wish to copy-and-paste large swaths of Linux source code. Instead, we would like to use a compile-time tool to build the two versions and create two complete implementations of each system call (at least the relevant portions).
We hypothesize that a tool like CIL (C Intermediate Language) would be sufficient for this task. Step one would be to familiarize yourself with CIL, and then try to perform a compile time transformation to a single, simple system call, like getpid. From there, you could incrementally replace more complex system calls.
2) Implement speculation using the
Transactions and speculations (see Nightingale's SOSP 05 and OSDI 06 papers) are similar in that they both provide a degree of isolation to a set of work, but the guarantees are different. Moreover, a substantial portion of the implementation of this isolation in each system functionally overlaps. It would be interesting to see the amount of additional effort required to leverage the transaction implementation to provide speculation.
I would recommend first reproducing the results for hiding the latency of synchronous writes to a local file system, and perhaps next looking at NFS latency. It would be interesting to see if speculation could be used to hide the latency of committing durable transactions. For extra credit, find additional uses for speculation, with or without transactions.
3) Parallelize apt with transactions (probably easier).
TxOS currently provides transactional installation of single packages. Aside from the internal bookkeeping, apt installs rarely touch the same files and could likely be executed concurrently in the common case. In the uncommon case, two packages may touch the same configuration files or have a dependence that makes parallelization unsafe. The problem is that this is difficult to reason about. The safety guarantees of transactions could make this a more realistic option.
The main focus of the project would be to modify apt to use system transactions in TxOS to parallelize a large software upgrade (like a distribution upgrade). Ideally, one would understand apt well enough to leverage its internal knowledge of package dependences to make smart choices, such as scheduling dependent packages in the same process or transaction.
A likely problem may be package database structure introducing false conflicts, and this may require some straightforward modification to the dpkg bookkeeping.
For the particularly ambitious group, TxOS currently only supports flat nesting, but it might be interesting to experiment with closed nesting to commit the install as one unit but use nested transactions internally for safety. Implementing closed nesting support could be a sub-project or may become available if a group shows significant progress.
4) Large system transactions
Currently, system transactions are constrained to the size of memory. A trivial solution for large data writes would be to spill them to swap, but this will likely perform poorly. A more appealing solution might allocate free disk blocks for data and later update the file system bookkeeping. The downside to this solution is the risk of complicating the VFS interface. The goal of the project might be to explore the design tradeoffs between performance of large transactions and introducing complexity into the API.
One might begin by implementing the simple swap-based solution and then the most straightforward interface to acquiring free blocks. After measuring this, find ways to refine the performance or reduce the complexity of the interface. The project should also attempt to compare performance with database systems and BDB, which will likely do better. An interesting nugget might be the crossover point in terms of data set size where a database outperforms OS transactions.