CS 439 Spring 2012: Project RFS

CS372 Project RFS - A Reliable File System

FS-2 (Part 2a, 2b) Due: 5:59:59 PM April 23 FS-3 (Part 3) Due: 4:59:59 PM May 4

Assignment Goals

To learn about key file system concepts: logging, metadata, and directories

Overview of Project

You will construct a user-level library that presents the abstraction of a reliable file system called RFS. In order to manage the complexity, you will implement this system in 4 phases, each of which presents a successively higher-level abstraction. You will be given the abstraction of a raw disk interface. In project ADisk, you constructed an atomic disk. On top of this you will build

A reliable multi-level tree in which a collection of data blocks can be stored

A reliable flat file system

A reliable directory-based file system (RFS)

The Assignment

Make a copy of your code from project ADisk. You will begin with the Disk abstraction we provided and ADisk abstraction you constructed.

Part 0: Understand the supplied low-level disk system

Completed in project ADisk.

Part 1: Build an atomic disk

Completed in project 3.

Part 2a: Build a multi-level persistent tree

In this part of the project, you will create a persistent on-disk tree abstraction using your atomic disk abstraction. The disk will store up to MAX_TREES trees, each of which is identified by a TNum. The leaves of each tree are data blocks, and the trees grow as you add more blocks. You will make use of the ADisk to ensure that you can issue a series of updates to a tree or trees and have them occur atomically. For example, you could {update the free list to indicate that two blocks have been consumed, add a data block to one tree, add a data block to another tree causing that tree to grow the number of internal nodes it has} as a single atomic operation.
Interface: Your PTree (persistent tree) class should implement the following public methods:
Tree(boolean doFormat) throws IOException This function is the constructor. If doFormat == false, data stored in previous sessions must remain stored. If doFormat == true, the system should initialize the underlying disk to empty.
TransID beginTrans(): This function begins a new transaction and returns an identifying transaction ID.
void commitTrans(TransID xid) throws IOException, IllegalArgumentException This function commits the specified transaction.
void abortTrans(TransID xid) throws IOException, IllegalArgumentException This function aborts the specified transaction.
int createTree(TransID xid) throws IOException, IllegalArgumentException, ResourceException This function creates a new tree and returns the TNum number (a unique identifier for the tree).
void deleteTree(TransID xid, int tnum) throws IOException, IllegalArgumentException This function removes the tree specified by the tree number tnum. The tree is deleted and the corresponding resources are reclaimed.
int getMaxDataBlockId(TransID xid, int tnum) throws IOException, IllegalArgumentException This function returns the maximum ID of any data block stored in the specified tree. Note that blocks in a tree are numbered starting from 0.

void readData(TransID xid, int tnum, int blockId, byte buffer[]) throws IOException, IllegalArgumentException This function reads PTree.BLOCK_SIZE_BYTES bytes from the blockId'th block of data in the tree specified by tnum into the buffer specified by buffer. If the specified block does not exist in the tree, the function should fill *buffer with '\0' values.
void writeData(TransID xid, int tnum, int blockId, byte buffer[]) throws IOException, IllegalArgumentException This function writes PTREE.BLOCK_SIZE_BYTES bytes from the buffer specified by buffer into the blockId'th block of data in the tree specified by tnum. If the specified block does not exist in the tree, the function should grow the tree to include the new block. Notice that this growth may require updating multiple data structures -- the free list, the pointer to the tree root, internal tree nodes, and the data block itself -- and all of these updates must be done atomically within the transaction.

void readTreeMetadata(TransID xid, int tnum, byte buffer[]) throws IOException, IllegalArgumentException This function reads PTree.METADATA_SIZE bytes of per-tree metadata for tree tnum and stores this data in the buffer beginning at buffer. This per-tree metadata is an uninterpreted array of bytes that higher-level code may use to store state associated with a given tree.

void writeTreeMetadata(TransID xid, int tnum, byte buffer[]) throws IOException, IllegalArgumentException This function writes PTree.METADATA_SIZE bytes of per-tree metadata for tree tnum from the buffer beginning at buffer.
int getParam(int param) throws IOException, IllegalArgumentException This function allows applications to get parameters of the persistent tree system. The parameter is one of PTree.ASK_FREE_SPACE (to ask how much free space the system currently has), PTree.ASK_MAX_TREES (to ask what is the maximum number of trees the system can support), and PTree.ASK_FREE_TREES (to ask how many free tree IDs the system currently has). It returns an integer answer to the question or throws IllegalArgumentException if param does not correspond to one of these value..

For all of these methods

IOException is thrown if the request is unable to complete the necessary disk accesses

IllegalArgumentException is thrown if the caller specifies a non-existant transaction or tree

ResourceException is thrown if there are not sufficient resources to complete the operation

We will provide the code PTree.java and ResourceException.java

Requirements on implementation internals:
You are required to use the following basic on-disk data structures for the trees.

A tree has three types of node: a TNode (the root of the tree), zero or more internal nodes, and data blocks (the leaves that actually store data). Each internal node and each data block has a size of TNode.BLOCK_SIZE_BYTES bytes.

The root of each tree is a TNode. A TNode contains space for up to PTree.TNODE_POINTERS pointers to blocks that may be internal nodes or data blocks (see below.) Additionally, each TNode should also hold PTree.METADATA_SIZE bytes of uninterpreted per-tree state that can be used by higher levels of software. Also, you need to store a treeHeight value (see below) in each TNode. Finally, you may want to store other per-tree metadata that you define in a TNode.

A tree's TNode includes a treeHeight value that specifies how many levels of nodes are in the tree. If a tree's treeHeight is 1, then the tree's TNODE_POINTERS point to data blocks. If a tree's treeHeight is 2, then the tree's TNODE_POINTERS point to internal nodes of the tree, and each internal node points to up to POINTERS_PER_INTERNAL_NODE data blocks. If a tree's treeHeight is 3, then there is a root, two levels of internal nodes, and leaves. Etc. A tree's current height is determined by the ID of the highest block written in the tree: if no block higher than TNODE_POINTERS has been written, the height is 1; otherwise, if no block higher than TNODE_POINTERS * POINTERS_PER_INTERNAL_NODE has been written, the tree hight is 2; otherwise, if no block higher than TNODE_POINTERS * POINTERS_PER_INTERNAL_NODE^2 has been written, the tree hight is 2; etc. The maximum height of a tree is determined by the maximum allowed block ID TNode.MAX_BLOCK_ID.

Your implementation must support "holes" in tree data and sparse trees: (1) An internal node should only be allocated if there is at least one leaf that is that node's descendent and that has been written. (2) A tree's height should be the minimum height needed to accomodate all non-empty leaves. E.g., if I seek to a random offset in the tree and write a block, the system should create the necessary internal nodes to reach the newly written block, but it should not populate other data blocks or internal nodes. Thus, notice that TNODE_POINTERS and pointers in internal nodes can be NULL.

Notice that there is a read from a block ID that has never been written, your implementation should return a zero-filled block rather than throwing an exception. Your implementation must not allocate new internal nodes (or leaf blocks) in this case, however.
Thus, the treeHeight of a tree must change dynamically as new blocks are written. For example, if all writes to a tree are to block IDs less than TNODE_POINTERS, then the tree's height is 1. Then, if a write to block (TNODE_POINTERS + 1) occurs, you will need to (1) increase the treeHeight to 2, (2) allocate an internal node, (3) copy the existing TNODE_Pointers from the TNode to the new internal node, (4) add a new block pointer in the internal node, (5) clear the existing TNODE_POINTERS from the internal node, (6) initialize the TNode's first pointer to point to the new internal node. Aren't you glad you have transactions?

Your readData and writeData must read into memory the TNode and the InternalNodes between the root and the leaf in question and no others. It is not acceptable, for example, to always read an entire tree into memory before operating on any part of it.

You should store an array of PTree.MAX_TREES TNodes (tree roots) across a fixed set of disk sectors. Thus, given a tree number tNum, you should be able to locate the place on disk where the corresponding TNode structure is stored.
You may not "preload" your forest of trees into memory. Nor may you store the full array of TNodes in memory. Doing so is feasible for our 8MB file system, but would not be reasonable if our disk were large. Instead, you should just read pieces of trees as you need them for the current transactions. You are not required to implement a cache of recently read sectors---simplicity trumps performance for this project. You may store a copy of the free block bitmap in memory if you like, but this is not a requirement.

In addition, you are required to implement a free block list, and this free block list must be stored as an on-disk bitmap.

In addition to your TNode array and free block bitmap, you may want to store other per-file-system data in a sector or block of disk that you reserve for that purpose.

Simplifying assumption:

Although the ADisk you built supports multiple concurrent transactions, to simplify the design you are not required to support multiple concurrent transactions at the PTree layer or above. You must, however, still be thread safe by ensuring that multiple threads from the same transaction are coordinated. In particular, you may (1) use a single lock to coordinate access through all public methods, (2) have beginTrans() block if there is already an outstanding transaction, and (3) have abortTransaction() and commitTransaction() signal a thread waiting in beginTransaction() (if any).

Hints: You are not required to follow this advice, but I think it might help. Feel free to ignore it if you have a better way.

You should select four ranges of blocks on the disk for four key data structures: file system global parameters (e.g., the number of free sectors or tree roots), the array of tree root pointers, the free sector list, and the remaining sectors.

In your TNode implementation, you may find it convenient to store additional information about the tree in the TNode. You are welcome to do so.

You may want to use Java Object Serialization to move objects between memory and disk.

Begin this part of the project by writing down your main data structures, and then writing pseudo-code for each of the above functions in terms of methods on these data structures.

Test as you go. Build a subset of the above functionality and test it thoroughly, rather than trying to test all of these functions at once. Test not only the external interfaces, but also invariants you know about the internal structure. For example, you could write a test that walks the InternalNodes of a specified tree to test some invariant that you know must hold. For example, you could write a test that walks all trees to make sure that the blocks allocated to the trees are all marked as used on the free map.

Part 2b: Flat File System

In this part of the project, you will build a "flat file system" that implements files but not directories. A flat file system allows you to read and write files that are named by inumbers rather than path names, which would not be convenient for end users, but which will form the basis for the rest of the system. You will use the persistent trees created in part 2. Each tree will store a file.

You should implement the following interface:

FlatFS(boolean doFormat) throws IOException This function is the constructor. If doFormat == false, data stored in previous sessions must remain stored. If doFormat == true, the system should initialize the underlying disk to empty.
TransID beginTrans() This function begins a new transaction and returns an identifying transaction ID.
void commitTrans(TransID xid) throws IOException, IllegalArgumentException This function commits the specified transaction.
void abortTrans(TransID xid) throws IOException, IllegalArgumentException This function aborts the specified transaction.
int createFile(TransID xid) throws IOException, IllegalArgumentException This function creates a new file and returns the inode number (a unique identifier for the file)

void deleteFile(TransID xid, int inumber) IOException, IllegalArgumentException This function removes the file specified by the inode number inumber. The file is deleted and the corresponding resources are reclaimed.

int read(TransID xid, int inumber, int offset, int count, byte buffer[]) IOException, IllegalArgumentException, EOFException This function reads count bytes from the file specified by inumber into the buffer specified by buffer. The parameter offset specifies the starting location within the file where the data should be read. Upon success, the function returns the number of bytes read (this number can be less than count if offset + count exceeds the length of the file. The method throws EOFException if offset is past the end of the file.

void write(TransID xid, int inumber, int offset, int count, byte buffer[]) IOException, IllegalArgumentException This function writes count bytes from the buffer specified by buffer into the file specified by inumber. The parameter offset specifies the starting location within the file where the data should be written. Attempting to write beyond the end of file should extend the size of the file to accommodate the new data.

void readFileMetadata(TransID xid, int inumber, byte buffer[]) throws IOException, IllegalArgumentException This function reads getParam(ASK_FILE_METADATA_SIZE) bytes of per-file metadata for tree tnum and stores this data in the buffer beginning at buffer. This per-file metadata is an uninterpreted array of bytes that higher-level code may use to store state associated with a given file.

void writeFileMetadata(TransID xid, int inumber, char *buffer) throws IOException, IllegalArgumentException This function writes getParam(ASK_FILE_METADATA_SIZE) bytes of per-file metadata for file inumber from the buffer beginning at buffer.

int getParam(int param) throws IOException, IllegalArgumentException This function allows applications to get parameters of the file system. The parameter is one of FlatFS.ASK_MAX_FILE (to ask the maximum number of files the formatted file system supports), FlatFS.ASK_FREE_SPACE_BLOCKS (to ask how many free blocks the file system currently has), FlatFS.ASK_FREE_FILES (to ask how many free inodes the system currently has), and FlatFS.ASK_FILE_METADATA_SIZE (to ask how much space there is for per-file metadata). It returns an integer answer to the question

We will provide the code FlatFS.java.

Hints: This layer adds very little to the persistent tree layer. Primarily, instead of reading and writing blocks, now you read and write ranges of bytes. So, read and write will need to translate requests for ranges of bytes into a series of requests for blocks. Also, notice that if offset < file length and offset + count >= file length, the read function should only read to the end of the file and not beyond it (returning a value smaller than count). So, you will need to store the file length in bytes with each tree.

Part 3: Hierarchical File System

File systems would be less useful if you needed to remember the inumber of each file you create. File systems therefore use a higher-level API with hierarchical file names to make it easy to organize and remember where data are stored.

A directory is treated in RFS like any other file, except that it can not be written to directly by user programs. The directory file consists of several entries, each describing a file or a directory. Each directory must contain at least two entries. The first one refers to the parent directory, and has the name "..", like in UNIX. The root directory's parent is the root directory itself, which is the only exception. The second mandatory entry has the name "." which points to the directory itself.

Updates to the directory structure occur only as a result of a file deletion or creation. A directory entry contains a flag showing whether the entry is used or not. You may want to include other status information in this flag according to your design. The flag is followed by the index of the inode of the file or directory corresponding to that entry. The last field in the entry is the file or directory name, which is a fixed-length string. Note that in practice, we would not want to use fixed-size arrays to store names as they would cause unacceptable inefficiency in disk access speed and space utilization, but we allow this simplification for the project.

We provide a simple template for a directory entry class, DirEnt.java. Note that this is an internal detail of your file system. You are welcome to change it.

When you format the disk, you will need to create the root directory, whose inode should always be at a known location. For example, in UNIX, the root directory is typically stored as inode 0, 1, or 2.

RFS allows users to create hierarchically-named files e.g., "/foo/bar/baz." A file name used in any of these functions is a String. No component of the name between two '/' characters or after the last '/' character can exceed FS_MAX_NAME characters in length.

Although file names are convenient for users, requiring string manipulation on each system call would increase the overhead of file access. Thus, the API allows users to open files using their names and then to read and write open files using file descriptors. A file descriptor is an integer between 0 and FS_MAX_FD that you will use as an index to an open-file-descriptor table that you will maintain. User programs use file descriptors to identify files in file system calls instead of repeatedly using file or directory names.

Note that each operation that operates on a directory must execute atomically: either the entire operation completes or the file system is left in the state it was in before the request issued. Similarly, a set of operations that operate on an open file must all operate as a single atomic unit. Notice that unlike the previous interfaces, these interfaces do not take a transaction ID -- of these library calls internally create and commit transactions as needed. You should begin/end transactions within your code for these calls. This arrangement is more convenient for the user, but it has the disadvantage that a user cannot cause several of these calls to be executed atomically.

A description of each file system call follows:

RFS(boolean doFormat) throws IOException This function is the constructor. If doFormat == false, data stored in previous sessions must remain stored. If doFormat == true, the system should initialize the underlying file system to empty.
int createFile(String filename, boolean openIt) throws IOException, IllegalArgumentException: This function atomically creates a new file with the name filename. Filename is a full pathname (starting with "/"). If the parameter openIt is true, the function returns a file descriptor of the open file corresponding to the newly created file; in this case, the initial create(), a sequence of zero or more read() and write() calls to that file, and a final close() should all occur within a single transaction.

void createDir(String dirname) throws IOException, IllegalArgumentException This function atomically creates a directory entry with the name dirname. As before, the name is interpreted as a full pathname.

void unlink(String name) throws IOException, IllegalArgumentException This function atomically removes the entry specified by the name. The name is interpreted as usual. If the name corresponds to a file and the file is not currently open, it is deleted and the corresponding resources are reclaimed. If name corresponds to a directory, it is deleted only if it is an empty directory.

void rename(String oldName, String newName) throws IOException, IllegalArgumentException This function atomically changes the name of an existing file oldName into a new file newName.

int open(String name) throws IOException, IllegalArgumentException This function performs a lookup on the file or directory whose name is specified by name. The character string specified by name must start with "/" making name a full pathname that starts from the root of the file system. The call returns a file descriptor that can be used later to refer to the file or directory specified by the search path. The function fails if name does not specify an existing file, if no file descriptors are free, or if the name corresponds to a directory. All reads and writes to the open file are part of a single transaction.

void close(int fd) throws IOException, IllegalArgumentException This function closes the open file indicated by the file descriptor fd and commits any updates. Subsequent access to files through the fd descriptor must return an error, until the fd is reused again in an open call. Also, any resources used to support the file descriptor should be reclaimed at this point.

int read(int fd, int offset, int count, byte buffer[]) IOException, IllegalArgumentException This function reads count bytes from the file specified by fd into the buffer specified by buffer. The parameter offset specifies the starting location within the file where the data should be read. Upon success, the function returns the number of bytes read (this number can be less than count if no more bytes are available from the position specified by offset until the end of the file).

void write(int fd, int offset, int count, byte buffer[]) IOException, IllegalArgumentException This function writes count bytes from the buffer specified by buffer into the file specified by fd. The parameter offset specifies the starting location within the file where the data should be written. Attempting to write beyond the end of file should extend the size of the file to accommodate the new data. These writes will commit when the file is closed.

String[] readDir(String dirname) IOException, IllegalArgumentException This function atomically reads the entries that exist in the directory specified by dirname. and returns the result in an array of String objects.

int size(int fd): IOException, IllegalArgumentException This function returns the number of bytes contained in the open file identified by fd.

int space(int fd): IOException, IllegalArgumentException This function returns the number of data blocks (excluding internal nodes) consumed by the open file identified by fd. Notice that space has to consider the existence of holes while size is not affected by holes in a file.

We will provide the skeleton code RFS.java.

Hints: Note that a char in java is two bytes. Don't forget that our persistent tree abstraction lets you stash some extra data of your choosing in a tnode.

What to Turn In

All of your implementations must adhere to (e.g., must not change) the public interfaces specified above. You may not modify the Disk interface in any way. You may add additional public methods to ADisk, PTree, or FlatFS, but we don't think you will need to do so. Although the "internal interfaces" of parts 1 and 2 would not be accessible to a "normal user" of the file system you create in part 3, we will test those internal interfaces during grading.

Electronically turn in (1) your well commented and elegant source code and (2) a file called README. Turn in the entire body of source code needed for this project (e.g., turn in your ADisk again for project F-II and your flat file system again for F-III).

Your README file should include 5 sections:

Section 1: Administrative: your name (and your partner's), the number of slip days that you have used on this project and so far
Section 2: A list of all source files in the directory with a 1-2 line description of each
Section 3: 3 (or 4) short paragraphs, each describing the high-level design of part 1 and part 2 (and part 3).
Section 4: A 1/2 page high-level description of how you ensure that all actions by the PTree and FlatFS (and PFS) are made atomic. E.g., explain how you structured your code so that even if the system crashes in the middle of any operation, after recovery, all on-disk data structures are either in the state it was before the operation or after the complete operation, but not in a middle-stage
Section 5: A discussion of your testing strategy. Outline the programs you used (at a high level), what each one tests, and the results of those tests. (More detailed low-level comments should be in the programs, themselves.)

Logistics

The following guidelines should help smooth the process of delivering your project. You can help us a great deal by observing the following:

Note that the project is due earlier during the day than the previous projects.
After you finish your work, please use the turnin utility to submit your work.

Usage:

turnin --submit <ta-id> cs439-labFlatFS your_files

turnin --submit <ta-id> cs439-labHierFS your_files

You may work in two-person teams on this project (or individually if you prefer.) If you work in a two-person team, turn in only one set of files per team.
Do not include object files in your submission!! (Or core dumps!!!) (e.g., run "make clean" before turnin.)
The project will be graded on the public Linux cluster (run 'cshosts publinux' to get a list) Portability should not be a major issue if you develop on a different platform. But, if you chose to develop on a different platform, porting and testing on Linux by the deadline is your responsibility. The statement "it worked on my other machine" will not be considered in the grading process in any way.
Code will be evaluated based on its correctness, clarity, and elegance. Strive for simplicity. Think before you code.

Grading

85% Code

Remember that your code must be clear and easy for a human to read. Also remember that the tests we provide are for your convenience as a starting point. You should test more thoroughly. Just passing those tests is not a guarantee that you will get a good grade.

Note: I have deliberately under-weighted part 4 relative to its conceptual difficulty and amount of code you need to write and test. As a result, if you run short on time, it is still possible to get a solid grade on this project by doing a great job on parts 2-3 and not completing part 4.
15% Documentation, testing, and analysis
Discussions of design and testing strategy and results.

Start early, we mean it!!!