CS372 Project RFS - A Reliable File System

FS-2 (Part 2a, 2b) Due:  5:59:59 PM April 23
FS-3 (Part 3) Due:  4:59:59 PM May 4

Assignment Goals

Overview of Project

You will construct a user-level library that presents the abstraction of a reliable file system called RFS. To manage the complexity, you will implement this system in 4 phases, each of which presents a successively higher-level abstraction. You will be given the abstraction of a raw disk interface, and in project ADisk you constructed an atomic disk on top of it. On top of the atomic disk you will build:
  1. A reliable multi-level tree in which a collection of data blocks can be stored
  2. A reliable flat file system
  3. A reliable directory-based file system (RFS)

The Assignment

Make a copy of your code from project ADisk. You will begin with the Disk abstraction we provided and the ADisk abstraction you constructed.
Part 0: Understand the supplied low-level disk system

Completed in project ADisk.

Part 1: Build an atomic disk

Completed in project 3.

Part 2a: Build a multi-level persistent tree
In this part of the project, you will create a persistent on-disk tree abstraction using your atomic disk abstraction. The disk will store up to MAX_TREES trees, each of which is identified by a TNum. The leaves of each tree are data blocks, and the trees grow as you add more blocks. You will use the ADisk to ensure that you can issue a series of updates to one or more trees and have them occur atomically. For example, you could perform the following as a single atomic operation: update the free list to indicate that two blocks have been consumed, add a data block to one tree, and add a data block to another tree that forces that tree to add internal nodes.

Interface: Your PTree (persistent tree) class should implement the following public methods:

PTree(boolean doFormat) throws IOException This function is the constructor. If doFormat == false, data stored in previous sessions must remain stored. If doFormat == true, the system should initialize the underlying disk to empty.

TransID beginTrans(): This function begins a new transaction and returns an identifying transaction ID.

void commitTrans(TransID xid) throws IOException, IllegalArgumentException This function commits the specified transaction. 

void abortTrans(TransID xid) throws IOException, IllegalArgumentException This function aborts the specified transaction. 

int createTree(TransID xid) throws IOException, IllegalArgumentException, ResourceException This function creates a new tree and returns its TNum (a unique identifier for the tree).

void deleteTree(TransID xid, int tnum) throws IOException, IllegalArgumentException This function removes the tree specified by the tree number tnum. The tree is deleted and the corresponding resources are reclaimed.

int getMaxDataBlockId(TransID xid, int tnum) throws IOException, IllegalArgumentException This function returns the maximum ID of any data block stored in the specified tree. Note that blocks in a tree are numbered starting from 0.

void readData(TransID xid, int tnum, int blockId, byte buffer[]) throws IOException, IllegalArgumentException This function reads PTree.BLOCK_SIZE_BYTES bytes from the blockId'th data block of the tree specified by tnum into buffer. If the specified block does not exist in the tree, the function should fill buffer with zero bytes.

void writeData(TransID xid, int tnum, int blockId, byte buffer[]) throws IOException, IllegalArgumentException This function writes PTree.BLOCK_SIZE_BYTES bytes from buffer into the blockId'th data block of the tree specified by tnum. If the specified block does not exist in the tree, the function should grow the tree to include the new block. Notice that this growth may require updating multiple data structures (the free list, the pointer to the tree root, internal tree nodes, and the data block itself), and all of these updates must be done atomically within the transaction.
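The growth that writeData performs can be planned from the block ID alone. The sketch below is a minimal illustration, assuming a tree in which every internal node holds a fixed number of child pointers (POINTERS_PER_NODE, a hypothetical constant set to 4 here) and all data blocks sit at the same depth. heightFor gives the number of internal levels needed to reach a block, and pathFor gives the child index to follow at each level from the root down; TreePath is a hypothetical name.

```java
// Hypothetical sketch: maps a data-block ID to the child index taken at each
// level of a multi-level tree. POINTERS_PER_NODE is an assumed fan-out; the
// real value depends on your on-disk node layout.
final class TreePath {
    static final int POINTERS_PER_NODE = 4; // assumption for illustration

    /** Smallest number of internal levels whose leaves can hold blockId. */
    static int heightFor(int blockId) {
        if (blockId < 0) throw new IllegalArgumentException("negative blockId");
        int height = 1;
        long capacity = POINTERS_PER_NODE; // leaves reachable with one internal level
        while (blockId >= capacity) {
            capacity *= POINTERS_PER_NODE;
            height++;
        }
        return height;
    }

    /** Child index at each level, root first, for a tree of the given height. */
    static int[] pathFor(int blockId, int height) {
        int[] path = new int[height];
        int remaining = blockId;
        for (int level = height - 1; level >= 0; level--) {
            path[level] = remaining % POINTERS_PER_NODE; // digit in base POINTERS_PER_NODE
            remaining /= POINTERS_PER_NODE;
        }
        if (remaining != 0) throw new IllegalArgumentException("tree too short for blockId");
        return path;
    }
}
```

When heightFor(blockId) exceeds a tree's current height, one common approach is to allocate a new root whose child 0 is the old root and repeat until the tree is tall enough. Existing blocks keep their IDs because the extra leading levels of their paths are all 0 (for example, pathFor(1, 2) is [0, 1]).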

void readTreeMetadata(TransID xid, int tnum, byte buffer[]) throws IOException, IllegalArgumentException This function reads PTree.METADATA_SIZE bytes of per-tree metadata for tree tnum into buffer. This per-tree metadata is an uninterpreted array of bytes that higher-level code may use to store state associated with a given tree.

void writeTreeMetadata(TransID xid, int tnum, byte buffer[]) throws IOException, IllegalArgumentException This function writes PTree.METADATA_SIZE bytes of per-tree metadata for tree tnum from buffer.

int getParam(int param) throws IOException, IllegalArgumentException This function allows applications to get parameters of the persistent tree system. The parameter is one of PTree.ASK_FREE_SPACE (to ask how much free space the system currently has), PTree.ASK_MAX_TREES (to ask what is the maximum number of trees the system can support), and PTree.ASK_FREE_TREES (to ask how many free tree IDs the system currently has). It returns an integer answer to the question or throws IllegalArgumentException if param does not correspond to one of these values.
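One way to keep the free list is a bitmap with one bit per disk block. This is an in-memory sketch only, using java.util.BitSet; FreeMap is a hypothetical name, and it assumes ASK_FREE_SPACE is reported in blocks. A real implementation would store the bitmap in disk blocks written through ADisk, so that an allocation and the tree update that consumes it commit or abort together, and would mark blocks reserved for the log and other metadata as used at format time.

```java
import java.util.BitSet;

// Hypothetical in-memory view of a free-block bitmap (bit set = block in use).
final class FreeMap {
    private final BitSet used;
    private final int numBlocks;

    FreeMap(int numBlocks) {
        this.numBlocks = numBlocks;
        this.used = new BitSet(numBlocks);
    }

    /** Marks the lowest free block as used and returns its number, or -1 if none is free. */
    int allocate() {
        int b = used.nextClearBit(0);
        if (b >= numBlocks) return -1; // a caller would turn this into ResourceException
        used.set(b);
        return b;
    }

    /** Returns an allocated block to the free pool. */
    void free(int b) {
        if (b < 0 || b >= numBlocks || !used.get(b))
            throw new IllegalArgumentException("block not allocated: " + b);
        used.clear(b);
    }

    /** One possible answer to getParam(PTree.ASK_FREE_SPACE), counted in blocks. */
    int freeBlocks() {
        return numBlocks - used.cardinality();
    }
}
```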

 

For all of these methods:

IOException is thrown if the request is unable to complete the necessary disk accesses.

IllegalArgumentException is thrown if the caller specifies a nonexistent transaction or tree.

ResourceException is thrown if there are not sufficient resources to complete the operation.

We will provide the code PTree.java and ResourceException.java

Requirements on implementation internals

You are required to use the following basic on-disk data structures for the trees.

Simplifying assumption:

Hints: You are not required to follow this advice, but I think it might help. Feel free to ignore it if you have a better way. 

Part 2b: Flat File System

In this part of the project, you will build a "flat file system" that implements files but not directories. A flat file system allows you to read and write files that are named by inumbers rather than path names; this would not be convenient for end users, but it will form the basis for the rest of the system. You will use the persistent trees created in Part 2a, and each tree will store one file.

You should implement the following interface:

FlatFS(boolean doFormat) throws IOException This function is the constructor. If doFormat == false, data stored in previous sessions must remain stored. If doFormat == true, the system should initialize the underlying disk to empty.

TransID beginTrans() This function begins a new transaction and returns an identifying transaction ID.

void commitTrans(TransID xid) throws IOException, IllegalArgumentException This function commits the specified transaction. 

void abortTrans(TransID xid) throws IOException, IllegalArgumentException This function aborts the specified transaction. 

int createFile(TransID xid) throws IOException, IllegalArgumentException This function creates a new file and returns its inode number (a unique identifier for the file).

void deleteFile(TransID xid, int inumber) throws IOException, IllegalArgumentException This function removes the file specified by the inode number inumber. The file is deleted and the corresponding resources are reclaimed.

int read(TransID xid, int inumber, int offset, int count, byte buffer[]) throws IOException, IllegalArgumentException, EOFException This function reads count bytes from the file specified by inumber into buffer. The parameter offset specifies the starting location within the file where the data should be read. Upon success, the function returns the number of bytes read (this number can be less than count if offset + count exceeds the length of the file). The method throws EOFException if offset is past the end of the file.
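The length check that read must perform before touching any blocks can be isolated in a small helper. This sketch assumes the file length in bytes is available (for instance, from the per-file metadata) and treats a read starting exactly at the end of the file as returning 0 bytes rather than throwing; ReadBounds is a hypothetical name, and whether offset == length should instead throw EOFException is a design choice for you to settle.

```java
import java.io.EOFException;

// Hypothetical helper for FlatFS.read: how many bytes may this read return?
final class ReadBounds {
    static int bytesToRead(int fileLength, int offset, int count) throws EOFException {
        if (offset < 0 || count < 0)
            throw new IllegalArgumentException("negative offset or count");
        if (offset > fileLength) // strictly past the end of the file
            throw new EOFException("offset " + offset + " past end " + fileLength);
        // Stop at end of file: may return fewer than count bytes.
        return Math.min(count, fileLength - offset);
    }
}
```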

void write(TransID xid, int inumber, int offset, int count, byte buffer[]) throws IOException, IllegalArgumentException This function writes count bytes from buffer into the file specified by inumber. The parameter offset specifies the starting location within the file where the data should be written. Attempting to write beyond the end of the file should extend the size of the file to accommodate the new data.

void readFileMetadata(TransID xid, int inumber, byte buffer[]) throws IOException, IllegalArgumentException This function reads getParam(ASK_FILE_METADATA_SIZE) bytes of per-file metadata for file inumber into buffer. This per-file metadata is an uninterpreted array of bytes that higher-level code may use to store state associated with a given file.

void writeFileMetadata(TransID xid, int inumber, byte buffer[]) throws IOException, IllegalArgumentException This function writes getParam(ASK_FILE_METADATA_SIZE) bytes of per-file metadata for file inumber from buffer.

int getParam(int param) throws IOException, IllegalArgumentException This function allows applications to get parameters of the file system. The parameter is one of FlatFS.ASK_MAX_FILE (to ask the maximum number of files the formatted file system supports), FlatFS.ASK_FREE_SPACE_BLOCKS (to ask how many free blocks the file system currently has), FlatFS.ASK_FREE_FILES (to ask how many free inodes the system currently has), and FlatFS.ASK_FILE_METADATA_SIZE (to ask how much space there is for per-file metadata). It returns an integer answer to the question.

We will provide the code FlatFS.java.

Hints: This layer adds very little to the persistent tree layer. Primarily, instead of reading and writing blocks, you now read and write ranges of bytes, so read and write must translate each request for a range of bytes into a series of requests for blocks. Also, notice that if offset < file length and offset + count > file length, the read function should read only to the end of the file and not beyond it (returning a value smaller than count). So, you will need to store the file length in bytes with each tree, for example in its per-tree metadata.
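The translation from a byte range to block requests might look like the sketch below. BLOCK_SIZE is an assumed stand-in for PTree.BLOCK_SIZE_BYTES, and BlockRanges and Piece are hypothetical names. Each Piece says which block to touch, where inside that block the range starts, how many bytes it covers, and where those bytes sit in the caller's buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Splits a byte range into per-block pieces. BLOCK_SIZE is an assumed
// stand-in for PTree.BLOCK_SIZE_BYTES.
final class BlockRanges {
    static final int BLOCK_SIZE = 1024;

    /** One block's share of a byte-range request. */
    static final class Piece {
        final int blockId;       // which data block of the tree
        final int offsetInBlock; // first byte within that block
        final int length;        // bytes covered in that block
        final int bufferOffset;  // where those bytes sit in the caller's buffer

        Piece(int blockId, int offsetInBlock, int length, int bufferOffset) {
            this.blockId = blockId;
            this.offsetInBlock = offsetInBlock;
            this.length = length;
            this.bufferOffset = bufferOffset;
        }
    }

    static List<Piece> split(int offset, int count) {
        if (offset < 0 || count < 0)
            throw new IllegalArgumentException("negative offset or count");
        List<Piece> pieces = new ArrayList<>();
        int done = 0;
        while (done < count) {
            int pos = offset + done;
            int inBlock = pos % BLOCK_SIZE;
            // Take the rest of this block, or the rest of the request if smaller.
            int len = Math.min(BLOCK_SIZE - inBlock, count - done);
            pieces.add(new Piece(pos / BLOCK_SIZE, inBlock, len, done));
            done += len;
        }
        return pieces;
    }
}
```

For write, a piece that covers a whole block can be passed straight to writeData, while a partial piece needs a read-modify-write: readData the block, copy the piece's bytes into it, then writeData it back. After a write, the stored length becomes max(oldLength, offset + count).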

Part 3: Hierarchical File System

File systems would be less useful if you needed to remember the inumber of each file you create. File systems therefore use a higher-level API with hierarchical file names to make it easy to organize and remember where data are stored.

A directory is treated in RFS like any other file, except that it cannot be written to directly by user programs. A directory file consists of several entries, each describing a file or a directory. Each directory must contain at least two entries. The first refers to the parent directory and has the name "..", as in UNIX; the root directory is the only directory whose parent is itself. The second mandatory entry has the name "." and points to the directory itself.

Updates to the directory structure occur only as a result of creating, deleting, or renaming files and directories. A directory entry contains a flag showing whether the entry is used. You may want to include other status information in this flag, depending on your design. The flag is followed by the inode number of the file or directory corresponding to that entry. The last field in the entry is the file or directory name, which is a fixed-length string. Note that in practice we would not want to use fixed-size arrays to store names, as they cause unacceptable inefficiency in disk access speed and space utilization, but we allow this simplification for the project.

We provide a simple template for a directory entry class, DirEnt.java. Note that this is an internal detail of your file system. You are welcome to change it.
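As one possible layout (not the provided DirEnt.java), a directory entry can be serialized into a fixed number of bytes with java.nio.ByteBuffer. The field sizes here are assumptions: an int used flag, an int inode number, and MAX_NAME characters padded with '\0'. Because a Java char is two bytes, the name field takes 2 * MAX_NAME bytes; DirEntry and MAX_NAME = 30 are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-size directory entry layout:
//   [int used flag][int inumber][MAX_NAME chars, 2 bytes each, '\0'-padded]
final class DirEntry {
    static final int MAX_NAME = 30;               // assumed FS_MAX_NAME
    static final int SIZE = 4 + 4 + 2 * MAX_NAME; // bytes per entry on disk

    final boolean used;
    final int inumber;
    final String name;

    DirEntry(boolean used, int inumber, String name) {
        if (name.length() > MAX_NAME) throw new IllegalArgumentException("name too long");
        this.used = used;
        this.inumber = inumber;
        this.name = name;
    }

    byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(SIZE);
        buf.putInt(used ? 1 : 0);
        buf.putInt(inumber);
        for (int i = 0; i < MAX_NAME; i++) {
            // Each Java char occupies 2 bytes; unused slots are padded with '\0'.
            buf.putChar(i < name.length() ? name.charAt(i) : '\0');
        }
        return buf.array();
    }

    /** Decodes the entry that starts at bytes[offset]. */
    static DirEntry fromBytes(byte[] bytes, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, SIZE);
        boolean used = buf.getInt() != 0;
        int inumber = buf.getInt();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < MAX_NAME; i++) {
            char c = buf.getChar();
            if (c == '\0') break; // end of the padded name
            sb.append(c);
        }
        return new DirEntry(used, inumber, sb.toString());
    }
}
```

Fixed-size entries let a directory be treated as an array: entry k lives at byte offset k * DirEntry.SIZE of the directory file, which maps directly onto the FlatFS read and write calls. This sketch assumes names never contain '\0'.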

When you format the disk, you will need to create the root directory, whose inode should always be at a known location. For example, in UNIX, the root directory is typically stored as inode 0, 1, or 2.

RFS allows users to create hierarchically named files such as "/foo/bar/baz". A file name used in any of these functions is a String. No component of the name between two '/' characters or after the last '/' character can exceed FS_MAX_NAME characters in length.
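Most RFS calls start by splitting a full pathname into its components. The sketch below assumes FS_MAX_NAME is 30 and silently skips empty components, so "/foo//bar/" is treated like "/foo/bar"; rejecting such names instead would be an equally reasonable choice. PathNames is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Splits a full pathname into components and enforces the per-component limit.
final class PathNames {
    static final int FS_MAX_NAME = 30; // assumed value

    /** "/foo/bar/baz" -> [foo, bar, baz]; "/" -> []. */
    static List<String> components(String path) {
        if (path == null || !path.startsWith("/"))
            throw new IllegalArgumentException("not a full pathname: " + path);
        List<String> parts = new ArrayList<>();
        for (String part : path.split("/")) {
            if (part.isEmpty()) continue; // leading '/' or repeated '/'
            if (part.length() > FS_MAX_NAME)
                throw new IllegalArgumentException("component too long: " + part);
            parts.add(part);
        }
        return parts;
    }
}
```

Lookup can then start at the root directory's known inode and search each directory in turn for the next component; for createFile, createDir, and unlink, every component except the last names the parent directory.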

Although file names are convenient for users, requiring string manipulation on each system call would increase the overhead of file access. Thus, the API allows users to open files using their names and then to read and write open files using file descriptors. A file descriptor is an integer between 0 and FS_MAX_FD that you will use as an index to an open-file-descriptor table that you will maintain. User programs use file descriptors to identify files in file system calls instead of repeatedly using file or directory names. 
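The open-file-descriptor table can be a fixed array indexed by descriptor. In this sketch, FdTable is a hypothetical name, each slot holds whatever per-open state you choose (such as the inode number and the transaction ID opened for that file), and valid descriptors are assumed to be 0 through FS_MAX_FD - 1; adjust if FS_MAX_FD itself should be a valid descriptor.

```java
// Hypothetical open-file-descriptor table; slot contents are up to you.
final class FdTable<T> {
    private final Object[] slots;

    FdTable(int maxFd) {
        slots = new Object[maxFd]; // descriptors 0 .. maxFd - 1 (assumed range)
    }

    /** Stores state in the lowest free slot and returns its descriptor, or -1 if all are in use. */
    int allocate(T state) {
        if (state == null) throw new NullPointerException("state");
        for (int fd = 0; fd < slots.length; fd++) {
            if (slots[fd] == null) {
                slots[fd] = state;
                return fd;
            }
        }
        return -1;
    }

    /** Returns the state for an open descriptor; fails for closed or out-of-range ones. */
    @SuppressWarnings("unchecked")
    T get(int fd) {
        if (fd < 0 || fd >= slots.length || slots[fd] == null)
            throw new IllegalArgumentException("bad file descriptor: " + fd);
        return (T) slots[fd];
    }

    /** Frees fd and returns its state so the caller can commit and clean up. */
    T release(int fd) {
        T state = get(fd);
        slots[fd] = null;
        return state;
    }
}
```

open would call allocate and fail when it returns -1; close would call release, after which get throws IllegalArgumentException for that descriptor until allocate hands it out again.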

Note that each operation on a directory must execute atomically: either the entire operation completes or the file system is left in the state it was in before the request was issued. Similarly, a set of operations on an open file must all execute as a single atomic unit. Notice that, unlike the previous interfaces, these interfaces do not take a transaction ID; each of these library calls internally creates and commits transactions as needed, so you should begin and end transactions within your code for these calls. This arrangement is more convenient for the user, but it has the disadvantage that a user cannot cause several of these calls to be executed atomically.
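Since each of these calls begins and ends its own transaction, it can help to factor out the begin/commit/abort pattern. This is a sketch under assumptions: TxnLayer and TxnBody are hypothetical interfaces standing in for the lower layer's beginTrans, commitTrans, and abortTrans, with the type parameter X playing the role of TransID. The helper commits if the body returns normally and aborts if it throws.

```java
import java.io.IOException;

// Hypothetical stand-in for the lower layer's transaction calls; X plays the
// role of TransID. Not part of the provided skeleton.
interface TxnLayer<X> {
    X beginTrans();
    void commitTrans(X xid) throws IOException;
    void abortTrans(X xid) throws IOException;
}

// The work one RFS call performs inside its transaction.
interface TxnBody<X, R> {
    R run(X xid) throws IOException;
}

final class Atomically {
    /** Runs body in a fresh transaction: commits on success, aborts if body throws. */
    static <X, R> R run(TxnLayer<X> layer, TxnBody<X, R> body) throws IOException {
        X xid = layer.beginTrans();
        R result;
        try {
            result = body.run(xid);
        } catch (IOException | RuntimeException e) {
            try {
                layer.abortTrans(xid);
            } catch (IOException abortFailure) {
                e.addSuppressed(abortFailure); // keep the original failure visible
            }
            throw e;
        }
        layer.commitTrans(xid);
        return result;
    }
}
```

A directory operation such as createDir could then be written as a single body that updates the parent directory and creates the new directory's file, so that both changes commit together or not at all. Calls tied to an open file (open, read, write, close) do not fit this shape, since their transaction spans several calls and must be kept with the file descriptor.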

A description of each file system call follows:

RFS(boolean doFormat) throws IOException This function is the constructor. If doFormat == false, data stored in previous sessions must remain stored. If doFormat == true, the system should initialize the underlying file system to empty.

int createFile(String filename, boolean openIt) throws IOException, IllegalArgumentException: This function atomically creates a new file with the name filename. filename is a full pathname (starting with "/"). If the parameter openIt is true, the function returns a file descriptor for the newly created file; in this case, the initial createFile(), a sequence of zero or more read() and write() calls to that file, and a final close() should all occur within a single transaction.

void createDir(String dirname) throws IOException, IllegalArgumentException This function atomically creates a directory entry with the name dirname. As before, the name is interpreted as a full pathname.  

void unlink(String name) throws IOException, IllegalArgumentException This function atomically removes the entry specified by the name. The name is interpreted as usual. If the name corresponds to a file and the file is not currently open, it is deleted and the corresponding resources are reclaimed. If name corresponds to a directory, it is deleted only if it is an empty directory. 

void rename(String oldName, String newName) throws IOException, IllegalArgumentException This function atomically renames the existing file oldName to newName.

int open(String name) throws IOException, IllegalArgumentException This function performs a lookup on the file or directory whose name is specified by name. The character string specified by name must start with "/" making name a full pathname that starts from the root of the file system. The call returns a file descriptor that can be used later to refer to the file or directory specified by the search path. The function fails if name does not specify an existing file, if no file descriptors are free, or if the name corresponds to a directory. All reads and writes to the open file are part of a single transaction.

void close(int fd) throws IOException, IllegalArgumentException This function closes the open file indicated by the file descriptor fd and commits any updates. Subsequent accesses through fd must return an error until fd is reused by a later open call. Any resources used to support the file descriptor should also be reclaimed at this point.

int read(int fd, int offset, int count, byte buffer[]) throws IOException, IllegalArgumentException This function reads count bytes from the file specified by fd into buffer. The parameter offset specifies the starting location within the file where the data should be read. Upon success, the function returns the number of bytes read (this number can be less than count if fewer bytes remain between offset and the end of the file).

void write(int fd, int offset, int count, byte buffer[]) throws IOException, IllegalArgumentException This function writes count bytes from buffer into the file specified by fd. The parameter offset specifies the starting location within the file where the data should be written. Attempting to write beyond the end of the file should extend the size of the file to accommodate the new data. These writes will commit when the file is closed.

String[] readDir(String dirname) throws IOException, IllegalArgumentException This function atomically reads the entries that exist in the directory specified by dirname and returns the result in an array of String objects.

int size(int fd) throws IOException, IllegalArgumentException This function returns the number of bytes contained in the open file identified by fd.

int space(int fd) throws IOException, IllegalArgumentException This function returns the number of data blocks (excluding internal nodes) consumed by the open file identified by fd. Notice that space must account for holes, while size is not affected by holes in a file.
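The difference between size and space shows up as soon as a file has a hole. In this illustration, a set of allocated block IDs stands in for what the persistent tree would report, BLOCK_SIZE is an assumed stand-in for PTree.BLOCK_SIZE_BYTES, and SparseFile is a hypothetical name. Writing one byte at offset 10 * BLOCK_SIZE produces a file whose size is 10 * BLOCK_SIZE + 1 bytes but which consumes only one data block.

```java
import java.util.TreeSet;

// Illustration of size versus space for a sparse file (not a real RFS class).
final class SparseFile {
    static final int BLOCK_SIZE = 1024; // assumed stand-in for PTree.BLOCK_SIZE_BYTES

    private final TreeSet<Integer> allocatedBlocks = new TreeSet<>();
    private int length; // bytes, as would be kept in per-file metadata

    /** Records a write of count bytes at offset: touched blocks become allocated. */
    void write(int offset, int count) {
        if (count <= 0) return;
        int firstBlock = offset / BLOCK_SIZE;
        int lastBlock = (offset + count - 1) / BLOCK_SIZE;
        for (int b = firstBlock; b <= lastBlock; b++) {
            allocatedBlocks.add(b);
        }
        length = Math.max(length, offset + count);
    }

    /** Bytes in the file, holes included. */
    int size() { return length; }

    /** Data blocks actually allocated; holes consume none. */
    int space() { return allocatedBlocks.size(); }
}
```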

We will provide the skeleton code RFS.java.

Hints: Note that a char in Java is two bytes. Don't forget that our persistent tree abstraction lets you stash some extra data of your choosing in a tnode.

What to Turn In

All of your implementations must adhere to (i.e., must not change) the public interfaces specified above. You may not modify the Disk interface in any way. You may add additional public methods to ADisk, PTree, or FlatFS, but we don't think you will need to do so. Although the "internal interfaces" of parts 1 and 2 would not be accessible to a "normal user" of the file system you create in part 3, we will test those internal interfaces during grading.

Electronically turn in (1) your well-commented and elegant source code and (2) a file called README. Turn in the entire body of source code needed for this project (e.g., turn in your ADisk again for FS-2 and your flat file system again for FS-3).

Your README file should include 5 sections:

Logistics
The following guidelines should help smooth the process of delivering your project. You can help us a great deal by observing the following:
Grading

85% Code

Remember that your code must be clear and easy for a human to read. Also remember that the tests we provide are for your convenience as a starting point. You should test more thoroughly. Just passing those tests is not a guarantee that you will get a good grade.

Note: I have deliberately under-weighted part 4 relative to its conceptual difficulty and amount of code you need to write and test. As a result, if you run short on time, it is still possible to get a solid grade on this project by doing a great job on parts 2-3 and not completing part 4.

15% Documentation, testing, and analysis
Discussions of design and testing strategy and results.


Start early, we mean it!!!