[ << ] [ >> ]           [Top] [Contents] [Index] [ ? ]

5. Project 4: File Systems

In the previous two assignments, you made extensive use of a file system without actually worrying about how it was implemented underneath. For this last assignment, you will improve the implementation of the file system. You will be working primarily in the filesys directory.

You may build project 4 on top of project 2 or project 3. In either case, all of the functionality needed for project 2 must work in your filesys submission. If you build on project 3, then all of the project 3 functionality must work also, and you will need to edit filesys/Make.vars to enable VM functionality. You can receive up to 10% extra credit if you do enable VM.

5.1 Background

5.1.1 New Code

Here are some files that are probably new to you. These are in the filesys directory except where indicated:

Simple utilities for the file system that are accessible from the kernel command line.

Top-level interface to the file system. See section 3.1.2 Using the File System, for an introduction.

Translates file names to inodes. The directory data structure is stored as a file.

Manages the data structure representing the layout of a file's data on disk.

Translates file reads and writes to disk sector reads and writes.

A bitmap data structure along with routines for reading and writing the bitmap to disk files.

Our file system has a Unix-like interface, so you may also wish to read the Unix man pages for creat, open, close, read, write, lseek, and unlink. Our file system has calls that are similar, but not identical, to these. The file system translates these calls into disk operations.

All the basic functionality is there in the code above, so that the file system is usable from the start, as you've seen in the previous two projects. However, it has severe limitations which you will remove.

While most of your work will be in filesys, you should be prepared for interactions with all previous parts.

5.1.2 Testing File System Persistence

By now, you should be familiar with the basic process of running the Pintos tests. See section 1.2.1 Testing, for review, if necessary.

Until now, each test invoked Pintos just once. However, an important purpose of a file system is to ensure that data remains accessible from one boot to another. Thus, the tests that are part of the file system project invoke Pintos a second time. The second run combines all the files and directories in the file system into a single file, then copies that file out of the Pintos file system into the host (Unix) file system.

The grading scripts check the file system's correctness based on the contents of the file copied out in the second run. This means that your project will not pass any of the extended file system tests until the file system is implemented well enough to support tar, the Pintos user program that produces the file that is copied out. The tar program is fairly demanding (it requires both extensible file and subdirectory support), so this will take some work. Until then, you can ignore errors from make check regarding the extracted file system.

Incidentally, as you may have surmised, the file format used for copying out the file system contents is the standard Unix "tar" format. You can use the Unix tar program to examine them. The tar file for test t is named t.tar.

5.2 Suggested Order of Implementation

To make your job easier, we suggest implementing the parts of this project in the following order:

  1. Extensible files (see section 5.3.2 Indexed and Extensible Files). After this step, your project should pass the file growth tests.

  2. Subdirectories (see section 5.3.3 Subdirectories). Afterward, your project should pass the directory tests.

  3. Remaining miscellaneous items.

You can implement extensible files and subdirectories in parallel if you temporarily make the number of entries in new directories fixed.

You should think about synchronization throughout.

5.3 Requirements

5.3.1 Design Document

In addition to submitting your source code, each person is responsible for answering the questions in the project 4 design document template and submitting the completed file through turnin using the command

turnin --submit ccoleman fs_design fs_design.txt.

We recommend that you read the design document template before you start working on the project. See section D. Project Documentation, for a sample design document that goes along with a fictitious project.

5.3.2 Indexed and Extensible Files

The basic file system allocates files as a single extent, making it vulnerable to external fragmentation, that is, it is possible that an n-block file cannot be allocated even though n blocks are free. Eliminate this problem by modifying the on-disk inode structure. In practice, this probably means using an index structure with direct, indirect, and doubly indirect blocks. You are welcome to choose a different scheme as long as you explain the rationale for it in your design documentation, and as long as it does not suffer from external fragmentation (as does the extent-based file system we provide).

You can assume that the file system partition will not be larger than 8 MB. You must support files as large as the partition (minus metadata). Each inode is stored in one disk sector, limiting the number of block pointers that it can contain. Supporting 8 MB files will require you to implement doubly-indirect blocks.

An extent-based file can only grow if it is followed by empty space, but indexed inodes make file growth possible whenever free space is available. Implement file growth. In the basic file system, the file size is specified when the file is created. In most modern file systems, a file is initially created with size 0 and is then expanded every time a write is made off the end of the file. Your file system must allow this, but still allow files to be created with an initial size.

There should be no predetermined limit on the size of a file, except that a file cannot exceed the size of the file system (minus metadata). This also applies to the root directory file, which should now be allowed to expand beyond its initial limit of 16 files.

User programs are allowed to seek beyond the current end-of-file (EOF). The seek itself does not extend the file. Writing at a position past EOF extends the file to the position being written, and any gap between the previous EOF and the start of the write must be filled with zeros. A read starting from a position past EOF returns no bytes.

Writing far beyond EOF can cause many blocks to be entirely zero. Some file systems allocate and write real data blocks for these implicitly zeroed blocks. Other file systems do not allocate these blocks at all until they are explicitly written. The latter file systems are said to support "sparse files." You may adopt either allocation strategy in your file system.

5.3.3 Subdirectories

Implement a hierarchical name space. In the basic file system, all files live in a single directory. Modify this to allow directory entries to point to files or to other directories.

Make sure that directories can expand beyond their original size just as any other file can.

The basic file system has a 14-character limit on file names. You may retain this limit for individual file name components, or may extend it, at your option. You must allow full path names to be much longer than 14 characters.

Maintain a separate current directory for each process. At startup, set the root as the initial process's current directory. When one process starts another with the exec system call, the child process inherits its parent's current directory. After that, the two processes' current directories are independent, so that either changing its own current directory has no effect on the other. (This is why, under Unix, the cd command is a shell built-in, not an external program.)

Update the existing system calls so that, anywhere a file name is provided by the caller, an absolute or relative path name may used. The directory separator character is forward slash (/). You must also support special file names . and .., which have the same meanings as they do in Unix.

Update the open system call so that it can also open directories. Of the existing system calls, only close needs to accept a file descriptor for a directory.

Update the remove system call so that it can delete empty directories (other than the root) in addition to regular files. Directories may only be deleted if they do not contain any files or subdirectories (other than . and ..). You may decide whether to allow deletion of a directory that is open by a process or in use as a process's current working directory. If it is allowed, then attempts to open files (including . and ..) or create new files in a deleted directory must be disallowed.

Implement the following new system calls:

System Call: bool chdir (const char *dir)
Changes the current working directory of the process to dir, which may be relative or absolute. Returns true if successful, false on failure.

System Call: bool mkdir (const char *dir)
Creates the directory named dir, which may be relative or absolute. Returns true if successful, false on failure. Fails if dir already exists or if any directory name in dir, besides the last, does not already exist. That is, mkdir("/a/b/c") succeeds only if /a/b already exists and /a/b/c does not.

System Call: bool readdir (int fd, char *name)
Reads a directory entry from file descriptor fd, which must represent a directory. If successful, stores the null-terminated file name in name, which must have room for READDIR_MAX_LEN + 1 bytes, and returns true. If no entries are left in the directory, returns false.

. and .. should not be returned by readdir.

If the directory changes while it is open, then it is acceptable for some entries not to be read at all or to be read multiple times. Otherwise, each directory entry should be read once, in any order.

READDIR_MAX_LEN is defined in lib/user/syscall.h. If your file system supports longer file names than the basic file system, you should increase this value from the default of 14.

System Call: bool isdir (int fd)
Returns true if fd represents a directory, false if it represents an ordinary file.

System Call: int inumber (int fd)
Returns the inode number of the inode associated with fd, which may represent an ordinary file or a directory.

An inode number persistently identifies a file or directory. It is unique during the file's existence. In Pintos, the sector number of the inode is suitable for use as an inode number.

We have provided ls and mkdir user programs, which are straightforward once the above syscalls are implemented. We have also provided pwd, which is not so straightforward. The shell program implements cd internally. The programs are provided in the src/examples directory.

The pintos extract and append commands should now accept full path names, assuming that the directories used in the paths have already been created. This should not require any significant extra effort on your part.

5.3.4 Synchronization

The provided file system requires external synchronization, that is, callers must ensure that only one thread can be running in the file system code at once. Your submission must adopt a finer-grained synchronization strategy that does not require external synchronization. To the extent possible, operations on independent entities should be independent, so that they do not need to wait on each other.

Multiple processes must be able to access a single file at once. Multiple reads of a single file must be able to complete without waiting for one another. When writing to a file does not extend the file, multiple processes should also be able to write a single file at once. A read of a file by one process when the file is being written by another process is allowed to show that none, all, or part of the write has completed. (However, after the write system call returns to its caller, all subsequent readers must see the change.) Similarly, when two processes simultaneously write to the same part of a file, their data may be interleaved.

On the other hand, extending a file and writing data into the new section must be atomic. Suppose processes A and B both have a given file open and both are positioned at end-of-file. If A reads and B writes the file at the same time, A may read all, part, or none of what B writes. However, A may not read data other than what B writes, e.g. if B's data is all nonzero bytes, A is not allowed to see any zeros.

Operations on different directories should take place concurrently. Operations on the same directory may wait for one another.

Keep in mind that only data shared by multiple threads needs to be synchronized. In the base file system, struct file and struct dir are accessed only by a single thread.

5.3.5 General Requirements

You must work in two to four person teams on this project.
Failure to do so will result in a 0 for the project. You may work with the same partner(s) you had for the last assignment or you may change partner(s).

Once you have selected a partner, exchange first and last names, EIDs, and CS logins. Also, fill out the README distributed with the project. You must follow the pair programming guidelines set forth for this class.

Please see the Grading Criteria to understand how failure to follow the pair programming guidelines OR fill out the README will affect your grade.

You must follow the guidelines laid out in the C Style Guide or your will lose points.
This includes selecting reasonable names for your files and variables.

This project will be graded on the UTCS public linux machines.
Although you are welcome to do testing and development on any platform you like, we cannot assist you in setting up other environments, and you must test and do final debugging on the UTCS public linux machines. The statement "It worked on my machine" will not be considered in the grading process.

Your code must compile without any additions or adjustments or you will receive a 0 for the correctness portion.

You must fill out the provided README. Failure to do so will result in a loss of half the correctness grade for all members of the group.

You may not look at the written work of any student outside of your group.
This includes, for example, looking at another student's screen to help them debug, looking at another student's print-out, working with another student to sketch a high-level design on a white-board. See the syllabus for additional details.

5.3.6 Turnin Instructions

After you finish your code, please use make turnin_fs (in the src directory) to submit your code. Only one member of the group should run make turnin. Make sure you have included the necessary information in the README. Failure to do so will result in a loss of half of the correctness grade for all group members.

Once you have completed your design document (each student should complete one individually), please submit it using the following command:

turnin --submit ccoleman fs_design fs_design.txt

Make sure you have included your name, UT EID, and other requested information in the design document.

5.4 FAQ

Do I have to work with a partner or a group?


How do I submit my work?

See See section 5.3.6 Turnin Instructions.

How much code will I need to write?

Here's a summary of our reference solution, produced by the diffstat program. The final row gives total lines inserted and deleted; a changed line counts as both an insertion and a deletion.

This summary is relative to the Pintos base code, but the reference solution for project 4 is based on the reference solution to project 3. Thus, the reference solution runs with virtual memory enabled. See section 4.4 FAQ, for the summary of project 3.

The reference solution represents just one possible solution. Many other solutions are also possible and many of those differ greatly from the reference solution. Some excellent solutions may not modify all the files modified by the reference solution, and some may modify files not modified by the reference solution.

 Makefile.build       |    5 
 devices/timer.c      |   42 ++
 filesys/Make.vars    |    6 
 filesys/directory.c  |   99 ++++-
 filesys/directory.h  |    3 
 filesys/file.c       |    4 
 filesys/filesys.c    |  194 +++++++++-
 filesys/filesys.h    |    5 
 filesys/free-map.c   |   45 +-
 filesys/free-map.h   |    4 
 filesys/fsutil.c     |    8 
 filesys/inode.c      |  444 ++++++++++++++++++-----
 filesys/inode.h      |   11 
 threads/init.c       |    5 
 threads/interrupt.c  |    2 
 threads/thread.c     |   32 +
 threads/thread.h     |   38 +-
 userprog/exception.c |   12 
 userprog/pagedir.c   |   10 
 userprog/process.c   |  332 +++++++++++++----
 userprog/syscall.c   |  582 ++++++++++++++++++++++++++++++-
 userprog/syscall.h   |    1 
 vm/frame.c           |  161 ++++++++
 vm/frame.h           |   23 +
 vm/page.c            |  297 +++++++++++++++
 vm/page.h            |   50 ++
 vm/swap.c            |   85 ++++
 vm/swap.h            |   11 
 28 files changed, 2228 insertions(+), 286 deletions(-)


No, BLOCK_SECTOR_SIZE is fixed at 512. For IDE disks, this value is a fixed property of the hardware. Other disks do not necessarily have a 512-byte sector, but for simplicity Pintos only supports those that do.

5.4.1 Indexed Files FAQ

What is the largest file size that we are supposed to support?

The file system partition we create will be 8 MB or smaller. However, individual files will have to be smaller than the partition to accommodate the metadata. You'll need to consider this when deciding your inode organization.

5.4.2 Subdirectories FAQ

How should a file name like a//b be interpreted?

Multiple consecutive slashes are equivalent to a single slash, so this file name is the same as a/b.

How about a file name like /../x?

The root directory is its own parent, so it is equivalent to /x.

How should a file name that ends in / be treated?

Most Unix systems allow a slash at the end of the name for a directory, and reject other names that end in slashes. We will allow this behavior, as well as simply rejecting a name that ends in a slash.

[ << ] [ >> ]           [Top] [Contents] [Index] [ ? ]

This document was generated by Alison Norman on April 11, 2014 using texi2html