Handed out: Apr 24
Due: May 8 5PM
fork and spawn to
share their file descriptors with the child environment.
Sharing them across fork and spawn
allows the parent to set up i/o redirection before starting the
child. This enables i/o redirection of child programs without
needing to do anything special in the child.
The pingpong and primes test should still work.
Check this by runing gmake run-pingpong
and gmake run-primes.
The file system tests will not work yet.
To facilitate file descriptor sharing and multiple
types of files, the implementation of routines like
open, close, read,
and write have changed. As the first exercise in this lab,
you will adapt your solutions from lab 5 to fit in this framework.
read/write/close
file interface.
The file support we wrote in the last lab assumed that disk files were the only possible type of file. In this exercise, we will address that shortcoming, making files a generic concept, implemented by the disk file, pipe, and console devices.
The generic file descriptor code is in lib/fd.c.
You already encountered this code in lab 5,
but now you will now need to understand it in more detail
in order to implement other kinds of file descriptors.
As we outlined in lab 5,
each environment has a file descriptor table located at virtual address
FDTABLE (which happens to be 0xCFC00000).
Each page in the table represents a single file descriptor.
For example, file descriptor 2 is represented
by the page at FDTABLE+2*PGSIZE. If the file descriptor is
closed, there is no page mapped there. The file descriptor page
contains a struct Fd, declared in inc/fd.h:
struct Fd
{
u_int fd_dev_id_id; // device id
u_int fd_offset; // offset for read/write
u_int fd_omode; // open mode
};
The fd_offset and fd_omode are the current
file descriptor offset and open mode. Fd_dev_id lets us know
which device implements the read, write,
close and stat.
Each device exports a struct Dev with function pointers:
struct Dev
{
u_int dev_id;
char *dev_name;
int (*dev_read)(struct Fd*, void *buf, u_int n);
int (*dev_write)(struct Fd*, const void *buf, u_int n);
int (*dev_close)(struct Fd*);
int (*dev_stat)(struct Fd*, struct Stat*);
};
The devtab in lib/fd.c lists the known devices.
To find the device responsible for a given struct Fd,
dev_lookup looks through devtab
for a device with
dev_id == fd_dev_id_id. (The struct Fd cannot
simply contain a pointer to the appropriate struct Dev,
because we want to share them among different programs. Pointers in one
program are not likely to be valid in others.) To keep things simple, the
device ids are just characters: 'c' for console,
'p' for pipe, and 'f' for file system.
As before, each file descriptor has a 4MB region of virtual memory reserved for its own use. The file system device still uses this range to map the file. The pipe device will use it to map the pipe buffer.
To make things concrete, let's consider the implementation of
two of the generic functions: write and dup.
Write looks like this:
int
write(int fdnum, const void *buf, u_int n)
{
int r;
struct Dev *dev;
struct Fd *fd;
if ((r = fd_lookup(fdnum, &fd)) < 0
|| (r = dev_lookup(fd->fd_dev_id_id, &dev)) < 0)
return r;
if ((fd->fd_omode & O_ACCMODE) == O_RDONLY)
return -E_INVAL;
r = (*dev->dev_write)(fd, buf, n, fd->fd_offset);
if (r > 0)
fd->fd_offset += r;
return r;
}
Fd_lookup, which you implemented in lab 5,
checks whether the page corresponding to
fdnum is mapped. If not, it returns an error.
Otherwise, it stores sets fd to point at the page.
Then dev_lookup searches for the appropriate device.
Next, we check that the file is not open read-only.
Now we have the fd and the dev and can
call the dev-specific write function.
If this is successful, we update the fd_offset.
Dup looks like this:
int
dup(int oldfdnum, int newfdnum)
{
int i, r;
u_int ova, nva, pte;
struct Fd *oldfd, *newfd;
if ((r = fd_lookup(oldfdnum, &oldfd)) < 0)
return r;
close(newfdnum);
newfd = (struct Fd*)INDEX2FD(newfdnum);
sys_mem_map(0, (u_int)oldfd, 0, (u_int)newfd, vpt[VPN(oldfd)]&PTE_USER);
ova = fd2data(oldfd);
nva = fd2data(newfd);
if (vpd[PDX(ova)])
for (i=0; i<PDMAP; i+=BY2PG) {
pte = vpt[VPN(ova+i)];
if(pte&PTE_P)
sys_mem_map(0, ova+i, 0, nva+i, pte&PTE_USER);
}
return newfdnum;
}
Dup tweaks the file descriptor table so that after the call,
referring to newfdnum will
be just like referring to oldfdnum.
We use fd_lookup to find the struct Fd for the
old file descriptor.
If the file descriptor isn't valid or isn't open, we return an error.
Otherwise, we can go ahead. First, close newfdnum in case it
is already open. Then just copy the mappings for oldfdnum
into the mapping area for newfdnum.
First we copy the struct Fd page in the file descriptor table.
Then we scan the virtual address space reserved for the old descriptor,
copying any mappings into the virtual address space reserved for the new
descriptor.
If we needed to, we could add a dev_dup function pointer to
struct Dev in order to allow device implementations to run
their own code in response to dup, but for our purposes it
isn't necessary.
Make sure you understand these code snippets. You may find it useful
to consult the rest of lib/fd.c as well as
lib/console.c, an example device implementation.
Run gmake run-icode to check that file operations
and spawn still work. If you see:
init: running sh
init: starting sh
<probably some error here>
then all is well.
fork and spawn, but file descriptor state is kept
in user-space memory. Right now, on fork, the memory
will be marked copy-on-write,
so the state will be duplicated rather than shared.
(This means that running "(date; ls) >file" will
not work properly, because even though date updates its own file offset,
ls will not see the change.)
On spawn, the memory will be
left behind, not copied at all. (Effectively, the spawned environment
starts with no open file descriptors.)
We will change both fork and spawn to know that
certain regions of memory are used by the "library operating system" and
should always be shared. Rather than hard-code a list of regions somewhere,
we will set an otherwise-unused bit in the page table entries (just like
we did with the PTE_COW bit in fork).
We have defined a new PTE_SHARE bit
in inc/lib.h.
This bit is one of the three PTE bits
that are marked "available for software use"
in the Intel and AMD manuals.
We will establish the convention that
if a page table entry has this bit set,
the PTE should be copied directly from parent to child
in both fork and spawn.
Note that this is different from marking it copy-on-write:
as described in the first paragraph,
we want to make sure to share
updates to the page.
Exercise 1.
Change duppage in lib/fork.c to follow
the new convention. If the page table entry has the PTE_SHARE
bit set, just copy the mapping directly.
(Note that you should use PTE_USER, not PTE_FLAGS,
to mask out the relevant bits from the page table entry. PTE_FLAGS
picks up the accessed and dirty bits as well.)
Exercise 2.
Change spawn in lib/spawn.c to propagate
the PTE_SHARE pages. After it finishes
setting up the child virtual address space but before it marks the
child runnable, it should loop through all the page table
entries in the current process (just like fork did), copying any
mappings that have the PTE_SHARE bit set.
gmake run-testpteshare to check that your code is
behaving properly.
You should see lines that say "fork handles PTE_SHARE right"
and "spawn handles PTE_SHARE right".
Exercise 3.
Change the file server so that
all the file descriptor table pages and the file data pages get mapped
with PTE_SHARE.
gmake run-testfdsharing to check that file descriptors are shared
properly.
You should see lines that say "read in child succeeded" and
"read in parent succeeded".
User-space programs are preemptively scheduled: if an environment is the middle of something important and the clock interrupt comes along, too bad -- it has to stop and pick up again later. (This is transparent to the environment, since the kernel saves and restores its registers as though nothing had happened.)
Preemptive scheduling isn't a problem as long as all the environments are completely isolated from each other -- if they can't interfere with one another, the task switches aren't likely to be a problem.
We've seen that sometimes it's useful for different environments to "interfere constructively" with each other, in the form of IPC and shared memory pages. Unfortunately, preemptive scheduling and shared mutable state is a recipe for trouble. We'd been lucky so far, but our luck just ran out.
In the next part of this lab, you will implement a varient of producer/consumer (aka bounded buffer) called pipes that allow one environment to write into a memory buffer and another one to read from it. You will not be surprised that we will need locks and condition variables to coordinate access to this shared state. One option would be to implement locks and condition variables in the kernel, providing system calls to create a lock, acquire a lock, release a lock, create a condition variable, wait on a condition variable, signal on a condition variable, etc. We will instead follow the micro-/exo-kernel philosophy of putting most of the functionality in a user-level library and rely on the kernel for as little as possible.
We've added two new system calls:
sys_yield_unschedule() and
sys_reschedule(). The first allows an environment
to yield the CPU and indicate that it should not be rescheduled
until some other environment indicates that it would be
worthwhile to reschedule it via sys_reschedule().
For reasons that will soon become apparent,
sys_yield_unschedule() takes two arrays as
input. The first is a list of addresses and values to set those
addresses to. The second is a list of addresses to check and
expected values -- if the addresses don't match the expected
values, then sys_yield_unschedule() returns
rather than yielding.
Furthermore, an env that calls
Exercise 4. Finish implementing the kernel's
sys_yield_unschedule() and
sys_reschedule()
gmake run-testunschedule.
Exercise 4.
Use the sys_yield_unschedule() and
sys_reschedule() primitives plus a shared memory
atomic read-modify-write instruction to implement locks as
defined in lib/lock.c and condition variables as
defined in lib/cond.c
gmake run-testlock and
gmake run-testcond.
You may wish to read the
pipe manual page [1] for a description,
the V6 pipe implementation
(/usr/sys/ken/pipe.c) for details,
and
pipe section
of Dennis Ritchie's UNIX history paper [2] for interesting history.
[1] http://www.freebsd.org/cgi/man.cgi?query=pipe&manpath=Unix+Seventh+Edition [2] http://cm.bell-labs.com/cm/cs/who/dmr/hist.html#pipes
In your library operating system, a pipe is represented by a single
struct Pipe.
For sharing purposes, each struct Pipe is on its own page.
#define PIPEBUFSIZ 32
struct Pipe {
lock_t p_lock; // Mutual exclusion
cond_t p_not_full; // OK to write
cond_t p_not_empty; // OK to read
u_int p_rpos; // read position
u_int p_wpos; // write position
// data buffer -- on separate page
};
Notice that the pipe typically represents shared state between two environments, so access must be synchronized using locks and condition variables. Also notice that there might be multiple readers and writers (since file descripters are duplicated on fork/spawn.)
The bytes written to the pipe can be thought of as numbered
starting at 0. The read position gives the number of the next
byte to be read. The write position gives the number of the next
byte that will be written. The reader and writer share the pipe structure,
but coordinate via these two variables: only the reader updates p_rpos
and only the writer updates p_wpos.
Since the pipe buffer is not
infinite, byte i is stored in pipe buffer index i%PIPEBUFSIZ.
To read a byte from a pipe, the reader copies p_buf[p_rpos%PIPEBUFSIZ]
and increments p_rpos. But the pipe might be empty! First the reader
has to wait until p_rpos < p_wpos.
To write to a pipe, the writer stores into p_buf[p_wpos%PIPEBUFSIZ]
and increments p_wpos. But the pipe might be full! First the
writer must wait until p_wpos - p_rpos <
PIPEBUFSIZ.
There is a final catch -- maybe we are trying to read from an empty pipe but all writers have exited (or vice versa). Then there is no chance that there will ever be more data (or more space) in the pipe, so waiting is futile. In such a case, Unix signals end-of-file by returning 0. So will we.
To detect that there are no writers (or readers) left, we could put reader and writer counts into the pipe structure and update them every time we fork or spawn and every time an environment closes a pipe. This is fragile -- what if the environment exits without closing a pipe? Instead we can use the kernel's page reference counts, which are guaranteed to be accurate.
Recall that the kernel page structures are mapped in the user environments as
pages. The library function pageref(void *ptr)
returns the number of page table references to the page containing the virtual
address ptr. So, for example, if fd is a pointer to
a particular struct Fd, pageref(fd) will tell us how
many different references there are to that structure.
Four pages are allocated for each pipe: the struct Fd for the
reader descriptor rfd, the struct Fd
for the writer descriptor wfd,
and the struct Pipe p and p_buf shared by both.
The following equation holds: pageref(rfd) + pageref(wfd) = pageref(p_buf).
Therefore, a reader can check whether there are any writers left by computing
pageref(wfd) = pageref(p_buf) - pageref(rfd): there are no writers if
pageref(p_buf) == pageref(rfd). A writer can check for readers in the
same manner.
As an aside: you should now see why we defined
sys_yield_unschedule() to return when any environment exits.
Exercise 6.
Implement pipes in lib/pipe.c.
gmake run-testpipe,
gmake run-testpiperace, gmake
run-testpiperace2 and gmake run-primespipe.
Challenge! Although the above logic mostly ensures that a one environment doesn't get stuck if another one exits, there is still one corner case we don't handle. What if a process dies or gets killed while holding the lock or holding the guard on the lock?
When we have mulitple threads within a single process, we wouldn't worry too much about this case. If a thread dies while holding a lock, then either (a) there is a bug in the program or (b) some external force killed the process and thus all threads that could access the state.
But, when we have mulitple environments sharing state as in JOS, one might be killed, but we might want other environments to continue to make progress.
Ensure that a call to write or
read on a pipe always returns (with a normal
value if possible or with EOF if necessary).
If you give this some thought, you'll realize that this is
potentially tricky. If you think about it some more, you'll
probably realize it is tricker than you first
realized. Although the above semantics for
sys_yield_unschedule() might give environments waiting for
a lock a chance to notice that the environment holding the
lock has exited, returning the data structures to a
consistent state is not, in general, an easy thing to do
(which environment should do the "recovery"? How to handle
the corner case when the "recovery thread" also dies?
Etc. Etc.)
There are various ways to tackle this problem. One option is to avoid the use of locks and use wait-free data structures. It turns out that readers/writers as defined here is just simple enough to build without locks if you think really hard about it. Another option is to implement a more general software transactional memory abstraction instead.
bochs has been displaying output we write to
the printer port, but there is no good way to give it input.
Instead, we'll use the X11-based interface
and use CGA output and a keyboard driver. We've written the keyboard driver
for you in kern/console.c, but you need to attach it to the rest
of the system.
Exercise 7.
In your kern/trap.c, call kbd_intr to handle trap
IRQ_OFFSET+IRQ_KBD.
user/console.c.
Test your code by running gmake xrun-testkbd and
type a few lines. The system should echo your lines back to you
as you finish them. Make sure you type into the X window
bochs brings up, not the bochs console.
gmake xrun-icode. This will run your kernel inside the
X11 Bochs starting user/icode. Icode execs init,
which will set up the console as file descriptors 0 and 1 (standard input and
standard output). It will then spawn sh, the shell.
Run ls.
Exercise 8.
The shell can only run simple commands. It has no redirection or pipes.
It is your job to add these. Flesh out user/sh.c.
echo hello world | cat cat lorem >out cat out cat lorem |num cat lorem |num |num |num |num |num lsfd cat script sh <scriptNote that the user library routine
printf prints straight
to the console, without using the file descriptor code. This is great
for debugging but not great for piping into other programs.
To print output to a particular file descriptor (for example, 1, standard output),
use fprintf(1, "...", ...). See user/ls.c for examples.
Run gmake run-testshell to test your shell.
Testshell simply feeds the above commands (also found in
fs/testshell.sh) into the shell and then checks that the
output matches fs/testshell.key.
Challenge! Add more features to the shell. Some possibilities include:
ls &)
ls; echo hi)
(ls; echo hi) | cat > out)
echo $hello)
echo "a | b")
Challenge!
There is a bug in our disk file implementation
related to multiple programs writing to the same file descriptor.
Suppose they are properly sequenced to avoid simultaneous
writes (for example, running "(ls; ls; ls; ls) >file"
would be properly sequenced since there's only one writer at a time).
Even then, this is likely to cause a page fault in one of the
ls instances during a write. Identify the reason
and fix this.
gmake grade and hand it in with gmake
handin and turnin.