CS 439 Project 0
The UTCS Shell

Code due: 10:59p on Thursday, February 11, 2021
Design Doc due: 10:59p Friday, February 12, 2021

Introduction

The purpose of this assignment is to become more familiar with the concept of a shell and its purpose, with process control and the related system calls, and with programming in C in a Unix environment. To do this, you will build a simple Unix shell. Mastering use of the shell is a crucial component of being able to work in Unix. This assignment will teach you the basics of how a shell is programmed.

There are four specific objectives in this assignment:

Overview of Shells

A command-line interpreter (CLI), more commonly known as a shell, is a program that runs other programs on the behalf of its user. A shell repeatedly prints a prompt, waits for a command line from the user, and then carries out the action requested by the command line. The shell you implement in this project will be similar to, but much simpler than, the shell you use when you log in to a typical Unix system.

A command line is a sequence of ASCII words delimited by whitespace. The first word in the command line is either the name of a built-in command or the name of an executable file. The remaining words are command-line arguments.

If the first word is a built-in command, the shell immediately executes that built-in command in the current process. Otherwise, the word is assumed to be the name of an executable program. The shell locates this program (through a process described later), creates a child process, and runs the program in the context of the child.

For example, consider the following two command lines:

firefox --safe-mode
cd /usr/local

In the first command line, the first word is an external program, so the shell searches for and executes it, with the --safe-mode argument. In the second command line, the first word is the built-in command cd, so the shell immediately changes the working directory itself, without referencing external programs.

Shells also offer other convenience features. For example, the user may want to capture the output of a program in a file, for later analysis. The user may also want to be able to run several child processes at the same time. You will implement both these features in this project.

Typographical Conventions

Vocabulary/terminology that you should know will be italicized.

Things that need to be emphasized will be in bold font.

Filenames, code, and terminal output (generally, anything you might expect to see in the terminal when working on this project) will be in teletype. Usually, if it's prefaced by unix>, this is something you should type in your regular shell, while things prefaced by utcsh> should be given as input to your project.

🐚 The shell emoji will be used when we wish to point out a difference between how this project works and how most real-world shells (e.g. bash, zsh) work.

Getting Started with Your Partner

Begin by talking with your partner and agree on a remote collaboration method. If you need ideas, you can find some in the remote collaboration guide.

Set up a repository on the UTCS GitLab server in accordance with the Git Instructions.

As you get started this week, please submit your answers to the questions in our Group Planning document this Friday evening. On the second Friday of the project, we will ask you to submit a Group Reflections document.

Getting Started with the Shell Project

We provide starter code (shell_project.tar.gz) that contains a template for your program, along with a number of useful helper functions. Get it from the class web page by either running the following command from the command line:

unix> wget https://www.cs.utexas.edu/~ans/classes/cs439/projects/shell_project/shell_project.tar.gz
      

or by downloading it in your browser.

Put the file shell_project.tar.gz in the protected directory (the project directory) in which you plan to do your work. Then do the following:

  1. Type the command tar xvzf shell_project.tar.gz to expand the tar archive. Once the command has finished expanding the archive, you should see the following files:
  Files:

  Makefile         # Compiles your shell program and runs the tests
  README.shell     # Used for submission

  # Files for Part 0
  fib.c            # Implement fibonacci here

  # Files for Part 1/2
  util.c           # Instructor-provided utility functions
  util.h           # The header file for util.c
  utcsh.c          # Implement your shell here
  tests            # A directory of tests for your shell
  examples         # A directory with example shell scripts in it
  2. Type the command make to ensure that your compiler can build the skeleton code.

  3. Fill out the requested information in README.shell, where applicable.

  4. Download and read over the questions in the design document.

  5. Log your time (now and every time you work!) in the Pair Programming Log.

Read this entire handout and consider the overall design of your shell before writing any code. To help you consider your design, please look over the questions in the design document. If you do not do this, you may discover that you need to rewrite major portions of your code as you progress!

Part 0: fork()/wait()

In this phase of the project, you will learn about the fork() and wait() system calls that you will use in the rest of the project.

Part 0.1: Reading

Sections 5.4 and 5.6 of OSTEP may be helpful to read before starting this project. You may also wish to consult the class resources (e.g. on C programming and shell usage) before starting.

Part 0.2: Fibonacci

Update fib.c so that if invoked on the command line with some integer argument n, where n is less than or equal to 13, it recursively computes the nth Fibonacci number. (The numbers are counted from 0.)

Example executions:

unix> fib 3
2
unix> fib 10
55

The trick is that each recursive call must be made by a new process, so you will call fork() and then have the new child process call doFib().

The parent must wait for the child to complete, and the child must pass the result of its computation to its parent.
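One way for a child to hand a small integer back to its parent is through its exit status, which the parent can read with waitpid() and WEXITSTATUS(). The sketch below illustrates only that mechanism, not the Fibonacci recursion itself. The names run_in_child and square are hypothetical and do not come from the starter code. Keep in mind that only the low 8 bits of an exit status reach the parent, so this approach works only for results from 0 to 255 (the 13th Fibonacci number, 233, still fits).

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for "some computation" done by the child. */
int square(int x) { return x * x; }

/* Run work(arg) in a child process and return its result to the caller.
 * Returns -1 if fork/waitpid fails or the child did not exit normally.
 * Only the low 8 bits of an exit status reach the parent, so this is
 * only suitable for results in the range 0..255. */
int run_in_child(int (*work)(int), int arg)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;                 /* fork failed: no child was created */
    }
    if (pid == 0) {
        exit(work(arg));           /* child: result becomes exit status */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);    /* parent: recover the child's result */
}
```

Notice that the child calls exit() immediately after doing its work. A child that falls through and keeps running the parent's code is a common source of accidental fork bombs.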

The final result must be printed by the original process.

You may modify doFib(), but you may not modify the number of parameters it accepts or its return value. You may create helper methods, but the computation of the Fibonacci number must be done through the creation of child processes.

If your fib program is given the wrong number of arguments, it should print the usage statement and exit. It should not crash or segfault.

Finally, if your implementation of fib causes a fork bomb when we test it, you will receive 0 points for this part of the project.

Part 1: Shell Skeleton

In this part of the assignment, you will start building the basic framework of your shell. At the end of this section, you should have a basic functioning shell framework, which you will extend and upgrade in future sections.

Your basic shell will be called utcsh [1].

Note: Parts 1 and 2 will walk you through a recommended implementation order for the shell. You are not required to implement everything in this order; however, you must implement all functionality in both parts to receive full credit.

Remember to read this entire document before implementing anything!

Part 1.1: Reading

Before you implement anything, you should read this entire document, as well as the design document template. Yes, that is a lot of reading. Yes, you should do it anyways (at least skim the documents!).

No additional external reading is required, though if you have not worked with command line interfaces before, you may wish to read Ubuntu's command line tutorial.

You may also find it helpful to read the manpages for strtok, strcmp, and execv, though this will not be required until later.

Part 1.2: REPL

The core of any shell is the REPL, or the read-evaluate-print loop. This is a loop (surprise!) that does the following three actions repeatedly:

  1. Read input from the user (or from a script the user specifies).
  2. Evaluate the input, figuring out what the user wants to do and doing it.
  3. Print any output associated with the requested action.

Implement a REPL in the main() function in utcsh.c. Print utcsh> at the start of the line, then read the user's input.

For now, your REPL should ignore most inputs. The only command it will respond to is the built-in command exit, which will cause the shell to exit by calling exit(0). That's it!

For reading lines of input, you should use getline(). Use man getline to learn more about this function. You can use the variable stdin as the argument stream in order to read input from the terminal.

Additionally, if you hit the end-of-file marker when reading input, you should also call exit(0) and return gracefully.
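As a rough illustration of the read step, here is a minimal sketch of a loop built on getline() that stops on either exit or end-of-file. The function name read_until_exit and its return value are assumptions made for this example; your main() will look different, and a real shell would call exit(0) where this sketch breaks out of the loop.

```c
#define _POSIX_C_SOURCE 200809L  /* for getline() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read lines from `in` until "exit" or end-of-file.
 * Returns the number of lines read before stopping. When `prompt` is
 * non-NULL it is printed before each read. */
size_t read_until_exit(FILE *in, const char *prompt)
{
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;
    size_t nlines = 0;

    for (;;) {
        if (prompt != NULL) {
            fputs(prompt, stdout);
            fflush(stdout);
        }
        ssize_t len = getline(&line, &cap, in);
        if (len < 0) {
            break;                        /* end-of-file or read error */
        }
        if (len > 0 && line[len - 1] == '\n') {
            line[len - 1] = '\0';         /* strip trailing newline */
        }
        if (strcmp(line, "exit") == 0) {
            break;
        }
        nlines++;                         /* other input is ignored */
    }
    free(line);
    return nlines;
}
```

Calling read_until_exit(stdin, "utcsh> ") would read from the terminal. Because getline() manages the buffer itself, the same buffer can be reused across iterations and freed once at the end.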

🐚 An interesting point is that, in C, 0 is false and nonzero is true, while in the world of UNIX exit codes, 0 indicates success and nonzero indicates failure. This turns out to be very useful, but can be a bit hard to keep track of when you're initially learning how to work with shells.

Part 1.3: Parsing and Built-in Commands

Recall from the introductory section that a command line consists of ASCII words separated by whitespace. Implement some way to split the command line so that you can recover these words, e.g. you should be able to immediately tell that word 4 of "path a b c d e", counting from 0, is "d".

We recommend that you use strtok() for this. Read the manpage for this function carefully. Careless use of this function has been known to lead to many hours of debugging.
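To make the calling pattern concrete, here is a minimal sketch of whitespace splitting with strtok(). The function name split_words and the fixed-size output array are assumptions for illustration only; the reference solution may be organized quite differently.

```c
#include <stddef.h>
#include <string.h>

/* Split `line` in place into whitespace-separated words.
 * Stores up to max_words pointers in words[] and returns how many were
 * stored. strtok() overwrites delimiters in `line` with '\0', so `line`
 * must be writable (not a string literal) and must outlive words[]. */
size_t split_words(char *line, char *words[], size_t max_words)
{
    size_t n = 0;
    char *tok = strtok(line, " \t\n");     /* first call passes the string */
    while (tok != NULL && n < max_words) {
        words[n++] = tok;
        tok = strtok(NULL, " \t\n");       /* later calls pass NULL */
    }
    return n;
}
```

Two properties of strtok() cause most of the debugging pain: it modifies its input, and it keeps hidden state between calls, so starting a second tokenization before the first one finishes silently discards the first.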

Expand your shell's ability to process built-in commands by adding error checking to exit and implementing two new built-in commands:

Part 1.4: Handling Errors

The previous section was our first encounter with errors. For ease of implementation, your shell will only ever have one error message:

  char error_message[30] = "An error has occurred\n";
  int nbytes_written = write(STDERR_FILENO, error_message, strlen(error_message));
  if(nbytes_written != strlen(error_message)){
    exit(2);  // Should almost never happen -- if it does, error is unrecoverable
  }

Whenever an error occurs, your shell should print the error message with the above code and then continue processing. The only time your shell should exit in response to an error is described in Part 1.6. [2]

It is never acceptable to crash, segfault, or otherwise break the shell in response to bad user input. Your shell must always exit gracefully, i.e. by calling exit() or returning from main().

🐚 Of course, most real world shells implement a huge variety of error messages to help the user figure out where something went wrong.

Part 1.5: Executing External Commands

If the command given is not one of the three built-in commands, it should be treated as the path to an external executable program.

For these external commands, execute the program using the fork-and-exec method discussed in class. Here are some hints to help you out:

For the child process: The child process should execute the given command by using the execv() call. You may not call system() to run a command. Remember that if execv() returns, there was an error (usually caused by incorrect arguments or the file not existing).

For the parent process: The parent should use wait() or waitpid() to wait on the child. Note that the parent does not care about what happens to the child. As long as fork() succeeds, the parent considers the process launch to have been a success.
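The two roles described above can be sketched as follows. This is a minimal illustration under the assumption that argv[0] already holds a full path to the program; the name launch_and_wait is hypothetical, and error reporting in the child is left out.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Launch argv[0] with arguments argv (NULL-terminated) and wait for it.
 * Returns 0 if fork() succeeded and -1 otherwise; the child's own exit
 * status is deliberately ignored, as described above. */
int launch_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        execv(argv[0], argv);      /* only returns if something went wrong */
        exit(1);                   /* real code would report the error first */
    }
    (void)waitpid(pid, NULL, 0);   /* parent: block until the child is done */
    return 0;
}
```

The exit(1) after execv() matters: without it, a child whose execv() failed would carry on running the shell's own loop, leaving you with two shells reading the same input.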

🐚 Typical shells will collect the exit code of the child to communicate information to the programmer. For example, the exit code of the diff program can tell you not just whether two files were the same, but how they differed. For simplicity, utcsh does not worry about this.

Part 1.6: Reading A Script

Sometimes, it is very annoying to have to type in commands one at a time. One common solution for this is to create a script by putting a related sequence of commands into a file and using the shell to run that file.

Implement a script system: if utcsh is invoked with one argument, instead of reading commands from stdin, it assumes that its argument is a filename and attempts to read commands one at a time from that file instead of from stdin.
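One possible way to structure this is to pick the input stream once, up front, and let the rest of the shell read from whichever FILE * it is given. The sketch below assumes a hypothetical helper named open_input; it only selects the stream and does not cover the other script-mode behavior.

```c
#include <stdio.h>

/* Choose where commands come from, based on the shell's own arguments.
 * No arguments: read from stdin. One argument: treat it as a script file.
 * Returns NULL if the file cannot be opened or there are too many
 * arguments; how the shell reacts to that is up to the caller. */
FILE *open_input(int argc, char *argv[])
{
    if (argc == 1) {
        return stdin;
    }
    if (argc == 2) {
        return fopen(argv[1], "r");   /* NULL on failure */
    }
    return NULL;
}
```

Since getline() accepts any FILE *, the same reading loop can then serve both interactive and script input.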

You can find example scripts in the examples/ directory. say_hello.utcsh is the most basic script and consists of a handful of external commands. There is also the more advanced say_hello_path.utcsh, which relies on the path feature (which you will implement in 2.1).

There are two other important changes when operating in script mode:

Once you have implemented this, you should be able to start running automated tests (see Part 4 for instructions on how to do this).

🐚 To show you what a script looks like for bash, we've included two bash scripts in the examples directory. One does the same thing as the say_hello scripts and can be run with bash examples/say_hello.bash. The other can be run with bash examples/file_exists.bash <filename> and will tell you whether <filename> exists, and if so, if it is a regular file or a directory.

At this point, you have a basic shell that can run both built-in and external commands, both from a script and from stdin (keyboard input)--for example, you should be able to run the say_hello.utcsh script in the examples directory.

Now might be a good time to make a git commit, if you haven't done so already!

Part 2: Advanced Shell Features

Part 2.1: Paths

When you implemented external program execution, you assumed that the 0-th argument was the path to an executable file. Unfortunately, this is really, really annoying for users, because nobody wants to type /usr/local/bin/ls every time they want to run the ls command.

The solution to this is a PATH: a set of user-specified directories to search for external programs. When the shell is given a command it does not recognize, it looks for this program in its PATH.

Note that, for the rest of this document, "path" will refer to a string with slashes in it which is used to locate a file, while PATH will be used to refer to a list of paths used to search for binary files. [3]

If the program you're given is not an absolute path, i.e. a path which starts from /, you should search for your program in each directory in the PATH. For example, if your PATH is "/bin" "/usr/bin", you would search for /bin/ls and /usr/bin/ls, executing the first one you found (and returning an error if neither exists). You can check that the file exists and is executable using the functions we provide in the skeleton code. If the file does not exist, or it is not executable, this is an error.
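The search described above can be sketched as a loop over candidate directories. This example is an assumption-laden illustration: find_in_dirs is a hypothetical name, the directory list is passed in as a plain NULL-terminated array, and the POSIX access() call stands in for the checking functions provided in the skeleton code, which you should prefer in your own shell.

```c
#include <stdio.h>
#include <unistd.h>

/* Look for `name` in each directory of the NULL-terminated array `dirs`.
 * On success, writes the full path into out (capacity out_size) and
 * returns 0. Returns -1 if no directory holds an executable `name`.
 * access() stands in here for the skeleton's own checking helpers. */
int find_in_dirs(const char *name, char *const dirs[],
                 char *out, size_t out_size)
{
    for (size_t i = 0; dirs[i] != NULL; i++) {
        int len = snprintf(out, out_size, "%s/%s", dirs[i], name);
        if (len < 0 || (size_t)len >= out_size) {
            continue;                  /* candidate path did not fit */
        }
        if (access(out, X_OK) == 0) {
            return 0;                  /* first match wins */
        }
    }
    return -1;
}
```

With dirs set to "/bin" and "/usr/bin", a lookup of ls would try /bin/ls first and fall back to /usr/bin/ls, matching the order given in the example above.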

A variable for the PATH is already provided for you in the skeleton code, called shell_paths. You can manipulate this variable directly, or by using the helper functions in util.c/util.h.

If the PATH is empty because the user executed a path command with no arguments, utcsh cannot execute any external programs unless the full path to the program is provided.

🐚 Real shells also let you specify relative paths to programs, e.g. you can type bin/myprog to run a program relative to your current working directory. You do not need to worry about this for utcsh: the program name will either be an absolute path or the name of a program to be searched for in shell_paths.

Reminder: the shell itself does not implement ls or any other program--it simply looks them up in the path and executes them.

Part 2.2: Redirection

Many times, a shell user prefers to send the output of a program to a file rather than to the screen. Usually, a shell provides this nice feature with the > character. Formally this is called redirection of output. Your shell should include this feature.

For example, if a user types ls -al /tmp > output, nothing should be printed to the screen. Instead, the standard output and standard error of the program should be rerouted to the file output.

If the output file already exists, you should overwrite and truncate it. Look through the flags in man 2 open to find out how to do this.
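For reference, here is a minimal sketch of redirecting both output streams to a file with open() and dup2(). The function name redirect_output and the 0644 permission bits are assumptions chosen for this example. The function is meant to run in the child process, after fork() and before execv(), so that the shell's own output is unaffected.

```c
#include <fcntl.h>
#include <unistd.h>

/* Point both stdout and stderr of the calling process at `path`,
 * creating the file if needed and truncating it if it already exists.
 * Intended to be called in the child, between fork() and execv().
 * Returns 0 on success and -1 on failure. */
int redirect_output(const char *path)
{
    /* 0644 is an assumed permission mode for newly created files. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return -1;
    }
    if (dup2(fd, STDOUT_FILENO) < 0 || dup2(fd, STDERR_FILENO) < 0) {
        close(fd);
        return -1;
    }
    close(fd);   /* descriptors 1 and 2 now refer to the file */
    return 0;
}
```

Because file descriptors survive execv(), the program that replaces the child inherits the redirected streams without knowing anything about the redirection.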

Here are some rules about the redirection operator:

🐚 Real shells usually allow multiple redirects and redirect stdout and stderr separately, and allow you to redirect them to each other, e.g. you can direct stdout into stderr.

Part 2.3: Concurrent Commands

Your shell will allow the user to launch concurrent commands. Remember: when two things are concurrent, they appear to execute at the same time whether they actually run simultaneously or not (logical parallelism). In utcsh, this is accomplished with the ampersand operator:

utcsh> cmd1 & cmd2 & cmd3 args1

Instead of running cmd1, waiting for it to finish, and then running cmd2, your shell should run cmd1, cmd2, and cmd3 (with whatever args were passed) before waiting for any of them to complete.

Then, once all processes have been started, you must use wait() or waitpid() to make sure that all processes have completed before moving on.
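The key point is that launching and waiting happen in two separate phases. A minimal sketch follows, assuming each command has already been parsed into a NULL-terminated argv whose first element is a full path; the name run_all and its return value are hypothetical, and redirection is omitted.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start every command in cmds[0..n-1] (each a NULL-terminated argv whose
 * first element is a full path), then wait for all of them.
 * Returns the number of children that were successfully forked. */
size_t run_all(char *const *cmds[], size_t n)
{
    pid_t *pids = malloc(n * sizeof *pids);
    if (pids == NULL) {
        return 0;
    }
    size_t started = 0;
    for (size_t i = 0; i < n; i++) {          /* phase 1: launch everything */
        pid_t pid = fork();
        if (pid == 0) {
            execv(cmds[i][0], cmds[i]);
            exit(1);                          /* reached only if execv failed */
        }
        if (pid > 0) {
            pids[started++] = pid;
        }
    }
    for (size_t i = 0; i < started; i++) {    /* phase 2: wait for all */
        (void)waitpid(pids[i], NULL, 0);
    }
    free(pids);
    return started;
}
```

If the waitpid() call were moved inside the first loop, the commands would run one after another instead of concurrently.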

Each individual command may optionally have its own redirection, e.g.

utcsh> cmd1 > file1 & cmd2 arg1 arg2 > file2 & cmd3 > file3

Unlike the redirection operator, the ampersand operator might not have spaces around it. For example cmd1 arg1&cmd2 > file2 is a valid command line, and requests the execution of two commands. In addition, some or all of the commands on either side of the ampersand may be blank. This means that, for example, &&&&&&& is a valid command line.
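Splitting first on & and then on whitespace means running one tokenization inside another, which plain strtok() cannot do because of its hidden state. One option, sketched below, is the reentrant strtok_r(), which keeps its state in a caller-supplied pointer. The function count_commands is a hypothetical example that only counts the non-blank commands on a line; it is not part of the starter code.

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r() */
#include <stddef.h>
#include <string.h>

/* Count the non-blank commands on a line whose commands are separated
 * by '&'. The line is modified in place. Each tokenization keeps its
 * position in its own save pointer, so the two levels do not interfere. */
size_t count_commands(char *line)
{
    size_t count = 0;
    char *outer_save = NULL;

    for (char *cmd = strtok_r(line, "&", &outer_save);
         cmd != NULL;
         cmd = strtok_r(NULL, "&", &outer_save)) {
        char *inner_save = NULL;
        /* A command is non-blank if it contains at least one word. */
        if (strtok_r(cmd, " \t\n", &inner_save) != NULL) {
            count++;
        }
    }
    return count;
}
```

Like strtok(), strtok_r() treats a run of adjacent delimiters as a single separator, so a line consisting only of ampersands yields no commands at all.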

A question that remains is, "How will the shell handle concurrent built-in commands?". If a command line has multiple concurrent commands that are all built-in, the shell should execute them sequentially from left-to-right. You may assume that we will not test command lines that have both external commands and built-in commands. Your shell should not crash if this happens, but otherwise there are no requirements on what it must do.

🐚 In most bash-like shells, & is actually appended to the end of a command to instruct it to run in the background. You can search for "Bash Job Control" if you want to learn more, but don't try to use this syntax in your actual shell to run jobs in parallel, or weird things might happen!

Part 3: Hints

General Hints

Hints for Part 0

Hints for Part 1

Hints for Part 2

Line Count Hints

We are providing the approximate number of lines of code used in the reference solution as a hint, so you can see how much work is needed for each function. These numbers have been rounded to the nearest multiple of 10.

  Function                 Lines of Code
  tokenize_command_line    50
  parse_command            60
  eval                     60
  try_exec_builtin         60
  exec_external_cmd        30
  main                     50

Note: exec_external_cmd does a lot of work with functions provided in the skeleton code. Those functions are not included in this count. In addition, these line counts include a lot of comments and error handling code, as well as some code for dealing with arbitrarily long command lines. In other words, if you're short by 10 or 15 lines, don't worry about it!

Part 4: Checking Your Work

To help you check your work, we've provided a small test suite, along with some tools to help you run it. Note that it is not acceptable to hardcode the correct responses into the shell.

Each test in the test suite will check three things from your shell:

If any of these differ, the test suite will print an error and tell you what part of the output was wrong, along with commands you can run to see the difference.

In order to make this easier on you, we've included some helper rules in the Makefile to let you run tests easily.

No test should run for more than 10 seconds without either passing or failing. If your test runs for longer than this, you likely have an infinite loop in your code.

Part 5: On Programming and Logistics

Makefile

Your code will be tested and graded with the output of make utcsh or make (the two rules are equivalent in the provided Makefile). To aid you in debugging, two additional rules have been created in the Makefile:

You may use these rules to quickly generate programs for debugging, but keep in mind that your grade will be based on the binary generated by make utcsh.

Design Document

As part of this project, you will submit a design document, where you will describe your design to us. Please note that this document is a set of questions that you will answer and is not free form. Your group will submit one design document.

General

  1. You must work in two-person teams on this project. Failure to do so will result in a 0 for the project. Once you have contacted your assigned partner, do the following:

    1. exchange first and last names, EIDs, and CS logins
    2. fill out the README.shell distributed with the project
    3. register in Canvas as a Shell Group. (Add yourselves to an empty group of your choosing. Feel free to change the name to something more creative! Keep it clean.)
    4. create a private GitLab repo and invite your partner at the "maintainer" level or higher. See our git and version control guide for details on how to do this.

    You must follow the pair programming guidelines set forth for this class.

    Please see the Grading Criteria to understand how failure to follow the pair programming guidelines OR fill out the README.shell will affect your grade.

  2. You must follow the guidelines laid out in the C Style Guide or you will lose points. This includes selecting reasonable names for your files and variables.

  3. This project will be graded on the UTCS public linux machines. Although you are welcome to do testing and development on any platform you like, we cannot assist you in setting up other environments, and you must test and do final debugging on the UTCS public linux machines. The statement "It worked on my machine" will not be considered in the grading process.

  4. The execution of your solution shell will be evaluated using the test cases that are included in your project directory. To receive credit for the test cases, your shell should pass the provided test cases, as determined by make clean && make utcsh && make test.

  5. Your code must compile without any additions or adjustments, or you will receive a 0 for the test cases portion of your grade.

  6. Please do not use _exit() for this assignment. exit() will do nicely.

  7. You are encouraged to not use linux.cs.utexas.edu for development. Instead, please find another option using the department's list of public UNIX hosts.

  8. You are encouraged to reuse your own code that you might have developed in previous courses to handle things such as queues, sorting, etc. You are also encouraged to use code provided by a public library such as the GNU library.

  9. You may not look at the written work of any student other than your partner. This includes, for example, looking at another student's screen to help them debug or looking at another student's print-out. See the syllabus for additional details.

  10. If you find that the problem is underspecified, please make reasonable assumptions and document them in the README.shell file. Any clarifications or revisions to the assignment will be posted to Piazza.

Submitting Your Work

  1. After you finish your code, use make turnin to create a compressed tarball named shell_project.tar.gz for submission. It may be a good idea to unpack this tarball into a clean directory on a UTCS linux system to make sure it still compiles. You should then upload the file to the Project 0 Test Cases assignment on Canvas. Make sure you have included the necessary information in the README.shell and placed your pair programming log in the project directory.

  2. Once you have completed your design document, please submit it to the Project 0 Design and Documentation assignment in Canvas. Make sure you have included your name, CS login, and UT EID in the design document.

    The purpose of the design document is to explain and defend your design to us. Its grade will reflect both your answers to the questions and the correctness and completeness of the implementation of your design. It is possible to receive partial credit for speculating on the design of portions you do not implement, but your grade will be reduced due to the lack of implementation.

Grading

Code will be evaluated based on its correctness, clarity, and elegance according to the Grading Criteria. Strive for simplicity. Think before you code.

The most important factor in grading your code design and documentation will be code inspection and evaluation of the descriptions in the write-ups. Remember, if your code does not follow the standards, it is wrong. If your code is not clear and easy to understand, it is wrong.

Footnotes

Project adapted from one used in OSTEP. Many thanks to the Drs. Arpaci-Dusseau for permission to use their work.

[1]: This is both an homage to the UTCS department and a play on the name of the popular tcsh shell.

[2]: Note that we check the return value of the write call in spite of the fact that all we can do if it's wrong is exit. This is good programming practice, and you should be sure to always check the return codes of any system or library call that you make.

[3]: Sometimes you hear PATH referred to as "the path," but in most real-world contexts, you will need to deduce which one is meant from context.