CS380L: Advanced Operating Systems

Emmett Witchel

Lab #1

Scheduling

The goal of this assignment is to learn about scheduling.

The first thing you will need to do is write a single process application that uses the C version of CityHash to repeatedly compute the 128-bit hash of a 4KB buffer that you initialize once by reading /dev/urandom. This hashing application is a CPU-bound process that performs a quantifiable amount of work. Figure out and report how many hashes you need to do for the application to terminate in approximately 5 seconds. If you have access to more than one hardware platform, you can compare measurements and see if the difference in the rate of hashing corresponds exactly to the difference in CPU clock rate.

Next convert your application to use multiple processes with a shared address space using the clone() system call. Make sure that no process starts hashing before all processes have been created. You should be able to change the number of processes that are created using a command line parameter.

For the purpose of the lab we have provided a CPU-bound background process API (backgroundTask.c) that you can link into your code to start and stop background processes. You should be able to change the number of background processes that are created using a command line parameter to your hashing program.

For the experiments that follow, you might need to run your tests as the root user, so that you can take advantage of sched_setaffinity and cgroups. Note that in order to get simple, repeatable results, you probably want to disable CPU frequency scaling on your system. I would do this in the BIOS, but you can also try user-level tools like this one that allow you to set the frequency directly (or perhaps the "-g performance" option would work, I'm not sure). Here is a tool. http://manpages.ubuntu.com/manpages/hardy/man1/cpufreq-selector.1.html

Tasks

Please include text about each experiment called for in this section, and include any graphs, tables or other data to make your point clearly and concisely. As always, report your experimental platform.

Use the information in /proc/cpuinfo to find out the number of physical cores that your machine has (from now on we will refer to the number of cores as N). Record data on the performance of executing N hashing processes with N background processes and then N hashing processes with N+1 background processes. (If your machine has hardware threading you might also find it interesting to try this with 2N and 2N+1 processes and see if there is a difference).

To measure performance, fix the number of hashes each thread computes and measure the time (I believe you could fix the time and measure the number of hashes, but that would be slightly more complicated). Make graphs that show throughput and fairness of the scheduling being done by the OS for the N hashing, N background, and N hashing and N+1 background (those with hyperthreaded hardware can choose 2N and 2N+1 and don't need to report both). Analyze and explain your data.

Now, pin each hashing process to a different physical core (using sched_setaffinity). Run the same experiment and comment on any changes in throughput and fairness. Include as much data as necessary to make your point. By that I mean if your results don't change at all, don't just put in another graph. But if they do change in a way that is not easily summarized in text or a number or two, then include a graph.

Next try and optimize the execution time of the hashing processes in the (N Hashing, N+1 background) scenario with respect to throughput and fairness. To increase throughput, read about cgroups and create cgroups for your hashing threads. It should be pretty clear how to use cgroups to get more CPU time for your hashing threads relative to background activities.

To optimize fairness, leave the default cgroups and think about the best way to make the work done by your hashing threads as fair as possible. Note, you do not have to maximize the throughput of your threads for this experiment (though maximizing both fairness and throughput might be interesting). To measure fairness, consider maximum deviation from the mean and standard deviation.

You will need to submit the source for your hashing application, your answers to the questions above (and below) and the data you have collected.

What is the difference between cryptographic and non-cryptographic hash functions? Please report how much time you spent on the lab.