HW2

Analyzing experimental data is an important part of operating systems research and of this class. This homework provides a quick overview of the basics.

Read Eric Brewer's tutorial on basic statistics.

The file data1.txt contains 50 values taken from a uniform [0,1] distribution. The file data2.txt contains 50 values taken from an exponential distribution with mean 1.0. The file data3.txt contains 50 values taken from a Pareto distribution (k=.5). We will imagine that each of these files contains data samples from some measurement of interest.
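For intuition about how the three distributions differ, the sketch below draws 50 samples from each using inverse-transform sampling. It is only an illustration and is not how the provided files were produced. It assumes that k denotes the Pareto shape parameter and that the Pareto scale (minimum value) is 1.0; neither is stated above. The function names and the seed are hypothetical.

```python
import math
import random

def sample_uniform(rng):
    # Uniform on [0, 1).
    return rng.random()

def sample_exponential(rng, mean=1.0):
    # Inverse CDF of the exponential distribution: x = -mean * ln(1 - u).
    return -mean * math.log(1.0 - rng.random())

def sample_pareto(rng, shape=0.5, scale=1.0):
    # Inverse CDF of the Pareto distribution: x = scale / (1 - u)^(1/shape).
    # ASSUMPTION: k = 0.5 is taken to be the shape; scale = 1.0 is a guess.
    return scale / (1.0 - rng.random()) ** (1.0 / shape)

rng = random.Random(0)  # fixed seed so the illustration is repeatable
uniform_samples = [sample_uniform(rng) for _ in range(50)]
exponential_samples = [sample_exponential(rng) for _ in range(50)]
pareto_samples = [sample_pareto(rng) for _ in range(50)]
```

Under this reading of k, the Pareto distribution has no finite mean, so its sample mean can be dominated by a few very large values.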

1. Plot (using, for example, gnuplot) the cumulative distribution function for each of these data sets. Use a linear, semi-log, or log-log scale as you deem most appropriate for giving good intuition for each distribution's character.

2. For each of the three data files, calculate the mean, median, and standard deviation using (a) the first 3 data samples, (b) the last 3 data samples, (c) the first 10 data samples, (d) the last 10 data samples, and (e) all 50 data samples.

3. For each of the three data files, calculate the 95% confidence interval for the mean using (a) the first 3 data samples, (b) the last 3 data samples, (c) the first 10 data samples, (d) the last 10 data samples, and (e) all 50 data samples.
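The three problems above can be sketched in Python as follows. This is one possible approach under stated assumptions, not a reference solution: it assumes each data file holds whitespace-separated numbers, uses the sample (n-1) standard deviation, and builds the confidence interval from the Student's t distribution, which may differ from the method in the tutorial. The helper names (load_values, empirical_cdf, summarize, subsets, report) are hypothetical. The t critical values are hardcoded for the three sample sizes used here (3, 10, and 50).

```python
import math
import statistics

# Two-sided 95% Student's t critical values, keyed by degrees of freedom (n - 1).
T_CRIT_95 = {2: 4.303, 9: 2.262, 49: 2.010}

def load_values(path):
    """Read whitespace-separated floats from a text file (file format assumed)."""
    with open(path) as f:
        return [float(token) for token in f.read().split()]

def empirical_cdf(values):
    """Pair the i-th smallest sample with i/n, giving points on the empirical CDF."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

def summarize(values):
    """Mean, median, sample standard deviation, and 95% CI for the mean."""
    n = len(values)
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # divides by n - 1
    half_width = T_CRIT_95[n - 1] * stdev / math.sqrt(n)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "stdev": stdev,
        "ci95": (mean - half_width, mean + half_width),
    }

def subsets(values):
    """The five subsets asked for in problems 2 and 3."""
    return {
        "first 3": values[:3],
        "last 3": values[-3:],
        "first 10": values[:10],
        "last 10": values[-10:],
        "all": values,
    }

def report(path):
    """Print the summary for every subset of one data file."""
    for label, subset in subsets(load_values(path)).items():
        print(path, label, summarize(subset))
```

Calling report("data1.txt") prints one summary per subset. The pairs returned by empirical_cdf can be written to a two-column file and plotted with gnuplot, with a logarithmic axis where the distribution calls for it.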