In this project, each of you will experiment with the design of a cache system for a hypothetical processor, using data you obtain from the dinero cache simulator. Dinero simulates a wide variety of cache configurations, which are specified on its command line when it is run. It works by reading a trace of the sequence of address references generated by a MIPS CPU simulator running a real program compiled with a version of gcc that generates MIPS code. Each trace record specifies whether the memory transaction is an instruction fetch, a data read (LOAD), or a data write (STORE). For each transaction, dinero simulates the behavior of the type of cache you have specified, generating hits and misses as appropriate. At the end of the simulation, it produces a set of statistics summarizing the cache's performance, including the total number of transactions of each type, the miss percentage in each case, and the amount of memory traffic generated for each type of transaction, both in absolute terms and as a percentage of the total memory traffic generated by the CPU.
Dinero is installed in /p/bin/ on the public Linux machines (the "hat" machines), the public SGI O2s (the "wind" machines), and the public Suns (the "cheese" machines). To run it, you can either type the full pathname of the executable:
/p/bin/dinero [cache options] < tracefile
or you can add /p/bin to your PATH environment variable and then just type
dinero [cache options] < tracefile
You should also add /p/man to your MANPATH environment variable so that you can type "man dinero" to get a detailed description of the cache options you can specify and the output statistics provided.
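For example, the two environment variables could be set with lines like the following in your shell startup file (sh/bash syntax shown; if you use csh or tcsh, use "setenv" instead):

```shell
# Add the dinero executable directory to the command search path
PATH=$PATH:/p/bin
# Add the dinero man page directory to the manual search path
MANPATH=$MANPATH:/p/man
export PATH MANPATH
```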
You will not have to generate your own trace files; that has been done for you. Traces for three programs, cc1 (the C compiler), spice (a circuit simulator), and tex (a document formatter), are available in compressed form. To use them, put them into a directory you have created for this project. The files are named cc1.din, spice.din, and tex.din respectively if your browser uncompresses them before saving; otherwise, .Z is appended to the end of each name. If your files are still compressed, type "uncompress *.Z" when you are ready to use them. When you stop work, type "compress *.din" before you log out. Some of these files are 10 Meg in size (which isn't very large for trace files), but they still waste a lot of disk space if everybody keeps uncompressed copies in their directory. Also, when you have finished the project, please remember to delete these files (you can always retrieve them from the web page if you need to); there is no need to tie up all this disk space for the rest of your college career.
Your assignment is to compare the behavior of fully associative and direct mapped caches in a MIPS machine that has only an L1 cache. You should assume the following:
While the hit time of the cache is always assumed to be 1 cycle, we are going to assume that the cache is on the critical path of the processor, so that the clock cycle time of the processor varies with the cache organization. Direct mapped caches are the fastest; if T is the cycle time of the machine with a direct mapped cache, then with a fully associative cache of the same size the cycle time will be nT, that is, the cycle time increases by a factor of n (n >= 1). Given a particular cache size, we expect a fully associative cache to outperform a direct mapped cache if we ignore the cycle time (why?). In this experiment, you will explore how one might decide which type of cache to use by determining the value of n that makes the performance of the machine using a direct mapped cache equal to the performance of a machine using a fully associative cache of the same size. You should determine n for unified caches (don't use separate I and D caches) of size 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, and 1Meg (bytes).
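One way to see where n comes from: equal performance means equal total execution time, i.e. cycles_dm * T = cycles_fa * nT, so n = cycles_dm / cycles_fa. The sketch below illustrates this under a simple timing model in which every access costs the hit time and every miss adds a fixed stall penalty; the hit_time and miss_penalty values are placeholders for illustration, not numbers given in the assignment, and the miss counts would come from your dinero output.

```python
def total_cycles(accesses, misses, hit_time=1, miss_penalty=20):
    """Total execution cycles for one trace under one cache organization.
    Assumed model: each access pays hit_time; each miss adds miss_penalty."""
    return accesses * hit_time + misses * miss_penalty

def break_even_n(accesses, misses_dm, misses_fa, miss_penalty=20):
    """Cycle-time factor n at which the two machines take equal time:
    cycles_dm * T = cycles_fa * (n * T)  =>  n = cycles_dm / cycles_fa."""
    cycles_dm = total_cycles(accesses, misses_dm, miss_penalty=miss_penalty)
    cycles_fa = total_cycles(accesses, misses_fa, miss_penalty=miss_penalty)
    return cycles_dm / cycles_fa

# Made-up example: 1,000,000 references; DM misses 5%, FA misses 3%
n = break_even_n(1_000_000, 50_000, 30_000)  # -> 1.25 under this model
```

Because the fully associative cache misses less often, cycles_fa <= cycles_dm and so n >= 1: the fully associative machine can tolerate a slower clock before it loses its advantage.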
You will assume four different workloads for your machine. First, assume that only one of the three programs runs on the machine all the time (that gives three different workloads: a secretary using tex all the time, a circuit designer using spice all the time, and a programmer using cc1 all the time). Next, assume your workload consists of equal numbers of runs of each of the three programs. Determine the value of n for each workload and each cache size (yes, that will require 72 dinero runs).
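For the equal-mix workload, one reasonable approach is to sum the simulated cycles over all three traces for each cache organization before taking the ratio. The sketch below assumes the same simple timing model as before (placeholder hit and miss costs); the per-trace numbers are made up for illustration and would really be read off dinero's summary output.

```python
def total_cycles(accesses, misses, hit_time=1, miss_penalty=20):
    """Assumed model: each access pays hit_time; each miss adds miss_penalty."""
    return accesses * hit_time + misses * miss_penalty

def mixed_workload_n(per_trace_stats, miss_penalty=20):
    """n for a workload with equal numbers of runs of each program:
    sum cycles over all traces for each organization, then take the ratio.
    per_trace_stats is a list of (accesses, dm_misses, fa_misses) tuples."""
    cycles_dm = sum(total_cycles(a, dm, miss_penalty=miss_penalty)
                    for a, dm, _ in per_trace_stats)
    cycles_fa = sum(total_cycles(a, fa, miss_penalty=miss_penalty)
                    for a, _, fa in per_trace_stats)
    return cycles_dm / cycles_fa

# Hypothetical per-trace results: (accesses, DM misses, FA misses)
stats = [
    (1_000_000, 50_000, 30_000),  # cc1
    (800_000, 20_000, 15_000),    # spice
    (600_000, 10_000, 8_000),     # tex
]
n_mix = mixed_workload_n(stats)
```

Note that the mixed-workload n is a cycle-weighted combination of the three per-program results, not a simple average of the three individual n values, because the programs contribute different numbers of cycles to the total.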
Summarize your experimental procedure and results and present your conclusions in a short (3-5 page) writeup. Plot n as a function of cache size for each workload. Based on your experimental data, under what circumstances would you choose a direct mapped cache instead of a fully associative cache and why? Does your decision depend on the size of the cache and the workload? Why or why not? Be very clear about how the data support your conclusions.