CS352 Computer Systems Architecture


Course Announcements


 

  Hmwk #1
Hmwk #2
 Hmwk #3 part1
 Hmwk #3 part2
 Hmwk #4
Hmwk #5
 Hmwk #6
Project
 Extra credit
 


Homework #1   Due  Thurs 1/24, 23:59.

Homework #2 Due FRI 2/1/2002  5:30p.m. (not Sat 2/2) 
TA in charge:  Mustafa Ozdal (ozdal@cs) 
2 late days:  (by email only) Sat 2/2 by 5:30p.m., Sun (Feb 2) by 5:30p.m. 
Hmwk 2 FAQ
  1. (5/27)to clarify the questions that you are to answer for 2.9 are:  (and be sure to specify units for each answer)

  2. a) calculate the throughput for machine M1 as specified in 2.9 
    b) calculate the throughput for machine M2 as specified in 2.9 
    c) which machine is faster for the P2 workload as specified in 2.9 
    d) using the 2 throughput #s for Machine M1 and M2 on Program 2, calculate the times faster (speedup) of the faster machine (Hint: throughput is equal to "Performance") 
    e) using the 2 throughput #s for Machine M1 and M2 on Program 2, calculate the percent faster (speedup) of the faster machine 
  3. (5/28 3:30p.m.)NOTE:  hmwk due date is now Friday 5:30 p.m. Feb 1 

  4. (see above for details.) 
  5. (5/28 5:30p.m.)LABELLING Confusion on slide #15, handout 3:  this slide calculates the total CPI from individual CPI's and frequencies.  The formulas and calculations are correct, but the labelling of the columns in the example are confusing:  The columns should read:

  6. Op ------- Freqi-------CPIi------Freqi*CPI i--------%time 


Homework #3part 1 Due TUES 2/12/2002  5:30p.m. 
TA in charge:  Qasim Iqbal (qasim@cs) 
2 late days:  Wed 2/13 by 5:30p.m., Thurs 2/14 by 5:30p.m. 
Hmwk 3part1  FAQ:
  • For problem 1, you need to count only those instructions in the else block. 
  • (updated Thurs 6:10p.m.)  This question is asking for you to count the # of bytes fetched from memory by (1) counting the # of bytes for each instruction (note that this is harder for the variable length encoding ISAs; (2) counting the # of bytes of operand data fetched from memory (not regs); and (3) counting the # of bytes of operand data stored into memory (not regs).
  • (updated Thurs 6p.m.) What size is the data for problem 1 ?

  • For problem one, assume all data operands are 32 data(including data fetched from memory, data fetched from registers, and immediate values). 
  • (updated Thurs 6p.m.) Typo: Problem 1 mentions the call for "printf". However, in the code, "printf" is in the part before "else"?

  • Think of "call towers" instead of "call printf". 
  • (updated Thurs 6p.m.) What does "call _towers" mean ?

  • "towers" is a label and since all the code is in the same segment, "call _towers in the Intel architecture" would translate to a "call offset" instruction where offset is a 16 bit or 32 bit immediate value. Assume 32 bit value for this problem.    WHEREAS: "call towers" in the Sparc architecture is identical to the MIPS "jal towers". 
  • (updated Thurs 6:30)What do x86 LEAVE and RET do ? 

  • Remember that CALL saves the pointer to the next instruction following the call(return address) onto the stack and then jumps to the starting address of the called procedure.  LEAVE, sets ESP(base pointer) to EBP(frame pointer) (This basically releases the stack used so far), pops off the old frame pointer (saved earlier) and saves it in EBP. RET, pops off the return address (saved by call) and saves it in EIP. 
     
  • (updated Thurs 6:30) Also, ignore the SPARC's "restore" instruction.

  •  
  • For more online information on the Intel 32bit ISA, please refer to the online book by Randall Hyde: The Art of Assembly Language Programming. Please refer to chapter 4 for addressing modes, and chapter 6 for a list of basic instructions.

  •  
  • (updated Thurs 6:30) For more information on JVM ISA, look here. 

  •  
  • (updated Thurs 6:30) Question: For the JVM ISA, does the instruction INVOKEVIRTUAL # (or INVOKESTATIC #) actually fetch or store something from memory? Answer: The instruction invokestatic #16 is intended to be a call to the towers static method. So, the 16 is an index into the constant pool where the address of "towers" is stored.  The #16 is the "ind16", and ind16 means that the value will be encoded in 2 bytes. 
  • (updatd Thurs 11:10) On the homework Q3, do we actually have to write the TOH procedure orjust the explicit code inside the else clause of the C++ example.  Answer: Every single instruction in the else clause...

  • (updated Thurs 6:30) In problem 2, which 27 instructions are integer instructions ?

  • In table 4.54 there are a bunch of instructions, if you exclude all the floating point instructions, the remainder is what is needed. The FP instructions are marked FP, and there is one more marked convert float to int.  What is expected in problem two is that, you choose the 9 most frequently used ones, and 9 more out of the 27 which you think are important and then give a workaround for the remaining 9. 

  • Jon Gibson provided the following links and comments. Thanks to him. His comments follow.

  • NEWS:  Hmwk3 has been extended by 24 hrs.  That means that you can turn in hmwk3 to TAY 4.136 (under the door) by Tues 5:30p.m., Wed 5:30p.m., and Thurs 5:30p.m. (is the latest).
  • Advice:  there really was a wealth of info buried in handout 11, on the Java encoding.... 


Homework 3 part 2 (due at the same time as Homework 3 part 1) 
Due TUES 2/11/2002  5:30p.m. (TAY 4.136) 
TA in charge:  Qasim Iqbal (qasim@cs) 
2 late days:  Wed 2/13 by 5:30p.m., Thurs 2/14 by 5:30p.m. (TAY 4.136) 
Hmwk 3 part2 FAQ:
  • What is question 5 about?

  • You have to put the name of a machine in its appropriate cell (intersection of row and column) in the table, for e.g., the machine name with 0 max. # of operands and 0 max # of memory operand for add goes into the first cell... 
  • More on Question 5: And so you may have several cells that are empty, and you may have several cells that have more than one entry.  This question really does not require any web surfing, or book surfing, assuming you understand our 4 basic ISAs that we have studied.
  • Problem 3b:  just reimplement the cmp/bne using the MIPS ISA and the SLT/SLTI, even if it's more instructions...  Your code should be logically equivalent to the cmp/bne.
  • Problem 4: yes, you only need to identify one example of each addressing mode.
  • NEWS:  Hmwk3 has been extended by 24 hrs.  That means that you can turn in hmwk3 to TAY 4.136 (under the door) by Tues 5:30p.m., Wed 5:30p.m., and Thurs 5:30p.m. (is the latest).


Homework 4 (due Wed 3/6 5:30p.m. TAY 4.136) 
TA in charge: Mustafa Ozdal (ozdal@cs) 
2 late days: Thurs 3/7 by 5:30 or Fri 3/8 by 5:30p.m. TAY 4.136) 
Hmwk 4 FAQ:
  • (Mon 3/4 5:30):  While you may not change anything that goes on inside the Register File, you can obviously change anything external to the Register File.
  • (Mon 3/4 5:30):  Further explanation: the LW instruction has two activities occurring:

  • First, you are calculating the sum of r1+r2. 
    Then, you will use that sum to be the effective address of a memory value that you will store in rd, AND you will also need to store that sum in r1. 
    Given that rd and r1 are guaranteed to NOT be the same, there is NO restriction on your stores into r1 and rd. 
    Example:  If r1 is initialized to the base of an array of objects of size 4 bytes, and r2 is the constant 4, then over time, this addressing mode will load successive elements of the array into register rd. 
  • (Mon 3/4 5:30):  Extra copies of the yellow worksheet are outside my office, or you can print a single copy from the text's image: Figure 5.33pdf 
  • (Mon 3/4 5:30):  The RTL that we studied before Exam1 was for a different datapath; thus, we will not need to use the word "bus" in any of our RTL, since we don't have a shared bus in this datapath.
  • (Mon 3/4 5:30):  I'd prefer that you turn the hmwk in 1 or 2 days late, than to skip it, or not really try... this is a fun exercise, and should cement many concepts once you have an "ah-ha"!  Remember, the hmwk is only worth 10%, so each hmwk is just 1-2%, so a day late is .1-.2%!


Homework 5 (due Mon 3/25 5:30p.m. TAY 4.136) 
TA in charge: Qasim Iqbal (qasim@cs0 
2 late days: Thurs 3/7 by 5:30 or Fri 3/8 by noon TAY 4.136) 
Hmwk 5 FAQ:
  • You are responsible for reading all web postings through Wed 6p.m. 3/20.
  • Branch Prediction techniques are classified as static or dynamic. Static branch prediction information does not change during the execution of a program, while dynamic prediction may change, reflecting the time-varying activity of the program. Static methods range from compile-time heuristics  to profile-based methods. BTBs and branch history/prediction tables, either alone or in combination, are two examples of dynamic prediction mechanisms. 
  • A branch target buffer (BTB) is a cache storing the branch address and likely target address. When an instruction is fetched, its address is also sent to the BTB; if there's a match in the BTB and the prediction is "branch is taken", the next instruction is fetched using the target address specified in the BTB. 
  • Question: It's said "branch target buffer in stage 1". Does that mean that in stage 1 we already know the target of the branch? Isn 't that contradictory with "Target is known at the end of the stage 3"?
  • Answer: The branch target  buffer stores the target address from the last time that this particular branch was taken. This fact is independent of the current scenario that by the end of stage 3 (in the current execuation) the target address has been actually calculated.




Homework 6 Due Monday 4/1 NOON <------******** 
TA in charge: Mustafa Ozdal (ozdal@cs) 
  • NO late days. 
  • SOLUTIONS will be handed out at 1p.m. 4/1. So be sure to turn in your homework before noon!
  • Turn into the WHITE CS homework box, TAY 2nd floor near the middle exit doors.
  • Hmwk 6 FAQ:
  • in problem 1, the value of HLL "d" is stored in register r1
  • in problem 1, execute the given code to figure out whether b1 and b2 are taken
  • For problem 1, yes, you must follow the assembly language.  (yes, the assembly language is correct and is identical to the C code....the subtract is needed(why?))
  • for problem 2, Yes, Slide 7 bottom GANTT chart requires you to use a 2-wide processor.
  • for problem 2, The term "branch penalty" is just another term for "stall cycles because of a branch"
  • for problem 2, I have  specified that each branch is taken, therefore, you have the necessary information to conclude whether the predicate is false or true.
  • for problem 2, when predicates are false, the instruction becomes a "no-op" (as soon as the predicate register's value is register-forwarded to the EX stage); remember we do not count no-ops in the IC... {however, for a real analysis of the loop, we would have to analyze the % of false vs the % of true predicates, so that we could calculate an average CPI}.
  • THEREFORE: in the Handout 34 example (back of page), the CPI would be 6/9 for that specific code, and if both predicates are true, thenthe CPI  is6/10.
  • for problem 3, row 2 of fig 6.60 has a weird solution in most student texts.

  • I suggest you use the row2 in my textbook (an earlier printing) 
    column1 is empty    and column 2 has:  lw $t1, 12($s1) 
  • for problem 3, yes, you can write code with negative displacements


  • 352 Project: design, implement and experiment with a cache simulator.
    • INDIVIDUAL PROJECT ONLY! no teams!
    • Announcements and Hints:
    • Handout #42: Project part1: Specification of the cache simulator, part 1.
    • Handout #48: Project part2: Experiment, clarifications, Turn-in requirements
    • TA-in-charge: Qasim Iqbal ( qasim@cs )
    • Example input file is here: exampleinput.txt

    • Example output files for different configurations:
    • Example makefile and code to read command line parameters:

    • Readme , Makefile , funcs.h , main.cc , and funcs.cc
    • To use a Linux C++ debugger, see our brief desc. of ddd 
    • Questions on cache configuration: 1) How should our program handle the absence of command line options? Answer: Use the baseline case.   2) What if only some of the options are specified but not all 3? Answer:  stop, it's an error.  3) Should we have some default values that we should use?  Answer: no need if we follow the above strategy.  4) Also do we have to do any error checking of user input or can we assume that everything will be correct?  Answer:  Good question, I'm glad to see you thinking of these issues, but no need.
    • Test Cases:  Small test cases 1 through 4 
    • "Stress Test Case (1.6Mbytes of traces)" --- the compress benchmark. I suggest that you NOT copy the stress test case, but instead use the following full pathname of the file:  /u/www/users/chris/cs352/spr2002/hmwk/project/cs352.trace 
    • NOTE: "Write allocate" (also called fetch on write): the block is loaded on a write miss, followed by the write-hit actions.  This is quite similar to a readmiss.

    • NOT related to the project, but helpful info for the final:"No-write allocate" (also called write around): The block is modified in the lower level and not loaded into the cache.
      Either write-miss policy could be used with either write-hit policy, but typically write-back caches generally use write allocate (hoping that subsequent writes to that block will be captured by the cache) and write-through caches often use no-write allocate (since subsequent writes to that block will still have to go to memory). 
       

    • Please note that the "stress case" took about 10 seconds when we ran it. However, students have reported the program taking just 1 minute to read every line of the stress test. Running the program for 20 minutes just got it only half way through!
    • Please use the following command to turnin your files from a CS Unix machine:
             /p/bin/turnin -submit qasim cs352 files

      You may specify multiple files in the parameter "files". To check your submissions:

              /p/bin/turnin -list qasim cs352

      Remember, your last submission to the turnin directory will determine the "timestamp" of your submission.

      ----------------
      Please turnin the following required items electronically:
      This list is based on an EXCERPT FROM HANDOUT #48, along with a couple of NEW comments labelled NEW:

        (1) All of your source files, header files, and makefiles (if used).  These must be well documented.
          NEW: (you are responsible for including them all)

        (2) The executable file that can run on the departmental linux
          machines.  The binary file must be named "cache" and it must
          accept command line arguments in the exact form specified in Hdt 48. 

        (3) Five output files: The output from simulating Test Cases 1 through 4,
          and the stress test (cs352.trace using the baseline configuration).
          NEW: Text format only (pls do not use MS doc, pdf, postscript, etc)

        (4) your report, NEW: preferably PDF format (or html)

      NEW:
        (5) A Makefile if you used one.

      NEW
        (6) Readme file, with 
              a) instructions on how to rebuild the executable file,
              b) the name of the CS host that you ran it on,
              c) if your program is not working 100%, itemize what *is* working,
                      this will help us give you partial credit.

      Don't forget that the hardcopy requirements are itemized in Handout #48.

      ------------------
       

    •  Question: In the project description, the command line parameters  are specified as:
    • cache -cap 32 -block 16 -assoc 4  <test.trace  >test.output
      However, on http://www.cs.utexas.edu/users/chris/cs352/spr2002/hmwk/project/readme.html#debugging, the parameters are parsed as follows:
      cache_sim -c1024 -b32 -a2
        Answer: The example command line code was not intended to be used as is. You still need to support the longer command line for full credit. However, it won't be worth that much credit (approx 4%).
         
    • A simple way to test for EOF in C++: 

    • ..... while (cin >> tracetype) {....}.....
       
    • Querstion: I've tried using the redirection that you described in Handout #48 (the yellow sheet), but my program just sits and waits for input from the user. Do you know why this is?  I'm using "cin" to input, and "cin" is by definition reading from stdin (refer to "C++:  The Complete Reference" by Herbert Schildt, 3rd Edition, 1998, page 514).  Can you perhaps clarify what mistake I may be making?  I'm using the following command line:

    •       cache <cs352.trace >stress.output

      Answer: You must not have a file named cs352.trace in your local directory. Try it first with the sample input files, that you copy over into your directory.

              Redirection rules!
       
       


    Extra Credit (Homework 7):
    • Handout #54
    • TA-in-charge: Mustafa Ozdal (ozdal@cs)
    • Due Friday May 3rd, 5p.m.
    • Solutions will be available outside TAY 4.136 6p.m.
    • This hmwk will be an additional 2% added to your total score.