|
Course Announcements
|
Homework
#1 Due Thurs 1/24, 23:59.
Homework
#2 Due FRI 2/1/2002 5:30p.m. (not Sat 2/2)
TA in charge: Mustafa Ozdal (ozdal@cs)
2 late days: (by email only) Sat 2/2 by 5:30p.m., Sun (Feb 2)
by 5:30p.m.
Hmwk 2 FAQ
-
(5/27)to clarify the questions that you are to answer for 2.9 are:
(and be sure to specify units for each answer)
a) calculate the throughput for machine M1 as specified in 2.9
b) calculate the throughput for machine M2 as specified in 2.9
c) which machine is faster for the P2 workload as specified in 2.9
d) using the 2 throughput #s for Machine M1 and M2 on Program 2, calculate
the times faster (speedup) of the faster machine (Hint: throughput is equal
to "Performance")
e) using the 2 throughput #s for Machine M1 and M2 on Program 2, calculate
the percent faster (speedup) of the faster machine
-
(5/28 3:30p.m.)NOTE: hmwk due date is now Friday 5:30 p.m. Feb 1
(see above for details.)
-
(5/28 5:30p.m.)LABELLING Confusion on slide #15, handout 3: this
slide calculates the total CPI from individual CPI's and frequencies.
The formulas and calculations are correct, but the labelling of the columns
in the example are confusing: The columns should read:
Op ------- Freqi-------CPIi------Freqi*CPI
i--------%time
Homework
#3part 1 Due TUES 2/12/2002 5:30p.m.
TA in charge: Qasim Iqbal (qasim@cs)
2 late days: Wed 2/13 by 5:30p.m., Thurs 2/14 by 5:30p.m.
Hmwk 3part1 FAQ:
-
For problem 1, you need to count only those instructions in the else block.
-
(updated Thurs 6:10p.m.) This
question is asking for you to count the # of bytes fetched from memory
by (1) counting the # of bytes for each instruction (note that this is
harder for the variable length encoding ISAs; (2) counting the # of bytes
of operand data fetched from memory (not regs); and (3) counting the #
of bytes of operand data stored into memory (not regs).
-
(updated Thurs 6p.m.) What size is the data for problem
1 ?
For problem one, assume all data operands are 32 data(including data
fetched from memory, data fetched from registers, and immediate values).
-
(updated Thurs 6p.m.) Typo: Problem 1 mentions
the call for "printf". However, in the code, "printf" is in the part before
"else"?
Think of "call towers" instead of "call printf".
-
(updated Thurs 6p.m.) What does "call _towers" mean
?
"towers" is a label and since all the code is in the same segment,
"call _towers in the Intel architecture" would translate to a "call offset"
instruction where offset is a 16 bit or 32 bit immediate value. Assume
32 bit value for this problem. WHEREAS:
"call towers" in the Sparc architecture is identical to the MIPS "jal towers".
-
(updated Thurs 6:30)What do x86 LEAVE and RET do
?
Remember that CALL saves the pointer to the next instruction following
the call(return address) onto the stack and then jumps to the starting
address of the called procedure. LEAVE, sets ESP(base pointer) to
EBP(frame pointer) (This basically releases the stack used so far), pops
off the old frame pointer (saved earlier) and saves it in EBP. RET, pops
off the return address (saved by call) and saves it in EIP.
-
(updated Thurs 6:30) Also, ignore the SPARC's "restore"
instruction.
-
For more online information on the Intel 32bit ISA, please refer to the
online book by Randall Hyde: The
Art of Assembly Language Programming. Please refer to chapter 4 for
addressing modes, and chapter 6 for a list of basic instructions.
-
(updated Thurs 6:30) For more information on JVM ISA, look here.
-
(updated Thurs 6:30) Question: For
the JVM ISA, does the instruction INVOKEVIRTUAL # (or INVOKESTATIC #) actually
fetch or store something from memory? Answer: The instruction invokestatic
#16 is intended to be a call to the towers static method. So, the 16 is
an index into the constant pool where the address of "towers" is stored.
The #16 is the "ind16", and ind16 means that the value will be encoded
in 2 bytes.
-
(updatd Thurs 11:10) On the homework Q3, do
we actually have to write the TOH procedure orjust the explicit code inside
the else clause of the C++ example. Answer: Every single instruction
in the else clause...
-
(updated Thurs 6:30) In problem 2, which 27 instructions
are integer instructions ?
In table 4.54 there are a bunch of instructions, if you exclude all
the floating point instructions, the remainder is what is needed. The FP
instructions are marked FP, and there is one more marked convert float
to int. What is expected in problem two is that, you choose the 9
most frequently used ones, and 9 more out of the 27 which you think are
important and then give a workaround for the remaining 9.
-
Jon Gibson provided the following links and comments.
Thanks to him. His comments follow.
-
NEWS: Hmwk3 has been extended by 24 hrs. That means
that you can turn in hmwk3 to TAY 4.136 (under the door) by Tues 5:30p.m.,
Wed 5:30p.m., and Thurs 5:30p.m. (is the latest).
-
Advice: there really was a wealth of info buried in handout
11, on the Java encoding....
Homework
3 part 2 (due at the same time as Homework 3 part 1)
Due TUES 2/11/2002 5:30p.m. (TAY 4.136)
TA in charge: Qasim Iqbal (qasim@cs)
2 late days: Wed 2/13 by 5:30p.m., Thurs 2/14 by 5:30p.m. (TAY
4.136)
Hmwk 3 part2 FAQ:
-
What is question 5 about?
You have to put the name of a machine in its appropriate cell (intersection
of row and column) in the table, for e.g., the machine name with 0 max.
# of operands and 0 max # of memory operand for add goes into the first
cell...
-
More on Question 5: And so you may
have several cells that are empty, and you may have several cells that
have more than one entry. This question really does not require any
web surfing, or book surfing, assuming you understand our 4 basic ISAs
that we have studied.
-
Problem 3b: just reimplement
the cmp/bne using the MIPS ISA and the SLT/SLTI, even if it's more instructions...
Your code should be logically equivalent to the cmp/bne.
-
Problem 4: yes, you only need to identify one example of each addressing
mode.
-
NEWS: Hmwk3 has been extended by 24 hrs. That means
that you can turn in hmwk3 to TAY 4.136 (under the door) by Tues 5:30p.m.,
Wed 5:30p.m., and Thurs 5:30p.m. (is the latest).
Homework
4 (due Wed 3/6 5:30p.m. TAY 4.136)
TA in charge: Mustafa Ozdal (ozdal@cs)
2 late days: Thurs 3/7 by 5:30 or Fri 3/8 by 5:30p.m. TAY 4.136)
Hmwk 4 FAQ:
-
(Mon 3/4 5:30): While you may not change anything that goes on inside
the Register File, you can obviously change anything external to the Register
File.
-
(Mon 3/4 5:30): Further explanation: the LW instruction has two activities
occurring:
First, you are calculating the sum of r1+r2.
Then, you will use that sum to be the effective address of a memory
value that you will store in rd, AND you will also need to store that sum
in r1.
Given that rd and r1 are guaranteed to NOT be the same, there is NO
restriction on your stores into r1 and rd.
Example: If r1 is initialized to the base of an array of objects
of size 4 bytes, and r2 is the constant 4, then over time, this addressing
mode will load successive elements of the array into register rd.
-
(Mon 3/4 5:30): Extra copies of the yellow worksheet are outside
my office, or you can print a single copy from the text's image: Figure
5.33pdf
-
(Mon 3/4 5:30): The RTL that we studied before Exam1 was for a different
datapath; thus, we will not need to use the word "bus" in any of our RTL,
since we don't have a shared bus in this datapath.
-
(Mon 3/4 5:30): I'd prefer that you turn the hmwk in 1 or 2 days
late, than to skip it, or not really try... this is a fun exercise, and
should cement many concepts once you have an "ah-ha"! Remember, the
hmwk is only worth 10%, so each hmwk is just 1-2%, so a day late is .1-.2%!
Homework
5 (due Mon 3/25 5:30p.m. TAY 4.136)
TA in charge: Qasim Iqbal (qasim@cs0
2 late days: Thurs 3/7 by 5:30 or Fri 3/8 by noon TAY 4.136)
Hmwk 5 FAQ:
-
You are responsible for reading all web postings through Wed 6p.m. 3/20.
-
Branch Prediction techniques are classified as static or dynamic.
Static branch prediction information does not change during the execution
of a program, while dynamic prediction may change, reflecting the time-varying
activity of the program. Static methods range from compile-time heuristics
to profile-based methods. BTBs and branch history/prediction tables, either
alone or in combination, are two examples of dynamic prediction mechanisms.
-
A branch target buffer (BTB) is a cache storing the branch address and
likely target address. When an instruction is fetched, its address is also
sent to the BTB; if there's a match in the BTB and the prediction
is "branch is taken", the next instruction is fetched using the
target address specified in the BTB.
-
Question: It's said "branch target buffer
in stage 1". Does that mean that in stage 1 we already know the target
of the branch? Isn 't that contradictory with "Target is known at the end
of the stage 3"?
-
Answer: The branch target buffer stores
the target address from the last time that this particular branch was taken.
This fact is independent of the current scenario that by the end of stage
3 (in the current execuation) the target address has been actually calculated.
Homework
6 Due Monday 4/1 NOON <------********
TA in charge: Mustafa Ozdal (ozdal@cs)
NO late days.
SOLUTIONS will be handed out at 1p.m. 4/1. So be sure to turn in your homework
before noon!
Turn into the WHITE CS homework box, TAY 2nd floor near the middle exit
doors.
Hmwk 6 FAQ:
in problem 1, the value of HLL "d" is stored in register r1
in problem 1, execute the given code to figure out whether b1 and b2 are
taken
For problem 1, yes, you must follow the assembly language. (yes,
the assembly language is correct and is identical to the C code....the
subtract is needed(why?))
for problem 2, Yes, Slide 7 bottom GANTT chart requires you to use a 2-wide
processor.
for problem 2, The term "branch penalty" is just another term for "stall
cycles because of a branch"
for problem 2, I have specified that each branch is taken, therefore,
you have the necessary information to conclude whether the predicate is
false or true.
for problem 2, when predicates are false, the instruction becomes a
"no-op" (as soon as the predicate register's value is register-forwarded
to the EX stage); remember we do not count no-ops in the IC... {however,
for a real analysis of the loop, we would have to analyze the % of false
vs the % of true predicates, so that we could calculate an average
CPI}.
THEREFORE: in the Handout 34 example (back of
page), the CPI would be 6/9 for that specific code, and if both predicates
are true, thenthe CPI is6/10.
for problem 3, row 2 of fig 6.60 has a weird solution in most student texts.
I suggest you use the row2 in my textbook (an earlier printing)
column1 is empty and column 2 has: lw $t1,
12($s1)
for problem 3, yes, you can write code with negative displacements
352 Project: design, implement and experiment
with a cache simulator.
-
INDIVIDUAL PROJECT ONLY! no teams!
-
Announcements and Hints:
-
Handout
#42: Project part1: Specification of the cache simulator, part 1.
-
Handout
#48: Project part2: Experiment, clarifications, Turn-in requirements
-
TA-in-charge: Qasim Iqbal ( qasim@cs )
-
Example input file is here: exampleinput.txt
Example output files for different configurations:
-
Example makefile and code to read command line parameters:
Readme
, Makefile
, funcs.h
, main.cc
, and funcs.cc
.
-
To use a Linux C++ debugger, see our
brief desc. of ddd
-
Questions on cache configuration: 1) How should our program handle
the absence of command line options? Answer: Use the baseline case.
2) What if only some of the options are specified but not all 3? Answer:
stop, it's an error. 3) Should we have some default values that we
should use? Answer: no need if we follow the above strategy.
4) Also do we have to do any error checking of user input or can we assume
that everything will be correct? Answer: Good question, I'm
glad to see you thinking of these issues, but no need.
-
Test Cases: Small
test cases 1 through 4
-
"Stress Test Case (1.6Mbytes of traces)" --- the compress benchmark.
I suggest that you NOT copy the stress test case, but instead use the following
full pathname of the file: /u/www/users/chris/cs352/spr2002/hmwk/project/cs352.trace
-
NOTE: "Write
allocate" (also called fetch on write): the block is loaded on a write
miss, followed by the write-hit actions. This is quite similar to
a readmiss.
NOT related to the project, but helpful info for the final:"No-write
allocate" (also called write around): The block is modified in the lower
level and not loaded into the cache.
Either write-miss policy could be used with either write-hit policy,
but typically write-back caches generally use write allocate (hoping that
subsequent writes to that block will be captured by the cache) and write-through
caches often use no-write allocate (since subsequent writes to that block
will still have to go to memory).
-
Please note that the "stress case" took about 10
seconds when we ran it. However, students
have reported the program taking just 1 minute to read every line of the
stress test. Running the program for 20 minutes just got it only half way
through!
-
Please use the following command to turnin
your files from a CS Unix machine:
/p/bin/turnin
-submit qasim cs352 files
You may specify multiple files in the parameter
"files". To check your submissions:
/p/bin/turnin
-list qasim cs352
Remember, your last
submission to the turnin directory will determine
the "timestamp"
of your submission.
----------------
Please turnin the following required items electronically:
This list is based on an EXCERPT FROM HANDOUT
#48, along with a couple of NEW comments labelled NEW:
(1) All
of your source files, header files, and makefiles (if used). These
must be well documented.
NEW:
(you are responsible for including them all)
(2) The executable file that can run on
the departmental linux
machines. The binary
file must be named "cache" and it must
accept command line arguments
in the exact form specified in Hdt 48.
(3) Five output files: The output from
simulating Test Cases 1 through 4,
and the stress test (cs352.trace
using the baseline configuration).
NEW:
Text
format only (pls do not use MS doc, pdf, postscript, etc)
(4) your report, NEW:
preferably PDF format (or html)
NEW:
(5) A Makefile if you used one.
NEW:
(6) Readme file, with
a)
instructions on how to rebuild the executable file,
b)
the name of the CS host that you ran it on,
c)
if your program is not working 100%, itemize what *is* working,
this will help us give you partial credit.
Don't forget that the hardcopy requirements are
itemized in Handout #48.
------------------
-
Question:
In the project description, the command line parameters are specified
as:
cache -cap 32 -block 16 -assoc 4
<test.trace >test.output
However, on http://www.cs.utexas.edu/users/chris/cs352/spr2002/hmwk/project/readme.html#debugging,
the parameters are parsed as follows:
cache_sim -c1024 -b32 -a2
Answer: The example
command line code was not intended to be used as is. You still need to
support the longer command line for full credit. However, it won't be worth
that much credit (approx 4%).
-
A simple way to test for EOF in C++:
..... while (cin >> tracetype) {....}.....
-
Querstion: I've tried using the redirection
that you described in Handout #48 (the yellow sheet), but my program just
sits and waits for input from the user. Do you know why this is?
I'm using "cin" to input, and "cin" is by definition reading from stdin
(refer to "C++: The Complete Reference" by Herbert Schildt, 3rd Edition,
1998, page 514). Can you perhaps clarify what mistake I may be making?
I'm using the following command line:
cache <cs352.trace >stress.output
Answer: You must not have a file named
cs352.trace in your local directory. Try it first with the sample input
files, that you copy over into your directory.
Redirection
rules!
Extra Credit (Homework 7):
-
Handout
#54
-
TA-in-charge: Mustafa Ozdal (ozdal@cs)
-
Due Friday May 3rd, 5p.m.
-
Solutions will be available outside TAY 4.136 6p.m.
-
This hmwk will be an additional 2% added to your total score.
|
|
|