Homework Assignment 3
CS 350c, Unique Number: 52140
Spring, 2017

Given: January 31, 2017
Due: February 9, 2017

This homework concerns the speed of the matrix multiplication algorithm as it is commonly implemented.

Consider the six different matrix multiplication algorithms that appear on page 645 of "Computer Systems, A Programmer's Perspective", 3rd Edition (in the 2nd Edition, they are on page 626). You are to implement all six algorithms, but with some changes.

1. Arrays A, B, and C should all be declared as:

       #define ROWS (1000)
       #define COLUMNS (1000)
       #define ENTRIES (ROWS*COLUMNS)
       char A[ENTRIES], B[ENTRIES], C[ENTRIES];

2. The line(s) with "sum = 0.0;" should be changed to "sum = 0;"

3. The summation should be saturating; that is, when the sum exceeds 127, the answer should be 127. When the sum is less than -128, the answer should be -128. Be sure to calculate the sums with full precision and only then, as the final step, perform the saturation.

Initialize array A with a tri-diagonal set of 1s. Initialize array B with a tri-diagonal set of 1s, but then change the main diagonal only to all -1 entries. Remember, we are multiplying array A by array B, and then putting the answer in array C. Organize your arrays so that the stored elements are in row-major order.

Record the running time of multiplying array A by array B with the size of the arrays varying from 100x100, 200x200, ..., 1000x1000. With six algorithms and ten sizes, you will thus produce 60 answers. Then, calculate the time per element and make a chart like that on page 646 in the 3rd edition of "Computer Systems, A Programmer's Perspective" (a similar chart appears on page 628 in the 2nd edition).

Below are the questions that you are asked to answer as a part of this assignment:

1. Is there a difference in the code produced by the compiler for your matrix multiplication code with "-O2" and with "-O3"? If so, what is this difference?

2. How many memory references does your code make when multiplying two arrays of size N? That is, write an equation (in terms of N) that describes the number of array reads and writes required to implement one of the matrix multiplication algorithms.

3. Is the number of memory accesses the same for all six algorithms?

4. Would it make sense to modify the algorithms you implemented to use blocking? If so, why? If not, why? If unsure, how could this be determined?

When you report your results, also carefully document what system you used. This work should be done on an x86-based computer. Find out what processor the machine you are using contains, including the exact model/part number of the x86 processor. Once that is found, find out how much cache memory it has and how many levels of cache it has.
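
As a starting point for the implementation requirements above (the array declarations, the saturating sum, and the tri-diagonal initialization), here is a minimal sketch in C. It is one possible reading of the assignment, not a reference solution. The function names saturate, init_tridiagonal, multiply_ijk, and seconds_per_element are hypothetical. The sketch also makes several assumptions that the handout does not state: the arrays are declared as signed char rather than plain char so that -1 and -128 are representable on any compiler, an n x n matrix is stored compactly with row stride n, "time per element" is taken to mean time per element of C, and timing uses the standard clock() function. Only the ijk loop order is shown; the other five orderings are left to you.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

#define ROWS    (1000)
#define COLUMNS (1000)
#define ENTRIES (ROWS * COLUMNS)

/* Assumption: signed char instead of the handout's plain char, so that
   negative entries behave the same on every compiler. */
signed char A[ENTRIES], B[ENTRIES], C[ENTRIES];

/* Clamp a full-precision sum into the signed 8-bit range [-128, 127]. */
signed char saturate(int sum)
{
    if (sum > 127)  return 127;
    if (sum < -128) return -128;
    return (signed char)sum;
}

/* Fill the leading n*n entries of M (row-major, row stride n) with zeros,
   put 1s on the sub- and super-diagonal, and put `diag` on the main
   diagonal: diag = 1 gives A, diag = -1 gives B. */
void init_tridiagonal(signed char *M, int n, signed char diag)
{
    assert(n > 0 && n <= ROWS);
    memset(M, 0, (size_t)n * (size_t)n);
    for (int i = 0; i < n; i++) {
        if (i > 0)     M[i * n + (i - 1)] = 1;
        M[i * n + i] = diag;
        if (i < n - 1) M[i * n + (i + 1)] = 1;
    }
}

/* ijk ordering: accumulate each dot product in an int (full precision),
   then saturate once when storing the result into c. */
void multiply_ijk(const signed char *a, const signed char *b,
                  signed char *c, int n)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int sum = 0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = saturate(sum);
        }
    }
}

/* Time one n x n multiplication with clock() (CPU time) and return the
   seconds spent per element of C. Initialization is not timed. */
double seconds_per_element(int n)
{
    init_tridiagonal(A, n, 1);
    init_tridiagonal(B, n, -1);
    clock_t start = clock();
    multiply_ijk(A, B, C, n);
    clock_t stop = clock();
    return ((double)(stop - start) / CLOCKS_PER_SEC) / ((double)n * (double)n);
}
```

Calling seconds_per_element(n) for n = 100, 200, ..., 1000 would give the ten data points for this one loop order. For small n a single run may be shorter than the resolution of clock(), so repeating the multiplication several times and averaging is probably worthwhile.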