Assignment 3 - Sparse Matrix-Vector Multiplication using Pthreads

Generating sparse matrices

Use this file (a contribution from the class of Spring '09) to generate sparse matrices in MMEF format.

Compile the file using: gcc -O0 -o build build.c

Run it as follows: ./build <num_rows> <num_cols> <density>
Example: ./build 10 10 .1
This will give you a sparse matrix with 10 non-zero elements out of the 100 entries of a 10x10 matrix.

You can also use this link to generate sparse matrices in MMEF format.

Test matrices

Test input 1 : Matrix 1 Vector 1 Output 1
Test input 2 : Matrix 2 Vector 2 Output 2
Test input 3 : Matrix 3 Vector 3 Output 3
Test input 4 : Matrix 4 Vector 4 Output 4

The deadline to submit your performance numbers on these test inputs is Feb 19, 11:59 pm for extra credit. Submit your code and your performance measurements (in microseconds) via turnin (instructions under the submission section). I will share a Google spreadsheet where everyone can see the performance numbers the rest of the class is getting. Please update this spreadsheet when you submit your code. You are free to keep updating your numbers on the spreadsheet as you improve your code over the weekend.

Correctness and Performance Comparison

You can check the correctness of your code and gauge its performance by comparing your output with that of the Intel Math library. Here is the sparse matrix-vector multiplication code using Intel MKL (credit to Kartik, TA Spring '10). The code is to be run as follows:


Note: Modify the cspblas_dcsr.c file as needed to obtain timing results.

Performance Measurements

Getting good performance is one of the main aims of this assignment. There will be bonus points for the top 3 performing submissions. We will be using PAPI to measure the performance of all parallel programs. PAPI is a performance measurement tool that uses hardware counters to track various performance-related parameters. PAPI can be used as described in Assignment 1. Use the PAPI_flops event to measure the flops your program achieves. If you have any trouble working with PAPI, I can run you through an example during my office hours.

Your performance measurement should not include file I/O and matrix-format conversion. The counters should be started right before the call to pthread_create() and they should be stopped right after pthread_join(). This is to ensure a fair comparison among all submissions.

Submission Instructions

The first submission for this assignment is due on Feb 19 by 11:59pm. The final submission for this assignment is due on Feb 22 by 11:59pm.

Prepare a tar file for your submission. That tar file should have your source code, a job script for longhorn, and a readme file. The readme file should have your names, your initial performance measurements in microseconds, instructions for compiling your code, and the number of slip hours used. The total number of slip hours should be given as "slip_hours_used:<number>" in the readme file. (I will use a script to read this, so ensure you stick to this format.)

To submit your assignment, use the following command:
turnin --submit akanksha hw3_sub1 <your_tar>

Instructions for final submission
Prepare a tar file for your submission. That tar file should have your source code, a job script for longhorn, your report, and a readme file. The readme file should have your names, your final performance measurements on the test inputs in microseconds, instructions for compiling your code, and the number of slip hours used. The total number of slip hours should be given as "slip_hours_used:<number>" in the readme file. (I will use a script to read this, so ensure you stick to this format.)

To submit your assignment, use the following command:
turnin --submit akanksha hw3 <your_tar>

Final Report
In your report, make sure you mention the optimizations you tried, what worked, what did not work and most importantly, your insight about why things worked/did not work. I will try to put up the best reports on my website as soon as possible.