CS 378: Programming for Performance

Assignment 7: GPU Programming Using CUDA

Due date: December 5th, 11:59 pm

You can do this assignment alone or with someone else from class.
Each group can have a maximum of two students.
Each group should turn in one submission.


Given a directed, weighted graph and a source vertex, compute the shortest-path distance from the source to every vertex using the Bellman-Ford algorithm. The goal of this week's assignment is to learn to use CUDA, so you need to implement the single-source shortest path (SSSP) algorithm in CUDA. You can use the graphs from the last assignment, i.e., RandomGraph and the USA road network.
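
For checking GPU results, a sequential Bellman-Ford on the host can serve as a baseline. The following is a minimal sketch, not part of the assignment's starter code: the `Edge` struct, the function name `bellmanFordReference`, and the use of `INT_MAX` as "infinity" are assumptions made for illustration, and it assumes integer weights, no negative cycles, and distances that fit in an `int`.

```cuda
#include <climits>
#include <vector>

// Hypothetical edge layout: a directed edge src -> dst with an integer weight.
struct Edge { int src, dst, weight; };

// Sequential Bellman-Ford. Returns the distance of every vertex from `source`;
// unreachable vertices keep the value INT_MAX.
std::vector<int> bellmanFordReference(const std::vector<Edge> &edges,
                                      int numVertices, int source) {
    std::vector<int> dist(numVertices, INT_MAX);
    dist[source] = 0;

    // Without negative cycles, at most numVertices - 1 rounds are needed.
    // Stop early as soon as a full round changes nothing.
    bool changed = true;
    for (int round = 0; round + 1 < numVertices && changed; ++round) {
        changed = false;
        for (const Edge &e : edges) {
            if (dist[e.src] == INT_MAX) continue;   // e.src not reached yet
            int candidate = dist[e.src] + e.weight;
            if (candidate < dist[e.dst]) {          // relax the edge
                dist[e.dst] = candidate;
                changed = true;
            }
        }
    }
    return dist;
}
```

Comparing this output element by element against the distances copied back from the GPU is a cheap correctness test on small graphs.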


The main function reads a graph from a file (say, the USA road network) into host (CPU) memory, copies the graph to device (GPU) memory, and then launches the GPU kernel repeatedly until no distance changes. The final distances are then copied back from the GPU to the CPU.

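    main {
        // read graph from file.
        // allocate memory for the graph and the distance array on device.
        // copy graph from host to device.
        // initialize distance: 0 for the source, infinity for every other vertex.

        do {
            changed = false;                          // reset the flag before each launch
            sssp<<<...>>>(graph, distance, changed);  // relax edges; set changed on any update
            // copy changed from device to host.
        } while (changed);

        // copy distance from device to host.
    }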
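
The loop above could be fleshed out roughly as follows. This is a minimal sketch under stated assumptions, not the reference solution: it assumes an edge-list layout (the arrays `src`, `dst`, and `w`), integer weights with no negative cycles, distances that fit in an `int`, and `INT_MAX` as "infinity". The host function `runSssp` and all parameter names are hypothetical. It relaxes one edge per loop iteration with `atomicMin`, and it omits error checking and the optimizations from the paper.

```cuda
#include <climits>
#include <vector>
#include <cuda_runtime.h>

// Edge e goes from src[e] to dst[e] with weight w[e] (assumed layout).
__global__ void sssp(const int *src, const int *dst, const int *w,
                     int numEdges, int *dist, bool *changed) {
    // Grid-stride loop: any <<<blocks, threads>>> choice covers every edge.
    for (int e = blockIdx.x * blockDim.x + threadIdx.x; e < numEdges;
         e += gridDim.x * blockDim.x) {
        int du = dist[src[e]];
        if (du == INT_MAX) continue;            // source endpoint not reached yet
        int candidate = du + w[e];
        // atomicMin returns the previous value; if it was larger, we improved it.
        if (atomicMin(&dist[dst[e]], candidate) > candidate) *changed = true;
    }
}

// Host driver (hypothetical name): returns the distance of every vertex.
std::vector<int> runSssp(const std::vector<int> &src, const std::vector<int> &dst,
                         const std::vector<int> &w, int numVertices, int source,
                         int blocks, int threads) {
    int numEdges = static_cast<int>(src.size());
    size_t edgeBytes = numEdges * sizeof(int);
    size_t vertexBytes = numVertices * sizeof(int);

    std::vector<int> dist(numVertices, INT_MAX);
    dist[source] = 0;

    int *dSrc, *dDst, *dW, *dDist;
    bool *dChanged;
    cudaMalloc(&dSrc, edgeBytes);
    cudaMalloc(&dDst, edgeBytes);
    cudaMalloc(&dW, edgeBytes);
    cudaMalloc(&dDist, vertexBytes);
    cudaMalloc(&dChanged, sizeof(bool));

    cudaMemcpy(dSrc, src.data(), edgeBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dDst, dst.data(), edgeBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dW, w.data(), edgeBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dDist, dist.data(), vertexBytes, cudaMemcpyHostToDevice);

    bool changed;
    do {
        changed = false;
        cudaMemcpy(dChanged, &changed, sizeof(bool), cudaMemcpyHostToDevice);
        sssp<<<blocks, threads>>>(dSrc, dDst, dW, numEdges, dDist, dChanged);
        // This blocking copy also waits for the kernel to finish.
        cudaMemcpy(&changed, dChanged, sizeof(bool), cudaMemcpyDeviceToHost);
    } while (changed);

    cudaMemcpy(dist.data(), dDist, vertexBytes, cudaMemcpyDeviceToHost);
    cudaFree(dSrc); cudaFree(dDst); cudaFree(dW);
    cudaFree(dDist); cudaFree(dChanged);
    return dist;
}
```

A thread may read a distance that another thread is lowering at the same moment. Because distances only decrease and the host relaunches the kernel until no update occurs, a stale read costs at most an extra iteration. A call such as `runSssp(src, dst, w, numVertices, 0, 64, 256)` would compute distances from vertex 0 with 64 blocks of 256 threads.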

Number of Blocks and Number of Threads

You should experiment with the number of blocks.
The number of threads per block can range from 256 to 1024, in multiples of 32 (the warp size).
Lonestar has 8 GPU nodes, each with two NVIDIA M2090 GPUs (Fermi).
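
One way to organize the experiment is a small sweep that times a kernel under several launch configurations using CUDA events. The sketch below is illustrative only: `standIn` is a placeholder kernel to be replaced by the real SSSP kernel, and `blocksToCover`, `timeLaunch`, and `sweep` are hypothetical helper names. The choice of block counts (multiples of the multiprocessor count, plus one thread per item) is an assumption, not a recommendation from the paper.

```cuda
#include <algorithm>
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel (hypothetical): swap in the real sssp kernel when measuring.
__global__ void standIn(int *data, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        data[i] += 1;
}

// Smallest block count such that blocks * threads >= n.
int blocksToCover(int n, int threads) { return (n + threads - 1) / threads; }

// Time a single launch in milliseconds using CUDA events.
float timeLaunch(int *dData, int n, int blocks, int threads) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    standIn<<<blocks, threads>>>(dData, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);               // wait until the kernel has finished
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Print one timing per (blocks, threads) pair for n work items.
void sweep(int n) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    int sms = prop.multiProcessorCount;

    int *dData;
    cudaMalloc(&dData, n * sizeof(int));
    cudaMemset(dData, 0, n * sizeof(int));

    // Step of 256 keeps the output short; any multiple of 32 in range is allowed.
    for (int threads = 256; threads <= 1024; threads += 256) {
        // Cap at the device's grid limit; the grid-stride loop still covers all items.
        int full = std::min(blocksToCover(n, threads), prop.maxGridSize[0]);
        int candidates[] = {sms, 2 * sms, 4 * sms, full};
        for (int blocks : candidates)
            printf("blocks=%6d threads=%4d time=%.3f ms\n",
                   blocks, threads, timeLaunch(dData, n, blocks, threads));
    }
    cudaFree(dData);
}
```

The first timed launch typically includes one-time startup cost, so it may be worth discarding it or averaging over several runs before drawing conclusions.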


Refer to Rupesh Nasre's paper, GPU_Optimizations.
Read Section 6.1 and then the referenced subsections from Sections 3, 4, and 5.
You should find Table 1 and Figure 3 useful for judging your implementation's performance.


Submit your source code and a short write-up. Evaluate the performance of your implementation
as you vary the number of blocks and the number of threads.
Kernel unrolling, as described in the paper, is a required optimization for this assignment.
(Optional) Using shared memory, as described in the paper, can be considered for extra credit. (Note: If you include this in your submission, make sure you highlight it explicitly in your write-up.)
The write-up should include the observations and conclusions drawn from varying the number of blocks and the number of threads. Perform all measurements on both the USA road network and the random graph.