CS 377P: Programming for Performance

Assignment 6: Shared-memory parallel programming

Due date: April 20th, 2017

Late submission policy: Submissions can be at most 2 days late. There will be a 10% penalty for each day after the due date (cumulative).

Clarifications to the assignment will be posted at the bottom of the page.

This assignment has two parts. In the first part, you will write parallel programs that compute an approximation to pi using the numerical integration program discussed in class. You will implement several variations of this program to understand the factors that affect performance in shared-memory programs. In the second part, you will write a parallel implementation of the Bellman-Ford algorithm for single-source shortest-path (SSSP) computation. You may use classes from the C++ STL and Boost libraries if you wish. Read the entire assignment before starting work: you will be changing your code incrementally in each section, so it is useful to see the overall structure of what you are being asked to do.

Coding

 Numerical integration to compute an estimate for pi:
            What to turn in:
              What to turn in:
             What to turn in:

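#include <atomic>

// Shared accumulator for the estimate of pi.
std::atomic<double> pi{0};

// Atomically add `bar` to pi using a compare-and-swap (CAS) retry loop.
void add_to_pi(double bar) {
  auto current = pi.load();
  // On failure, compare_exchange_weak reloads `current` with the latest value,
  // so `current + bar` is recomputed on every retry.
  while (!pi.compare_exchange_weak(current, current + bar));
}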
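
As a rough illustration of how a CAS-based add like the one above might be driven from several threads, here is a minimal sketch. It assumes the integrand is 4/(1+x^2) on [0,1] evaluated with the midpoint rule, which may differ from the program discussed in class. The names atomic_add and estimate_pi, and the round-robin split of steps among threads, are hypothetical choices made for this example only.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Atomically add `value` to `target` with a CAS retry loop.
void atomic_add(std::atomic<double>& target, double value) {
  double current = target.load();
  while (!target.compare_exchange_weak(current, current + value)) {
    // On failure `current` now holds the latest value; retry with it.
  }
}

// Hypothetical driver: midpoint-rule estimate of the integral of
// 4/(1+x^2) over [0,1], with steps dealt round-robin to the threads.
double estimate_pi(long num_steps, int num_threads) {
  assert(num_steps > 0 && num_threads > 0);
  std::atomic<double> sum{0.0};
  const double step = 1.0 / static_cast<double>(num_steps);
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&sum, step, num_steps, num_threads, t] {
      double local = 0.0;  // thread-private partial sum
      for (long i = t; i < num_steps; i += num_threads) {
        const double x = (static_cast<double>(i) + 0.5) * step;
        local += 4.0 / (1.0 + x * x);
      }
      atomic_add(sum, local * step);  // one shared update per thread
    });
  }
  for (auto& w : workers) w.join();
  return sum.load();
}
```

Calling estimate_pi(1000000, 4) should return a value close to pi. Each thread here touches the shared accumulator only once; calling atomic_add inside the inner loop instead would give a variant with far more contention, which is one of the effects worth measuring.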
            What to turn in:
           What to turn in:

Parallel Bellman-Ford implementation:

    Recall that the Bellman-Ford algorithm solves the single-source shortest-path problem. It is a topology-driven algorithm: it makes a number of sweeps over the nodes of the graph and terminates once no node label changes during a sweep. In each sweep, it visits all the nodes of the graph, and at each node it applies a push-style relaxation operator to update the labels of neighboring nodes.
    One way to parallelize Bellman-Ford is to create some number of threads (say t) and divide the nodes more or less equally between the threads in blocks of (N/t), where N is the number of nodes in the graph. In each sweep, a thread applies the operator to all the nodes assigned to it. You can also assign nodes to threads in a round-robin way. Giving all threads roughly equal numbers of nodes may not give you good load balance for power-law graphs (why?), but we will live with it. Feel free to invent more load-balanced ways of assigning nodes to threads.
    The main concurrency correctness issue you need to worry about is ensuring that updates to node labels are done atomically. The lecture slides show how you can use a CAS operation to accomplish this. Read the C++ documentation to see how to implement this in C++.
     Input graphs: use rmat15, rmat23, road-FLA and road-NY. 
    Source nodes: node 1 for the rmat graphs, node 140961 for road-NY, node 316607 for road-FLA. These are the nodes with the highest degree.
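
The block-partitioned sweep and the CAS-based label update described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the approach from the lecture slides: the adjacency-list Graph type, the names atomic_min and bellman_ford, 0-based node numbering, non-negative int weights that do not overflow, and spawning fresh threads for every sweep are all choices made here for brevity.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

struct Edge { int dst; int weight; };
using Graph = std::vector<std::vector<Edge>>;  // hypothetical adjacency lists
constexpr int kInf = std::numeric_limits<int>::max();

// Atomically lower `label` to `candidate` if it is smaller.
// Returns true if this call changed the label.
bool atomic_min(std::atomic<int>& label, int candidate) {
  int current = label.load();
  while (candidate < current) {
    if (label.compare_exchange_weak(current, candidate)) return true;
    // CAS failed: `current` was reloaded, so the loop re-checks the bound.
  }
  return false;
}

std::vector<int> bellman_ford(const Graph& g, int source, int num_threads) {
  assert(num_threads > 0);
  const std::size_t n = g.size();
  std::vector<std::atomic<int>> dist(n);
  for (auto& d : dist) d.store(kInf);
  dist[source].store(0);
  const std::size_t block = (n + num_threads - 1) / num_threads;  // ceil(N/t)
  std::atomic<bool> changed{true};
  while (changed.load()) {  // one iteration per sweep
    changed.store(false);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
      const std::size_t begin = std::min(n, t * block);
      const std::size_t end = std::min(n, begin + block);
      workers.emplace_back([&, begin, end] {
        bool local_changed = false;
        for (std::size_t u = begin; u < end; ++u) {
          const int du = dist[u].load();
          if (du == kInf) continue;  // not reached yet, nothing to push
          for (const Edge& e : g[u])
            if (atomic_min(dist[e.dst], du + e.weight)) local_changed = true;
        }
        if (local_changed) changed.store(true);
      });
    }
    for (auto& w : workers) w.join();  // join doubles as the sweep barrier
  }
  std::vector<int> result(n);
  for (std::size_t i = 0; i < n; ++i) result[i] = dist[i].load();
  return result;
}
```

Creating threads once and synchronizing them with a barrier between sweeps would avoid the per-sweep thread start-up cost; the version above trades that efficiency for a shorter example.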
    
The output of the SSSP algorithm should be produced as a text file containing one line for each node, specifying the number of the node and the label of that node. You can check your SSSP solution by comparing your results with the ones you found in assignment 3.
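
A minimal sketch of writing such an output follows. The space separator, the function name write_labels, and the default of numbering nodes from 1 are assumptions made for this example; match whatever layout you used in assignment 3 so the files can be compared directly.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <vector>

// Write one "node label" line per node. labels[i] is the label of the
// node numbered first_node_id + i (numbering scheme is an assumption).
void write_labels(std::ostream& out, const std::vector<int>& labels,
                  int first_node_id = 1) {
  assert(first_node_id >= 0);
  for (std::size_t i = 0; i < labels.size(); ++i) {
    out << (first_node_id + static_cast<int>(i)) << ' ' << labels[i] << '\n';
  }
}
```

Passing a std::ofstream opened on the output path writes the file; passing a std::ostringstream is convenient for checking the format on a small graph.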

     What to turn in:

Submission

Submit (on Canvas) your code and all the items listed in the experiments above.

Grading

  • Numerical integration: 30 points
  • SSSP: 70 points

Graph formats

    Input graphs will be given to you in DIMACS format, which is described below.


    DIMACS format for graphs

    One popular format for representing directed graphs as text files is the DIMACS format (undirected graphs are represented as a directed graph by representing each undirected edge as two directed edges). Files are assumed to be well-formed and internally consistent so it is not necessary to do any error checking.  A line in a file must be one of the following.

    Notes: