CS 377P: Programming for Performance

Assignment 5: Shared-memory parallel programming

Due date: April 17th, 2018, 11:59 PM

You can work independently or in groups of two.
Late submission policy:
Submissions may be at most 1 day late, with a 10% penalty.

This assignment has two parts. In the first part, you will write parallel programs that compute an approximation to pi using the numerical integration program discussed in class. You will implement several variations of this program to understand the factors that affect performance in shared-memory programs. In the second part, you will write a parallel implementation of the Bellman-Ford algorithm for single-source shortest-path (SSSP) computation. You may use classes from the C++ STL and Boost libraries if you wish. Read the entire assignment before starting work: you will change your code incrementally in each section, so it helps to see the overall structure of what you are being asked to do.

Numerical integration to compute an estimate for pi:
            What to turn in:

            What to turn in:

              What to turn in:
             What to turn in:

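#include <atomic>

std::atomic<double> pi{0.0};

// Atomically add bar to pi using a compare-and-swap retry loop.
void add_to_pi(double bar) {
  auto current = pi.load();
  // On failure, compare_exchange_weak reloads current with the latest value of pi.
  while (!pi.compare_exchange_weak(current, current + bar));
}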
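
The following sketch shows how a compare-and-swap loop like the one above behaves when several threads update the same value. It is an illustration only, not one of the required variations: the accumulator total and the functions add_to_total and run_adders are hypothetical names, and the thread counts are arbitrary.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<double> total{0.0};  // hypothetical shared accumulator

// Same compare-and-swap retry loop as add_to_pi, applied to total.
void add_to_total(double value) {
  double current = total.load();
  while (!total.compare_exchange_weak(current, current + value)) {
    // current now holds the value another thread just wrote; retry with it.
  }
}

// Hypothetical driver: each of num_threads threads adds value to the
// shared total adds_per_thread times, and the final total is returned.
double run_adders(int num_threads, int adds_per_thread, double value) {
  total.store(0.0);
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([=] {
      for (int i = 0; i < adds_per_thread; ++i) add_to_total(value);
    });
  }
  for (auto& w : workers) w.join();  // wait for all updates to finish
  return total.load();
}
```

No update is lost, because a thread whose compare-and-swap fails retries with the freshly observed value. The cost is that every call contends on the same memory location.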
            What to turn in:
           What to turn in:
Parallel Bellman-Ford implementation:

    Recall that the Bellman-Ford algorithm solves the single-source shortest-path problem. It is a topology-driven algorithm: it makes repeated sweeps over the nodes of the graph and terminates when no node label changes during a sweep. In each sweep, it visits every node of the graph, and at each node it applies a push-style relaxation operator to update the labels of neighboring nodes.
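
A minimal sequential sketch of the sweep structure described above, for reference. It assumes a CSR graph with integer edge weights and no negative-weight cycles (otherwise the loop would not terminate). The struct CsrGraph, its field names, and the function bellman_ford are hypothetical and may not match the graph code from assignment 4.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical CSR graph: the outgoing edges of node u are the indices
// row_start[u] .. row_start[u + 1] - 1 in dest and weight.
struct CsrGraph {
  std::vector<int> row_start;  // size = number of nodes + 1
  std::vector<int> dest;       // destination node of each edge
  std::vector<int> weight;     // weight of each edge
};

constexpr std::int64_t kInfinity = std::numeric_limits<std::int64_t>::max();

// Sequential topology-driven Bellman-Ford with a push-style operator.
std::vector<std::int64_t> bellman_ford(const CsrGraph& g, int source) {
  const int num_nodes = static_cast<int>(g.row_start.size()) - 1;
  std::vector<std::int64_t> dist(num_nodes, kInfinity);
  dist[source] = 0;
  bool changed = true;
  while (changed) {  // each iteration is one sweep over all nodes
    changed = false;
    for (int u = 0; u < num_nodes; ++u) {
      if (dist[u] == kInfinity) continue;  // unreached: nothing to push
      for (int e = g.row_start[u]; e < g.row_start[u + 1]; ++e) {
        const std::int64_t candidate = dist[u] + g.weight[e];
        if (candidate < dist[g.dest[e]]) {  // relax edge (u, dest[e])
          dist[g.dest[e]] = candidate;
          changed = true;
        }
      }
    }
  }
  return dist;
}
```

In this sketch the only writes inside a sweep are to dist[g.dest[e]] and changed, so those are the updates that need synchronization once the loop over u is split among threads.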

   You can use and modify the graph construction code provided in assignment 4 for this assignment.

          amplxe-cl -collect hotspots -analyze-system -start-paused -- <command_line_for_your_SSSP_runs>

                    Is the work balanced among threads? What percentage of the work is assigned to each thread when running on rmat22? What about when running on roadFLA?
                    You can visualize the VTune results to get this information, as shown in the following example:

[Figure: VTune hotspots view]

     What to turn in:

Submission

Submit to Canvas a .tar.gz file containing your code for each subproblem and a report in PDF format. In the report, state the names of all team members clearly, and include all figures and analysis. Include one Makefile for computing pi and one for SSSP, so that I can compile your code with make [PARAMETER]. Include a README.txt that explains how to compile your code, how to run your program, and what output to expect.

Grading

  • Numerical integration: 30 points
  • SSSP: 70 points

Graph formats

    Input graphs will be given to you in DIMACS format, which is described below.

    DIMACS format numbers nodes from 1, but CSR representation numbers nodes from 0. Hence, node n in DIMACS is node (n-1) in CSR. In other words,
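
    A minimal sketch of this renumbering, assuming the edges have already been read into memory with their 1-based DIMACS node numbers. The types DimacsEdge and Csr and the function build_csr are hypothetical and are not taken from the provided graph construction code.

```cpp
#include <vector>

// Hypothetical edge record holding 1-based node numbers, as they appear
// in a DIMACS file.
struct DimacsEdge {
  int src;
  int dst;
  int weight;
};

// Hypothetical CSR arrays using 0-based node numbers.
struct Csr {
  std::vector<int> row_start;  // size = num_nodes + 1
  std::vector<int> dest;
  std::vector<int> weight;
};

// Build CSR arrays from 1-based edges: DIMACS node n becomes CSR node n - 1.
Csr build_csr(int num_nodes, const std::vector<DimacsEdge>& edges) {
  Csr g;
  g.row_start.assign(num_nodes + 1, 0);
  // Count the out-degree of CSR node (src - 1) in slot (src - 1) + 1 = src.
  for (const DimacsEdge& e : edges) ++g.row_start[e.src];
  // Prefix sum turns the counts into starting offsets.
  for (int u = 0; u < num_nodes; ++u) g.row_start[u + 1] += g.row_start[u];
  g.dest.resize(edges.size());
  g.weight.resize(edges.size());
  std::vector<int> next(g.row_start.begin(), g.row_start.end() - 1);
  for (const DimacsEdge& e : edges) {
    const int pos = next[e.src - 1]++;  // next free slot of CSR node src - 1
    g.dest[pos] = e.dst - 1;            // shift the destination to 0-based
    g.weight[pos] = e.weight;
  }
  return g;
}
```

    The subtraction appears only where a DIMACS node number is used as a CSR index or stored as a CSR destination; edge weights are copied unchanged.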

    DIMACS format for graphs

    One popular format for representing directed graphs as text files is the DIMACS format. An undirected graph is represented as a directed graph by storing each undirected edge as two directed edges. Files are assumed to be well-formed and internally consistent, so it is not necessary to do any error checking. A line in a file must be one of the following.

    Notes: