CS 377P: Programming for Performance

Assignment 6: OpenMP programming

Due date: May 1st, 2018, 11:59 PM

You can work independently or in groups of two.
Late submission policy:
Submissions can be at most 1 day late, with a 10% penalty.
See the bottom for additional notes for this assignment.

In this assignment, you will use OpenMP pragmas and functions to parallelize the Bellman-Ford algorithm for single-source shortest-path computation, and study the effect of the loop schedule on load balance. You may use classes from the C++ STL and Boost libraries if you wish. Read the entire assignment before starting work, since you will be incrementally changing your code in each section of the assignment, and it will be useful to see the overall structure of what you are being asked to do. Before you start, read the notes at the bottom of this assignment.


Parallelize Bellman-Ford Algorithm with OpenMP:

(a) (20 points) Make one copy of your serial implementation of the Bellman-Ford algorithm from assignment 5. Modify the copy so that std::atomic<int> is used to represent a node's distance and a compare-and-swap (CAS) operation is used when updating a node's distance. This is your starting point for parallelizing the Bellman-Ford algorithm with OpenMP; it should not contain any pthread constructs or the arrays used to avoid false sharing. Your code will look similar to the following:

  converged = false;
  start_time = ...

  /* Bellman-Ford computation over nodes */
  while(!converged) {
    for each node n in g {
      for each edge e from n {
        /* update distance(e.dst) using CAS */
      }
    }
  }

  end_time = ...
  exec_time = end_time - start_time;
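
For reference, one way to implement the CAS update is a compare-exchange loop on std::atomic<int>. The sketch below is only a suggestion; the helper name relax and the constant INF are placeholders, and the graph representation is whatever you used in assignment 5:

  #include <atomic>
  #include <climits>

  constexpr int INF = INT_MAX;   /* placeholder "unreached" distance */

  /* Atomically lowers dist_dst to new_dist if new_dist is smaller.
     Returns true if the stored distance was actually changed. */
  bool relax(std::atomic<int> &dist_dst, int new_dist) {
    int cur = dist_dst.load(std::memory_order_relaxed);
    while (new_dist < cur) {
      /* compare_exchange_weak reloads cur on failure, so we simply retry */
      if (dist_dst.compare_exchange_weak(cur, new_dist,
                                         std::memory_order_relaxed))
        return true;
    }
    return false;
  }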

(b) (20 points) Parallelize the code you derived in (a) with OpenMP. Distribute nodes among threads in a round-robin fashion with a chunk size of 1, i.e., with the clause schedule(static,1) attached to the loop that iterates over all nodes. Measure the runtime of only the Bellman-Ford computation, i.e., excluding graph construction, thread creation/join, initialization, and printing of results. To achieve this, your code should look like the following:

  #pragma omp parallel [other clauses]
  {

      start_time = ...

      /* your Bellman-Ford code from part (a) */

      end_time = ...
      exec_time = end_time - start_time;
  }
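
Purely as an illustration of how the pieces might fit together, here is a sketch of part (b). It assumes a CSR-style graph (row_start, edge_dst, edge_wt), distances stored in a std::vector<std::atomic<int>>, and the relax() helper and INF constant sketched after part (a); all of these names are placeholders, not required interfaces:

  #include <atomic>
  #include <vector>
  #include <omp.h>

  double bellman_ford_nodes(const std::vector<int> &row_start,
                            const std::vector<int> &edge_dst,
                            const std::vector<int> &edge_wt,
                            std::vector<std::atomic<int>> &dist) {
    const int num_nodes = (int)row_start.size() - 1;
    bool converged = false;
    double exec_time = 0.0;

    #pragma omp parallel shared(converged, exec_time)
    {
      double t0 = omp_get_wtime();        /* start: computation only */

      while (!converged) {
        #pragma omp barrier               /* everyone has read the old flag */
        #pragma omp single
        converged = true;                 /* reset once per round */

        /* round-robin distribution of nodes, chunk size 1 */
        #pragma omp for schedule(static, 1)
        for (int n = 0; n < num_nodes; ++n) {
          int dn = dist[n].load(std::memory_order_relaxed);
          if (dn == INF) continue;        /* node not reached yet */
          for (int e = row_start[n]; e < row_start[n + 1]; ++e) {
            if (relax(dist[edge_dst[e]], dn + edge_wt[e])) {
              #pragma omp atomic write
              converged = false;          /* another round is needed */
            }
          }
        }                                 /* implicit barrier here */
      }

      double t1 = omp_get_wtime();
      #pragma omp master
      exec_time = t1 - t0;                /* report one thread's time */
    }
    return exec_time;
  }

The explicit barrier before the single region keeps a fast thread from resetting converged while a slower thread is still testing the while condition; without it, threads could disagree on whether to start another round.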

(c) (10 points) Change the schedule clause in (b) to each of the following: (static,8), (static,32), (static,128), (static,512), (dynamic,1), (dynamic,8), (dynamic,32), (dynamic,128), and (dynamic,512). Do you see any differences in runtimes? Why is that?

    Hint: You can set the environment variable OMP_SCHEDULE to choose the loop schedule at run time while keeping only one copy of the OpenMP code; note that OMP_SCHEDULE takes effect only when the loop is declared with schedule(runtime). See details at OpenMP Loop Scheduling.
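
    For example, if the loop is declared with schedule(runtime), a single compiled loop picks up whatever schedule OMP_SCHEDULE specifies (the loop bounds and body below are placeholders):

      /* the actual schedule is read from OMP_SCHEDULE at run time */
      #pragma omp for schedule(runtime)
      for (int n = 0; n < num_nodes; ++n) {
        /* relax the edges of node n */
      }

    You can then run the same binary as, e.g., OMP_SCHEDULE="dynamic,32" ./your_program (the program name and arguments are whatever your own Makefile and README specify).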

(d) (20 points) Distributing nodes to threads may result in load imbalance if the input graph is a power-law graph (why?). To address this issue, we can distribute edges to threads so that the edges of a high-degree node can be handled by different threads. Before that, we need a version that iterates directly over edges:

  converged = false;
  start_time = ...

  /* Bellman-Ford computation over edges */
  while(!converged) {
    for each edge e in g {
      /* update distance(e.dst) using CAS */
    }
  }

  end_time = ...
  exec_time = end_time - start_time;
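
One possible way to obtain an edge-centric representation, shown only as a sketch, is to flatten a CSR graph into an array of (src, dst, weight) triples; the names below are placeholders for your own data structures:

  #include <vector>

  struct Edge { int src, dst, wt; };

  /* builds a flat edge array from CSR arrays so the loop above can
     iterate directly over edges */
  std::vector<Edge> build_edge_list(const std::vector<int> &row_start,
                                    const std::vector<int> &edge_dst,
                                    const std::vector<int> &edge_wt) {
    std::vector<Edge> edges;
    edges.reserve(edge_dst.size());
    const int num_nodes = (int)row_start.size() - 1;
    for (int n = 0; n < num_nodes; ++n)
      for (int e = row_start[n]; e < row_start[n + 1]; ++e)
        edges.push_back({n, edge_dst[e], edge_wt[e]});
    return edges;
  }

Alternatively, you can keep the CSR arrays and store only the source node of each edge, which avoids duplicating the destination and weight arrays.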

(e) (30 points) Parallelize the code you derived in (d) with OpenMP. Measure the runtime with the following loop schedules over edges: (static,1), (static,8), (static,32), (static,128), (static,512), (dynamic,1), (dynamic,8), (dynamic,32), (dynamic,128), and (dynamic,512). Do you see any differences in runtimes? Why is that? Again, the measured runtime should include only the Bellman-Ford computation, i.e., it should exclude graph construction, thread creation/join, initialization, and printing of results. Your code will be similar to the following:

  #pragma omp parallel [other clauses]
  {
      start_time = ...

      /* your Bellman-Ford code from part (d) */

      end_time = ...
      exec_time = end_time - start_time;
  }
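
As a sketch, the node loop in the earlier example for part (b) would be replaced by an edge loop along the following lines (again assuming the flat edge list from (d), the relax() helper and INF from (a), and the same converged flag, timing, and while-loop structure):

  /* inner round of the edge-centric version; use schedule(runtime) with
     OMP_SCHEDULE, or hard-code each of the schedules listed above */
  #pragma omp for schedule(runtime)
  for (int i = 0; i < (int)edges.size(); ++i) {
    int ds = dist[edges[i].src].load(std::memory_order_relaxed);
    if (ds == INF) continue;            /* source end not reached yet */
    if (relax(dist[edges[i].dst], ds + edges[i].wt)) {
      #pragma omp atomic write
      converged = false;                /* another round is needed */
    }
  }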

What to turn in:

Submission

Submit to Canvas a .tar.gz file containing your code for each subproblem and a report in PDF format. In the report, clearly state the names of both teammates and include all figures and analysis. Include a Makefile so that I can compile your code with make [PARAMETER]. Include a README.txt that explains how to compile your code, how to run your program, and what the outputs will be.

Notes