CS 377P: Programming for Performance

Assignment 5: Shared-memory parallel programming

Due date: November 8th, 2023, 9:00 PM

This assignment has two parts. The first part asks you to reproduce the results shown in class for the numerical integration problem. The second part asks you to implement a parallel prefix-sum program using one of the algorithms discussed in class.


1) In the first part, you will implement parallel programs to compute an approximation to pi using the numerical integration program discussed in class. You will implement several variations of this program to understand factors that affect performance in shared-memory programs. Read the entire assignment before starting work since you will be incrementally changing your code in each section of the assignment, and it will be useful to see the overall structure of what you are being asked to do.
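As a point of reference for the variations below, here is a minimal sequential sketch of the computation. It assumes the usual formulation, in which pi is the integral of 4/(1+x^2) over [0,1] approximated by the midpoint rule; the integrand, the function name estimate_pi, and the parameter num_steps are assumptions for illustration and may differ from the version shown in class.

```cpp
#include <cassert>
#include <cstddef>

// Midpoint-rule estimate of the integral of 4/(1+x^2) over [0,1].
// ASSUMPTION: this integrand and the name estimate_pi are illustrative only.
double estimate_pi(std::size_t num_steps) {
  assert(num_steps > 0);
  const double step = 1.0 / static_cast<double>(num_steps);  // width of each bar
  double sum = 0.0;
  for (std::size_t i = 0; i < num_steps; ++i) {
    const double x = (static_cast<double>(i) + 0.5) * step;  // midpoint of bar i
    sum += 4.0 / (1.0 + x * x);                               // height of bar i
  }
  return sum * step;  // total area = sum of heights * common width
}
```

Calling estimate_pi with a larger num_steps gives a finer subdivision, and the loop over i is the part that the parallel versions would divide among threads.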


Numerical integration to compute an estimate for pi:

            What to turn in:

            What to turn in:

              What to turn in:

             What to turn in:


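#include <atomic>

std::atomic<double> pi{0.0};

// Atomically add bar to pi using a compare-and-swap retry loop.
void add_to_pi(double bar) {
  auto current = pi.load();
  // On failure, compare_exchange_weak reloads current with the latest value,
  // so current + bar is recomputed on every retry.
  while (!pi.compare_exchange_weak(current, current + bar));
}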
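One possible way to call add_to_pi from several threads is sketched below. The driver parallel_pi, its parameters, the strided assignment of bars to threads, and the integrand 4/(1+x^2) are all assumptions made for illustration and are not part of the assignment text. In this sketch each thread accumulates a private partial sum and calls add_to_pi only once, so contention on the shared atomic stays low.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

std::atomic<double> pi{0.0};

// Atomically add bar to pi with a compare-and-swap retry loop.
void add_to_pi(double bar) {
  auto current = pi.load();
  while (!pi.compare_exchange_weak(current, current + bar));
}

// HYPOTHETICAL driver: thread t handles bars t, t + num_threads, ...
double parallel_pi(std::size_t num_steps, unsigned num_threads) {
  assert(num_steps > 0 && num_threads > 0);
  pi.store(0.0);
  const double step = 1.0 / static_cast<double>(num_steps);
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([=] {
      double local = 0.0;  // private to this thread, so no synchronization needed
      for (std::size_t i = t; i < num_steps; i += num_threads) {
        const double x = (static_cast<double>(i) + 0.5) * step;
        local += 4.0 / (1.0 + x * x);
      }
      add_to_pi(local * step);  // one atomic update per thread
    });
  }
  for (auto& w : workers) w.join();
  return pi.load();
}
```

Moving the add_to_pi call inside the inner loop would perform one atomic update per bar instead of one per thread, which is a simple way to compare the cost of fine-grained and coarse-grained synchronization.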


            What to turn in:

           What to turn in:

What to turn in:

Notes:

2) In the second part, you will implement a parallel prefix algorithm along the lines of the first algorithm discussed in class. A recent paper titled "A Novel Parallel Prefix Sum Algorithm and its Implementation on Multicore Platforms" by Nan Zhang gives pseudocode and implementation hints for this algorithm. Briefly, here is the algorithm.

Algorithm 1 of Zhang's paper gives a few tweaks that can improve the performance of this algorithm. Implement this algorithm and measure its running time for input arrays of 100K, 500K, 1M, and 2M elements, using 1, 2, 4, and 8 threads. Assume the array elements are doubles.
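For orientation only, the sketch below shows a generic three-phase blocked scan, a common baseline for parallel prefix sums. It is not claimed to be Algorithm 1 of Zhang's paper or the algorithm presented in class. The function name parallel_prefix_sum, the helper for_each_block, and the block partitioning are assumptions, and the code assumes a C++14 or later compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// In-place inclusive prefix sum using a generic three-phase blocked scan.
// ASSUMPTION: illustrative baseline, not the paper's Algorithm 1.
void parallel_prefix_sum(std::vector<double>& a, unsigned num_threads) {
  assert(num_threads > 0);
  const std::size_t n = a.size();
  const std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceil(n / p)
  std::vector<double> block_total(num_threads, 0.0);

  // Run fn(t, begin, end) on one thread per block, then wait for all of them.
  auto for_each_block = [&](auto fn) {
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
      const std::size_t begin = std::min(n, t * chunk);
      const std::size_t end = std::min(n, begin + chunk);
      workers.emplace_back(fn, t, begin, end);
    }
    for (auto& w : workers) w.join();
  };

  // Phase 1: each thread scans its own block and records the block's total.
  for_each_block([&](unsigned t, std::size_t begin, std::size_t end) {
    double running = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
      running += a[i];
      a[i] = running;
    }
    block_total[t] = running;
  });

  // Phase 2: sequential exclusive scan of the block totals gives each
  // block the sum of everything that precedes it.
  double offset = 0.0;
  for (unsigned t = 0; t < num_threads; ++t) {
    const double total = block_total[t];
    block_total[t] = offset;
    offset += total;
  }

  // Phase 3: each thread adds its block's offset to every element it owns.
  for_each_block([&](unsigned t, std::size_t begin, std::size_t end) {
    const double add = block_total[t];
    for (std::size_t i = begin; i < end; ++i) a[i] += add;
  });
}
```

This sketch creates and joins threads twice per call, which is simple but adds overhead at the smaller input sizes; reusing threads across phases with a barrier is one alternative worth measuring.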

    What to turn in:

  1. A brief description of the algorithm you implemented and any optimizations you found useful.
  2. Your code.
  3. A summary of your execution time results, and a speedup chart similar to Figure 7 of the paper.