Unit 4.4.2 Amdahl's law

Amdahl's law is a simple observation about the limits on parallel speedup and efficiency. Consider an operation that requires sequential time

\begin{equation*} T \end{equation*}

for computation. Suppose fraction $f$ of this time is {\bf not} parallelized, while the remaining fraction $(1-f)$ is parallelized across $t$ threads. Then the total execution time with $t$ threads, $T_t \text{,}$ obeys the inequality

\begin{equation*} T_t \geq \frac{(1-f) T }{t} + f T. \end{equation*}

This, in turn, means that the speedup is bounded by

\begin{equation*} S_t \leq \frac{T}{(1-f) T / t + f T}. \end{equation*}

Now, even if the parallelization with $t$ threads is so effective that the parallelized fraction takes no time at all ($(1-f) T / t = 0$), we still find that

\begin{equation*} S_t \leq \frac{T}{(1-f) T / t + f T} \leq \frac{T}{f T} = \frac{1}{f}. \end{equation*}

What we notice is that the maximal speedup is bounded by the inverse of the fraction of the execution time that is not parallelized. So, if the code in which 20% of the execution time is spent is not parallelized, then no matter how many threads you use, you can at best compute the result five times faster. This means that one must think about parallelizing all parts of a code.
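The bound can be checked numerically. The following sketch (the names \verb|f|, \verb|t|, and \verb|speedup_bound| are chosen for this illustration, not taken from the text) evaluates the speedup bound $S_t \leq 1 / ((1-f)/t + f)$ for $f = 0.2$ and increasing thread counts, showing how it approaches $1/f = 5$:

```python
# Illustration of Amdahl's law: upper bound on speedup when fraction f
# of the sequential execution time is not parallelized.
# Names (f, t, speedup_bound) are chosen for this sketch.

def speedup_bound(f, t):
    """Upper bound on speedup with t threads when fraction f is serial:
    S_t <= T / ((1-f) T / t + f T) = 1 / ((1-f)/t + f)."""
    return 1.0 / ((1.0 - f) / t + f)

f = 0.20  # 20% of the execution time is not parallelized
for t in [1, 2, 4, 8, 64, 1024]:
    print(f"t = {t:5d}  speedup bound = {speedup_bound(f, t):.2f}")

# As t grows, the bound approaches 1/f = 5, no matter how many
# threads are used.
```

Running this shows the bound climbing from $1.00$ at $t = 1$ toward, but never past, $5$.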

Amdahl's law says that if one does not parallelize the part of a sequential code in which fraction $f$ of the time is spent, then regardless of how many threads are used, the attainable speedup is bounded by $1/f \text{.}$

The point is: So far, we have focused on parallelizing the computational part of matrix-matrix multiplication. We should also parallelize the packing, since it requires a nontrivial part of the total execution time.