## Unit 4.4.1 Speedup, efficiency, etc.

### Subsubsection 4.4.1.1 Sequential time

We denote the sequential time for an operation by

\begin{equation*} T( n ), \end{equation*}

where $n$ is a parameter that captures the size of the problem being solved. For example, if we only time problems with square matrices, then $n$ would be the size of the matrices; if we consider all three sizes of the matrices, then $n$ might instead be a vector with three entries. $T(n)$ is equal to the time required to compute the operation on a single core.
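A minimal sketch (not from the text) of how one might measure a sequential time $T(n)$ in practice. Here `sequential_time` and the stand-in triple loop are hypothetical; the loop is merely a placeholder for an $O(n^3)$ operation such as a matrix-matrix multiplication with square matrices of size $n$.

```python
import time

def sequential_time(n):
    """Return the wall-clock time T(n) for a stand-in O(n^3) computation."""
    start = time.perf_counter()
    total = 0.0
    for i in range(n):          # placeholder loops standing in for
        for j in range(n):      # C[i,j] += A[i,k] * B[k,j]
            for k in range(n):
                total += i * k
    return time.perf_counter() - start

T = sequential_time(30)  # an estimate of T(30) for this toy operation
```

In a real experiment one would time a tuned implementation, repeat the measurement several times, and report the best (or median) time.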

A question is "Which sequential time?" Should we use the time attained by a specific implementation? Should we use the time attained by the fastest implementation? That's a good question! Usually it is clear from context which to use.

### Subsubsection 4.4.1.2 Parallel time

The parallel time for an operation and a given implementation using $t$ threads is given by

\begin{equation*} T_t( n ). \end{equation*}

This allows us to define the sequential time for a specific implementation by $T_1(n)\text{.}$ It is the sequential time of the parallel implementation when using only one thread.

### Subsubsection 4.4.1.3 Speedup

The speedup for an operation and a given implementation using $t$ threads is given by

\begin{equation*} S_t(n) = T( n ) / T_t( n ). \end{equation*}

It measures how much speedup is attained by using $t$ threads.

Often, the ratio

\begin{equation*} T_1(n) / T_t(n) \end{equation*}

is used to report speedup (as we did in Unit 4.1.1). We now understand that this may give an optimistic result: If the sequential execution of the given implementation yields poor performance, it is still possible to attain very good speedup. The fair thing is to use $T(n)$ of the best implementation when computing speedup.
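To make the distinction concrete, here is a small illustration with hypothetical timings (the function name and all numbers are assumptions, not from the text). When the parallel implementation runs slowly on one thread, the ratio $T_1(n)/T_t(n)$ can look far better than the fair speedup $T(n)/T_t(n)$ computed against the best sequential implementation.

```python
def speedup(T_seq, T_t):
    """S_t(n) = T(n) / T_t(n): sequential time over t-thread time."""
    return T_seq / T_t

# Hypothetical timings, in seconds:
T_best = 2.0   # T(n): best sequential implementation
T_1    = 8.0   # T_1(n): parallel implementation run with one thread (slow!)
T_4    = 2.0   # T_4(n): parallel implementation with t = 4 threads

optimistic = speedup(T_1, T_4)     # T_1(n)/T_4(n) = 4.0: looks like perfect speedup
fair       = speedup(T_best, T_4)  # T(n)/T_4(n)   = 1.0: no gain over the best sequential code
```

The "optimistic" ratio reports a 4x speedup even though, measured against the best sequential implementation, the parallel code gains nothing.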

###### Remark 4.4.1.

Intuitively, it would seem that

\begin{equation*} S_t( n ) = T( n ) / T_t( n ) \leq t . \end{equation*}

when you have at least $t$ cores. In other words, speedup should be less than or equal to the number of threads/cores at your disposal. After all, it would seem that $T(n) \le t T_t(n)$ since one could create a sequential implementation from the parallel implementation by mapping all computations performed by the threads to one thread, and this would remove the overhead of coordinating between the threads.

Let's use a simple example to show that speedup can be greater than $t\text{:}$ Generally speaking, when you make a bed with two people, it takes less than half the time it takes when you do it by yourself. Why? Because two people have two sets of hands and the "workers" can therefore be on both sides of the bed simultaneously. Overhead incurred when you make the bed by yourself (e.g., walking from one side to the other) is avoided. The same is true when using $t$ threads on $t$ cores: the task to be completed now has $t$ sets of registers, $t$ L1 caches, etc. at its disposal.

Attaining better than a factor $t$ speedup with $t$ threads and cores is called super-linear speedup. It should be viewed as the exception rather than the rule.

### Subsubsection 4.4.1.4 Efficiency

The efficiency for an operation and a given implementation using $t$ threads is given by

\begin{equation*} E_t(n) = S_t(n) / t . \end{equation*}

It measures how efficiently the $t$ threads are being used.
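A small illustration of the definition (the function name and sample numbers are assumptions): efficiency rescales speedup by the number of threads, so perfect speedup ($S_t(n) = t$) gives $E_t(n) = 1$, and anything less shows up as a fraction of the threads being used productively.

```python
def efficiency(S_t, t):
    """E_t(n) = S_t(n) / t: speedup per thread."""
    return S_t / t

# Hypothetical example: a speedup of 3.2 with t = 4 threads
E = efficiency(3.2, 4)   # 0.8, i.e. the threads are used at 80% efficiency

E_perfect = efficiency(4.0, 4)  # perfect speedup gives efficiency 1.0
```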