Next: 5 Performance Up: PLAPACK: High Performance through Previous: 3 LU Factorization (with

Subsections

4 Householder QR Factorization

In this section, we discuss the computation of the QR factorization $\begin{displaymath} A = Q R\end{displaymath}$ where A is $m \times n$ , Q is $m \times m$ and R is $m \times n$ .Here $m \geq n$ , Q is unitary (

)and R has the form $\begin{displaymath} R = \left( \begin{array} {c} R_1 \\ \hline 0 \end{array}\right)\end{displaymath}$ where

is an $n \times n$ uppertriangular matrix. Partitioning $\begin{displaymath} Q = \left( \begin{array} {c \vert c} Q_1 & Q_2\end{array}\right)\end{displaymath}$ where

has width n , we see that the following also holds $\begin{displaymath} A = Q_1 R_1\end{displaymath}$ In our subsequent discussions, we will refer to both of these factorizations as a QR factorization and will explicitly indicate which of the two is being considered.

4.1 Basic algorithm

To present a basic algorithm for the QR factorization, we must start by introducing Householder transforms (reflections).

Householder transforms are orthonormal transformations that can be written as $\begin{displaymath} H = ( I + \beta v v^T )\end{displaymath}$ where $\beta = -2 \vert\vert v \vert\vert _2^2$ . These transformations, sometimes called reflectors, have a number of interesting properties: ,, and $\vert\vert H \vert\vert _2 = 1$ .Of most interest to us is the fact that given a vector $x = \left( x_1^T, x_2^T \right)^T$ , where is of length k-1 , one can find a vector such that $P_k x = \left( x_1^T , \pm \vert\vert x_2 \vert\vert _2 e_1^T \right)$ where $e_1 = \left( 1 , 0 , \cdots , 0 \right)^T$ .Indeed, it can be easily verified that $\begin{displaymath} v_k = \left( \begin{array} {c} 0 \\ \vdots \\ 0 \\ \hline x_2 \mp \vert\vert x_2 \vert\vert _2 e_1\end{array} \right)\end{displaymath}$ has this property. Notice that k here is used to indicate that the first k-1 elements of x are to be left alone, which results in having k-1 zero valued leading elements. The sign that is chosen corresponds to the sign of the first entry of to avoid catastrophic cancellation. Vector can be scaled arbitrarily by adjusting $\beta$ correspondingly. In the subsequent section, we will scale it so that the first nonzero ( k th) entry of is 1 .

To compute the QR factorization of given $m \times n$ matrix A , we wish to compute Householder transformations $H_1, \ldots , H_n$ such that $\begin{displaymath} H_n \cdots H_1 A = R\end{displaymath}$ where R is uppertriangular.

An algorithm for computing the QR factorization is given by

1.: Partition $A = \left( \begin{array} { c \vert c} a_{1} & A_{2}\end{array} \right)$
2.: Determine $(v, \beta)$ corresponding to the vector .
3.: Store v in the now zero part of .
4.: $A_2 \leftarrow (I + \beta v v^{T}) A_2 = A_2 + \beta v y^T$ where
5.: Repartition $A = \left( \begin{array} {c \vert c } \alpha_{11} & a_{12}^T \\ \hline a_{21} & A_{22}\end{array} \right)$
6.: Continue recursively with $A_{22}$

4.1.1 Blocked algorithm

In [3,9], it is shown how a blocked (matrix-matrix multiply) based algorithm can be derived , by creating a WY transform, which, when applied, yield the same result as a number of successive applications of Householder transforms: $\begin{displaymath} I + W Y^T = H_1 H_2 \cdots H_k\end{displaymath}$ where W and Y are both $n \times k$ .Given the k vectors that define the k Householder transforms, these matrices can be computed one column at a time in a relatively straight forward manner.

Given the WY transform discussion above, the blocked version of the algorithm is given in Fig. 4.

4.2 PLAPACK Implementation

A new parallel algorithm

The expert optimization of the QR factorization lies in the ability of PLAPACK of viewing parts of the matrix as a panel that can be redistributed as a multivector on the nodes. A PLAPACK implementation that explores this is given in Fig. 4.

Figure 4: Optimized implementation of QR factorization using PLAPACK

$\begin{figure} { \small \begin{tabular} { p{3in} \vert p{3in}} \begin{minipage}[... ...ticolumn{1}{c}{\normalsize (b) PLAPACK implementation}\end{tabular}}\end{figure}$

Next: 5 Performance Up: PLAPACK: High Performance through Previous: 3 LU Factorization (with

plapack@cs.utexas.edu