next up previous
Next: 5 Performance Up: PLAPACK: High Performance through Previous: 3 LU Factorization (with


4 Householder QR Factorization

  In this section, we discuss the computation of the QR factorization \begin{displaymath}
A = Q R\end{displaymath}where A is $ m \times n $, Q is $ m \times m $and R is $ m \times n $.Here $ m \geq n $, Q is unitary ($
Q Q^T = Q^T Q = I
$)and R has the form \begin{displaymath}
R = \left( \begin{array}
R_1 \\  \hline
0 \end{array}\right)\end{displaymath}where $ R_1 $ is an $ n \times n $ uppertriangular matrix. Partitioning \begin{displaymath}
Q = \left( \begin{array}
{c \vert c}
Q_1 & Q_2\end{array}\right)\end{displaymath}where $ Q_1 $ has width n , we see that the following also holds \begin{displaymath}
A = Q_1 R_1\end{displaymath}In our subsequent discussions, we will refer to both of these factorizations as a QR factorization and will explicitly indicate which of the two is being considered.

4.1 Basic algorithm

To present a basic algorithm for the QR factorization, we must start by introducing Householder transforms (reflections).

Householder transforms are orthonormal transformations that can be written as \begin{displaymath}
H = ( I + \beta v v^T )\end{displaymath}where $ \beta = -2 \vert\vert v \vert\vert _2^2 $. These transformations, sometimes called reflectors, have a number of interesting properties: $ H^T = H $,$ H^T H = H H^T = H H = I $, and $ \vert\vert H \vert\vert _2 = 1 $.Of most interest to us is the fact that given a vector $ x = \left( x_1^T, x_2^T \right)^T $, where $ x_1 $ is of length k-1 , one can find a vector $ v_k$ such that $ P_k x = \left( x_1^T , \pm \vert\vert x_2 \vert\vert _2 e_1^T \right) $ where $ e_1 = \left( 1 , 0 , \cdots , 0 \right)^T $.Indeed, it can be easily verified that \begin{displaymath}
v_k = \left( \begin{array}
0 \\ \vdots \\ 0 \\  \hline
x_2 \mp \vert\vert x_2 \vert\vert _2 e_1\end{array} \right)\end{displaymath}has this property. Notice that k here is used to indicate that the first k-1 elements of x are to be left alone, which results in $ v_k $ having k-1 zero valued leading elements. The sign that is chosen corresponds to the sign of the first entry of $ x_2 $ to avoid catastrophic cancellation. Vector $ v_k $ can be scaled arbitrarily by adjusting $ \beta $ correspondingly. In the subsequent section, we will scale it so that the first nonzero ( k th) entry of $ v_k $ is 1 .

To compute the QR factorization of given $ m \times n $ matrix A , we wish to compute Householder transformations $ H_1, \ldots , H_n $ such that \begin{displaymath}
H_n \cdots H_1 A = R\end{displaymath}where R is uppertriangular.

An algorithm for computing the QR factorization is given by

Partition $ A =
\left( \begin{array}
{ c \vert c}
a_{1} & A_{2}\end{array} \right) $
Determine $(v, \beta)$ corresponding to the vector $a_1$.
Store v in the now zero part of $a_1$.
$ A_2 \leftarrow (I + \beta v v^{T}) A_2 
= A_2 + \beta v y^T $ where $ y^T = v^T A_2 $
Repartition $A = \left( \begin{array}
{c \vert c }
\alpha_{11} & a_{12}^T \\  \hline
a_{21} & A_{22}\end{array} \right) $
Continue recursively with $ A_{22} $

4.1.1 Blocked algorithm

In [3,9], it is shown how a blocked (matrix-matrix multiply) based algorithm can be derived , by creating a WY transform, which, when applied, yield the same result as a number of successive applications of Householder transforms: \begin{displaymath}
I + W Y^T = H_1 H_2 \cdots H_k\end{displaymath}where W and Y are both $ n \times k $.Given the k vectors that define the k Householder transforms, these matrices can be computed one column at a time in a relatively straight forward manner.

Given the WY transform discussion above, the blocked version of the algorithm is given in Fig. 4.

4.2 PLAPACK Implementation

A new parallel algorithm

The expert optimization of the QR factorization lies in the ability of PLAPACK of viewing parts of the matrix as a panel that can be redistributed as a multivector on the nodes. A PLAPACK implementation that explores this is given in Fig. 4.

   Figure 4: Optimized implementation of QR factorization using PLAPACK

{ p{3in} \vert p{3in}}
 ...ticolumn{1}{c}{\normalsize (b) PLAPACK implementation}\end{tabular}}\end{figure}

next up previous
Next: 5 Performance Up: PLAPACK: High Performance through Previous: 3 LU Factorization (with