Unit 11.4.3 Optimizing the bidiagonal QR algorithm
As the Givens' rotations are applied to the bidiagonal matrix, they are also applied to matrices in which the left and right singular vectors are accumulated (matrices \(U \) and \(V \)). If we start with an \(m \times m \) matrix, one step of introducing the bulge and chasing it out the matrix requires \(O( m ) \) computation. Accumulating the Givens' rotations into \(U \) and \(V \) requires \(O( m^2 ) \) computation for each such step, with \(O( m^2 ) \) data. As was discussed in Unit 10.4.4 for the implicitly shifted QR algorithm, this inherently means the cost of accessing data dominates on current architectures.
The paper also mentioned in Unit 10.4.4, also describes techniques for applying the Givens' rotations for several steps of the Implicitly shifted bidiagonal QR algorithm at the same time, which allows one to attain high performance similar to that attained by a matrix-matrix multiplication.
 Field G. Van Zee, Robert A. van de Geijn, Gregorio Quintana-Ortí, Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance, ACM Transactions on Mathematical Software (TOMS), Vol. 40, No. 3, 2014.
To our knowledge, this yields the fastest implementation for finding the SVD of a bidiagonal matrix.