Optimally packed chains of bulges in multishift QR algorithms

Authors: Lars Karlsson; Daniel Kressner; Bruno Lang
Affiliations: Umeå University, Sweden; EPF Lausanne, Switzerland; University of Wuppertal, Germany
Venue: ACM Transactions on Mathematical Software (TOMS)
Year: 2014


Abstract

The QR algorithm is the method of choice for computing all eigenvalues of a dense nonsymmetric matrix A. After an initial reduction to Hessenberg form, a QR iteration can be viewed as chasing a small bulge from the top left to the bottom right corner along the subdiagonal of A. To increase data locality and create potential for parallelism, modern variants of the QR algorithm perform several iterations simultaneously, which amounts to chasing a chain of several bulges instead of a single bulge. To make effective use of level 3 BLAS, it is important to pack these bulges as tightly as possible within the chain. In this work, we show that the packing in existing approaches is not optimal and can be made tighter. This directly translates into a 33% reduction in chain length compared to the state-of-the-art LAPACK implementation of the QR algorithm. To demonstrate the impact of our idea, we have modified the LAPACK implementation to make use of the optimal packing. Numerical experiments reveal a uniform reduction of the execution time, without affecting stability or robustness.
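
To make the bulge-chasing picture concrete, here is a minimal sketch of a single implicit QR sweep on a small Hessenberg matrix, written in plain Python. It is an illustration only, not the paper's method: it assumes one real shift and Givens rotations, so the bulge is a single entry below the subdiagonal, whereas the multishift algorithms discussed in the abstract chase a chain of larger bulges. The names givens, rotate_rows, rotate_cols and single_shift_sweep are hypothetical helpers introduced for this example.

```python
import math


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r


def rotate_rows(H, i, j, c, s):
    """Apply the rotation from the left to rows i and j of H (in place)."""
    for k in range(len(H)):
        hi, hj = H[i][k], H[j][k]
        H[i][k] = c * hi + s * hj
        H[j][k] = -s * hi + c * hj


def rotate_cols(H, i, j, c, s):
    """Apply the transposed rotation from the right to columns i and j."""
    for row in H:
        ri, rj = row[i], row[j]
        row[i] = c * ri + s * rj
        row[j] = -s * ri + c * rj


def single_shift_sweep(H, shift):
    """One implicit single-shift QR sweep on a Hessenberg matrix H (in place).

    Returns the (row, col) positions visited by the bulge, in order.
    """
    n = len(H)
    positions = []
    # The first rotation is determined by the first column of H - shift*I.
    # Applying it as a similarity creates a bulge at position (2, 0).
    c, s = givens(H[0][0] - shift, H[1][0])
    rotate_rows(H, 0, 1, c, s)
    rotate_cols(H, 0, 1, c, s)
    for k in range(n - 2):
        positions.append((k + 2, k))
        # Eliminate the bulge at (k+2, k); the column update pushes it
        # one step down the subdiagonal, to (k+3, k+1), until it falls off.
        c, s = givens(H[k + 1][k], H[k + 2][k])
        rotate_rows(H, k + 1, k + 2, c, s)
        H[k + 2][k] = 0.0  # exact zero in place of rounding residue
        rotate_cols(H, k + 1, k + 2, c, s)
    return positions


H = [[4.0, 1.0, 2.0, 3.0],
     [1.0, 3.0, 1.0, 2.0],
     [0.0, 2.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]
print(single_shift_sweep(H, shift=H[3][3]))  # [(2, 0), (3, 1)]
```

Every update is an orthogonal similarity transformation, so the eigenvalues are unchanged and the matrix is back in Hessenberg form once the bulge leaves the bottom right corner. Packing several such bulges into one chain lets their updates be grouped and applied together, which is where the level 3 BLAS operations mentioned in the abstract come in.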