A Novel Parallel QR Algorithm for Hybrid Distributed Memory HPC Systems
SIAM Journal on Scientific Computing
On aggressive early deflation in parallel variants of the QR algorithm
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume Part I
Hi-index | 0.00 |
A parallel algorithm for reordering the eigenvalues in the real Schur form of a matrix is presented and discussed. Our novel approach adopts computational windows and delays multiple outside-window updates until each window has been completely reordered locally. By using multiple concurrent windows the parallel algorithm has a high level of concurrency, and most work is level 3 BLAS operations. The presented algorithm is also extended to the generalized real Schur form. Experimental results for ScaLAPACK-style Fortran 77 implementations on a Linux cluster confirm the efficiency and scalability of our algorithms in terms of more than 16 times of parallel speedup using 64 processors for large-scale problems. Even on a single processor our implementation is demonstrated to perform significantly better compared with the state-of-the-art serial implementation. Copyright © 2009 John Wiley & Sons, Ltd. Technical Report UMINF-07.20, Department of Computing Science, Umeå University, Sweden.