A two-stage blocked algorithm for reduction of a regular matrix pair (A, B) to upper Hessenberg-triangular form is presented. In stage 1, (A, B) is reduced to block upper Hessenberg-triangular form using mainly level 3 (matrix-matrix) operations that permit data reuse in the higher levels of a memory hierarchy. In stage 2, all but one of the r subdiagonals of the block Hessenberg A-part are set to zero using Givens rotations. The algorithm proceeds in a sequence of supersweeps, each reducing m columns. The updates with respect to row and column rotations are organized to reference consecutive columns of A and B. To further improve the data locality, all rotations produced in a supersweep are stored to enable a left-looking reference pattern, i.e., all updates are delayed until they are required for the continuation of the supersweep. Moreover, we present a blocked variant of the single-diagonal double-shift QZ method for computing the generalized Schur form of (A, B) in upper Hessenberg-triangular form. The blocking for improved data locality is done similarly, now by restructuring the reference pattern of the updates associated with the bulge chasing in the QZ iteration. Timing results show that our new blocked variants outperform the current LAPACK routines, including drivers for the generalized eigenvalue problem, by a factor of 2-5 for sufficiently large problems.
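To make the target form concrete, the following is a minimal sketch of the classical unblocked Givens-based reduction of a pair (A, B) to Hessenberg-triangular form. It is not the blocked two-stage algorithm described above: it applies every rotation immediately, with no supersweeps and no delayed updates. It also assumes B is already upper triangular (in practice, after a QR factorization of B). The function names `givens`, `rotate_rows`, `rotate_cols`, and `hessenberg_triangular` are illustrative and not taken from the paper or from LAPACK.

```python
import math


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r


def rotate_rows(M, p, q, c, s):
    """Apply the rotation from the left to rows p and q of M (in place)."""
    for k in range(len(M[p])):
        x, y = M[p][k], M[q][k]
        M[p][k] = c * x + s * y
        M[q][k] = -s * x + c * y


def rotate_cols(M, p, q, c, s):
    """Apply a rotation from the right to columns p and q of M (in place)."""
    for row in M:
        x, y = row[p], row[q]
        row[p] = c * x - s * y
        row[q] = s * x + c * y


def hessenberg_triangular(A, B):
    """Reduce (A, B), with B upper triangular, to Hessenberg-triangular form.

    Works on copies and returns (H, T): H upper Hessenberg, T upper triangular.
    """
    A = [[float(x) for x in row] for row in A]
    B = [[float(x) for x in row] for row in B]
    n = len(A)
    for j in range(n - 2):
        # Zero column j of A from the bottom up to row j + 2.
        for i in range(n - 1, j + 1, -1):
            # Row rotation on rows i-1, i annihilates A[i][j] ...
            c, s = givens(A[i - 1][j], A[i][j])
            rotate_rows(A, i - 1, i, c, s)
            rotate_rows(B, i - 1, i, c, s)
            A[i][j] = 0.0
            # ... but fills in B[i][i-1]; a column rotation on
            # columns i-1, i restores the triangular form of B.
            c, s = givens(B[i][i], B[i][i - 1])
            rotate_cols(A, i - 1, i, c, s)
            rotate_cols(B, i - 1, i, c, s)
            B[i][i - 1] = 0.0
    return A, B
```

Each row rotation here touches two full rows of A and B at once, which is the memory access pattern that the stored rotations and left-looking updates of the blocked variant are meant to avoid.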