This paper presents a technique that allows level 3 BLAS to be used in a number of rotation-based algorithms. In particular, the update of an orthogonal transformation matrix, which often accounts for the vast majority of operations, can be done with a matrix-matrix product. As a case study, the technique is applied to the QR and QL algorithms for computing the eigensystem of a symmetric tridiagonal matrix. The modifications neither affect the convergence properties of the algorithms nor significantly increase the overall number of operations. Thus, the computations can be sped up by more than 50% on machines with a distinct memory hierarchy, such as the Intel i860 or IBM RS/6000, provided the block size is set appropriately. We also present a simple theoretical analysis that allows selecting an almost-optimal block size.
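The core idea of replacing many rotation updates with one matrix-matrix product can be illustrated with a minimal sketch. It is a simplified illustration under stated assumptions, not the paper's actual blocking scheme: it assumes a single run of plane rotations acting on adjacent columns of the transformation matrix, and the helper names (`apply_one_by_one`, `apply_blocked`) and the sizes `n`, `start`, `block` are hypothetical. Pure Python lists stand in for the level 3 BLAS call that a real implementation would use for the panel product.

```python
import math
import random


def identity(n):
    """Return the n-by-n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Plain matrix-matrix product; stands in for a level 3 BLAS call."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def apply_one_by_one(Q, rots, start):
    """Apply each rotation (c, s) directly to columns p and p+1 of Q.

    Rotation j acts on columns start+j and start+j+1 (vector-style updates).
    """
    out = [row[:] for row in Q]
    for j, (c, s) in enumerate(rots):
        p = start + j
        for row in out:
            a, b = row[p], row[p + 1]
            row[p], row[p + 1] = c * a - s * b, s * a + c * b
    return out


def apply_blocked(Q, rots, start):
    """Accumulate the rotations into a small matrix G, then update the
    affected column panel of Q with a single matrix-matrix product."""
    width = len(rots) + 1  # the rotations touch this many adjacent columns
    G = apply_one_by_one(identity(width), rots, 0)  # small accumulated factor
    panel = [row[start:start + width] for row in Q]
    updated = matmul(panel, G)
    out = [row[:] for row in Q]
    for row, new_cols in zip(out, updated):
        row[start:start + width] = new_cols
    return out


def max_abs_diff(A, B):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# Hypothetical sizes chosen only for the demonstration.
random.seed(0)
n, start, block = 8, 2, 4
Q0 = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
angles = [random.uniform(0.0, math.pi) for _ in range(block)]
rots = [(math.cos(t), math.sin(t)) for t in angles]

# Both update orders should agree up to rounding error.
diff = max_abs_diff(apply_one_by_one(Q0, rots, start),
                    apply_blocked(Q0, rots, start))
```

Accumulating the rotations costs extra work on the small matrix, but the bulk of the arithmetic moves into the panel product, which is the kind of operation that can exploit a memory hierarchy. How large the block should be is exactly the trade-off the abstract's block-size analysis addresses; the value used here is arbitrary.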