Using Level 3 BLAS in Rotation-Based Algorithms

Authors:
Bruno Lang
Venue:
SIAM Journal on Scientific Computing
Year:
1998

Abstract

This paper presents a technique that allows level 3 BLAS to be used in a number of rotation-based algorithms. In particular, the update of an orthogonal transformation matrix, which often accounts for the vast majority of the operations, can be done with a matrix-matrix product. As a case study, the technique is applied to the QR and QL algorithms for computing the eigensystem of a symmetric tridiagonal matrix. The modifications do not affect the convergence properties of the algorithms, nor do they significantly increase the overall number of operations. Thus, the computations can be sped up by more than 50% on machines with a distinct memory hierarchy, such as the Intel i860 or IBM RS/6000, provided the block size is set appropriately. We also present a simple theoretical analysis that allows selecting an almost-optimal block size.
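
The general idea behind replacing rotation-by-rotation updates with a matrix-matrix product can be illustrated with a small sketch. It is not the paper's blocking scheme, which the abstract does not spell out. It only assumes a single sweep of plane rotations acting on adjacent column pairs (i, i+1) of a transformation matrix Q, and shows that accumulating a block of k such rotations into a small (k+1) x (k+1) matrix U and multiplying once gives the same Q as applying the rotations one at a time. All function names, the block size, and the test data are hypothetical, and plain Python lists stand in for what would be a level 3 BLAS (GEMM) call in practice.

```python
import math


def apply_rotation(M, i, c, s):
    """Rotate columns i and i+1 of M in place (one level 1 style update)."""
    for row in M:
        a, b = row[i], row[i + 1]
        row[i] = c * a + s * b
        row[i + 1] = -s * a + c * b


def accumulate_block(rotations):
    """Accumulate k adjacent-column rotations into a (k+1) x (k+1) matrix U."""
    k = len(rotations)
    U = [[1.0 if r == j else 0.0 for j in range(k + 1)] for r in range(k + 1)]
    for i, (c, s) in enumerate(rotations):
        apply_rotation(U, i, c, s)
    return U


def apply_block(Q, start, U):
    """Multiply the column block of Q beginning at column `start` by U.

    This is the step that a GEMM call would perform in a real code.
    """
    m = len(U)
    for row in Q:
        seg = row[start:start + m]
        row[start:start + m] = [
            sum(seg[p] * U[p][j] for p in range(m)) for j in range(m)
        ]


def update_one_by_one(Q, rotations):
    """Reference update: apply each rotation directly to Q."""
    for i, (c, s) in enumerate(rotations):
        apply_rotation(Q, i, c, s)


def update_blocked(Q, rotations, block_size):
    """Blocked update: one small accumulation plus one product per block."""
    for start in range(0, len(rotations), block_size):
        block = rotations[start:start + block_size]
        apply_block(Q, start, accumulate_block(block))


# Hypothetical test data: a 6 x 6 matrix and one sweep of 5 rotations.
n = 6
rotations = [(math.cos(0.1 * (i + 1)), math.sin(0.1 * (i + 1)))
             for i in range(n - 1)]
Q_ref = [[float((r * n + j) % 7) for j in range(n)] for r in range(n)]
Q_blk = [row[:] for row in Q_ref]

update_one_by_one(Q_ref, rotations)
update_blocked(Q_blk, rotations, block_size=2)

max_diff = max(abs(a - b)
               for row_a, row_b in zip(Q_ref, Q_blk)
               for a, b in zip(row_a, row_b))
print("largest entrywise difference:", max_diff)
```

In this sketch the accumulation costs extra work on the small matrix U, while the bulk of the arithmetic moves into the block product, which is the kind of operation that benefits from a memory hierarchy. How large the block should be is the question the paper's block-size analysis addresses; the value 2 used here is arbitrary.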