High-Performance Library Software for QR Factorization

Authors:
Erik Elmroth;Fred G. Gustavson
Affiliations:
-;-
Venue:
PARA '00 Proceedings of the 5th International Workshop on Applied Parallel Computing, New Paradigms for HPC in Industry and Academia
Year:
2000

Citing (8):
The WY representation for products of Householder matrices. SIAM Journal on Scientific and Statistical Computing - Papers from the Second Conference on Parallel Processing for Scientific Computing
A storage-efficient WY representation for products of Householder transformations. SIAM Journal on Scientific and Statistical Computing
Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms. IBM Journal of Research and Development
Locality of Reference in LU Decomposition with Partial Pivoting. SIAM Journal on Matrix Analysis and Applications
Recursion leads to automatic variable blocking for dense linear-algebra algorithms. IBM Journal of Research and Development
New Serial and Parallel Recursive QR Factorization Algorithms for SMP Systems. PARA '98 Proceedings of the 4th International Workshop on Applied Parallel Computing, Large Scale Scientific and Industrial Problems
Superscalar GEMM-based Level 3 BLAS - The On-going Evolution of a Portable and High-Performance Library. PARA '98 Proceedings of the 4th International Workshop on Applied Parallel Computing, Large Scale Scientific and Industrial Problems
Applying recursion to serial and parallel QR factorization leads to better performance. IBM Journal of Research and Development

Cited by (2):
Management of deep memory hierarchies: recursive blocked algorithms and hybrid data structures for dense matrix computations. PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Performance study of matrix computations using multi-core programming tools. Proceedings of the Fifth Balkan Conference in Informatics

Abstract

In [5,6], we presented algorithm RGEQR3, a purely recursive formulation of the QR factorization. Recursion provides a natural way to choose the k-way aggregating Householder transform of Schreiber and Van Loan [10]. RGEQR3 is a performance-critical subroutine of the main (hybrid recursive) routine RGEQRF, which computes the QR factorization of a general m × n matrix. This contribution presents a new version of RGEQRF and its SMP parallel counterpart, both implemented for a future release of the IBM ESSL library. Together they form robust, high-performance library software for QR factorization on uniprocessor and multiprocessor systems. The implementation builds on previous results [5,6]. In particular, the new version is optimized in several ways to improve performance, e.g., for small matrices and for matrices with very few columns. This is done partly by including mini blocking in the otherwise purely recursive RGEQR3. We describe the salient features of this implementation. Our serial implementation outperforms the corresponding LAPACK routine by 10-65% for square matrices and by 10-100% for tall and thin matrices on IBM POWER2 and POWER3 nodes. The tests covered matrix sizes ranging from very small to very large. The SMP parallel implementation shows close to perfect speedup on a 4-processor PPC604e node.
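
The recursive idea behind RGEQR3 can be illustrated with a minimal sketch. The code below is not the ESSL implementation and omits the mini blocking and all performance tuning mentioned in the abstract; it is plain Python written for readability only. The names `recursive_qr`, `apply_q`, `matmul`, and `transpose` are hypothetical. It assumes m >= n and returns factors Y, T, R with A = (I - Y T Y^T) [R; 0], where the two half-width factors are merged using T3 = -T1 (Y1^T Y2) T2, which follows from multiplying out (I - Y1 T1 Y1^T)(I - Y2 T2 Y2^T).

```python
import math

def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def recursive_qr(A):
    """Recursive Householder QR of an m x n matrix (m >= n).

    Returns (Y, T, R): Y is m x n unit lower trapezoidal, T and R are
    n x n upper triangular, and A = (I - Y T Y^T) [R; 0].
    """
    m, n = len(A), len(A[0])
    if n == 1:
        # Base case: one Householder reflector H = I - tau v v^T with H x = beta e1.
        x = [row[0] for row in A]
        norm = math.sqrt(sum(xi * xi for xi in x))
        if norm == 0.0:
            return [[1.0]] + [[0.0] for _ in range(m - 1)], [[0.0]], [[0.0]]
        beta = -math.copysign(norm, x[0])
        v0 = x[0] - beta
        tau = (beta - x[0]) / beta
        Y = [[1.0]] + [[xi / v0] for xi in x[1:]]
        return Y, [[tau]], [[beta]]

    n1 = n // 2
    n2 = n - n1
    A1 = [row[:n1] for row in A]
    A2 = [row[n1:] for row in A]

    # Factor the left half of the columns.
    Y1, T1, R11 = recursive_qr(A1)

    # Update the right half: A2 <- Q1^T A2 = A2 - Y1 (T1^T (Y1^T A2)).
    W = matmul(transpose(T1), matmul(transpose(Y1), A2))
    YW = matmul(Y1, W)
    A2 = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A2, YW)]
    R12 = A2[:n1]

    # Factor the trailing block and pad its reflectors with zero rows on top.
    Y2_sub, T2, R22 = recursive_qr(A2[n1:])
    Y2 = [[0.0] * n2 for _ in range(n1)] + Y2_sub

    # Merge the two compact WY factors: T3 = -T1 (Y1^T Y2) T2.
    T3 = matmul(matmul(T1, matmul(transpose(Y1), Y2)), T2)
    T3 = [[-t for t in row] for row in T3]

    T = [r1 + r3 for r1, r3 in zip(T1, T3)] + [[0.0] * n1 + r2 for r2 in T2]
    Y = [r1 + r2 for r1, r2 in zip(Y1, Y2)]
    R = [r1 + r2 for r1, r2 in zip(R11, R12)] + [[0.0] * n1 + r for r in R22]
    return Y, T, R

def apply_q(Y, T, R):
    """Reconstruct A = (I - Y T Y^T) [R; 0] from the computed factors."""
    m, n = len(Y), len(R)
    R_full = R + [[0.0] * n for _ in range(m - n)]
    YW = matmul(Y, matmul(T, matmul(transpose(Y), R_full)))
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(R_full, YW)]
```

Calling `recursive_qr` on a small tall matrix and passing the result to `apply_q` should reproduce the input up to rounding error. Halving the column count at each level is what lets the block size adapt without a tuning parameter; a production routine would replace the list-based products with Level 3 BLAS calls.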