Efficient Householder QR factorization for superscalar processors

Authors:
James J. Carrig, Jr.;Gerard G. L. Meyer
Affiliations:
Johns Hopkins Univ., Baltimore, MD;Johns Hopkins Univ., Baltimore, MD
Venue:
ACM Transactions on Mathematical Software (TOMS)
Year:
1997

Abstract

To extract the potential promised by superscalar processors, algorithm designers must streamline memory references and allow for efficient data reuse throughout the memory hierarchy. Two parameterized Householder QR factorization algorithms are presented that take into account the caches and registers typical of such processors. Guidelines are developed for choosing parameter values that obtain near-optimal cache and register utilization. The new algorithms are implemented and performance-tuned on an Intel Pentium Pro system, a single thin POWER2 node of the IBM Scalable Parallel system 2 (SP2), and a single R8000 processor of a Silicon Graphics POWER Challenge XL.
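
For orientation, the baseline that such algorithms restructure can be sketched as a plain, unblocked Householder QR factorization. This is a textbook sketch in pure Python, not the parameterized algorithms of the paper: it does no blocking and no cache or register tuning, and the function name householder_qr is an assumption made for illustration.

```python
import math


def householder_qr(a):
    """Unblocked Householder QR: return (q, r) with a = q * r.

    a is an m x n list of lists; q is m x m orthogonal and r is m x n
    upper triangular (up to rounding error). Illustrative only.
    """
    m, n = len(a), len(a[0])
    r = [[float(x) for x in row] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

    for k in range(min(m - 1, n)):
        # Column k, from the diagonal down.
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column is already zero below the diagonal

        # Pick the sign of alpha opposite to x[0] to avoid cancellation.
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        vtv = sum(t * t for t in v)

        # r <- H r with H = I - 2 v v^T / (v^T v), on rows k..m-1.
        for j in range(k, n):
            s = sum(v[i] * r[k + i][j] for i in range(len(v)))
            f = 2.0 * s / vtv
            for i in range(len(v)):
                r[k + i][j] -= f * v[i]

        # q <- q H, which touches only columns k..m-1 of q.
        for i in range(m):
            s = sum(q[i][k + l] * v[l] for l in range(len(v)))
            f = 2.0 * s / vtv
            for l in range(len(v)):
                q[i][k + l] -= f * v[l]

    return q, r
```

Each step here sweeps the whole trailing matrix once per reflector, so little data is reused between memory accesses. Blocked variants apply several reflectors together to improve reuse in cache and registers.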