High-Performance Library Software for QR Factorization

  • Authors:
  • Erik Elmroth;Fred G. Gustavson

  • Affiliations:
  • -;-

  • Venue:
  • PARA '00 Proceedings of the 5th International Workshop on Applied Parallel Computing, New Paradigms for HPC in Industry and Academia
  • Year:
  • 2000

Quantified Score

Hi-index 0.01

Visualization

Abstract

In [5,6], we presented algorithm RGEQR3, a purely recursive formulation of the QR factorization. Using recursion leads us to a natural way to choose the k-way aggregating Householder transform of Schreiber and Van Loan [10]. RGEQR3 is a performance critical subroutine for the main (hybrid recursive) routine RGEQRF for QR factorization of a general m × n matrix. This contribution presents a new version of RGEQRF and its accompanying SMP parallel counterpart, implemented for a future release of the IBM ESSL library. It represents a robust high-performance piece of library software for QR factorization on uniprocessor and multiprocessor systems. The implementation builds on previous results [5,6]. In particular, the new version is optimized in a number of ways to improve the performance; e.g., for small matrices and matrices with a very small number of columns. This is partly done by including mini blocking in the otherwise pure recursive RGEQR3. We describe the salient features of this implementation. Our serial implementation outperforms the corresponding LAPACK routine by 10-65% for square matrices and 10-100% on tall and thin matrices on the IBM POWER2 and POWER3 nodes. The tests covered matrix sizes which varied from very small to very large. The SMP parallel implementation shows close to perfect speedup on a 4-processor PPC604e node.