Efficient Householder QR factorization for superscalar processors

  • Authors:
  • James J. Carrig, Jr.; Gerard G. L. Meyer

  • Affiliations:
  • Johns Hopkins Univ., Baltimore, MD; Johns Hopkins Univ., Baltimore, MD

  • Venue:
  • ACM Transactions on Mathematical Software (TOMS)
  • Year:
  • 1997

Abstract

To extract the potential promised by superscalar processors, algorithm designers must streamline memory references and allow for efficient data reuse throughout the memory hierarchy. Two parameterized Householder QR factorization algorithms are presented that take into account the caches and registers typical of such processors. Guidelines are developed for choosing parameter values that obtain near-optimal cache and register utilization. The new algorithms are implemented and performance-tuned on an Intel Pentium Pro system, a single thin POWER2 node of the IBM Scalable Parallel system 2 (SP2), and a single R8000 processor of a Silicon Graphics POWER Challenge XL.
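
The abstract names Householder QR factorization but does not reproduce either of the two parameterized algorithms. For orientation only, the following is a minimal sketch of the textbook, unblocked Householder QR in plain Python. It is not the authors' method and makes no attempt at the cache and register blocking the abstract describes. The function name `householder_qr` and the list-of-lists matrix layout are assumptions made for this illustration.

```python
import math


def householder_qr(a):
    """Textbook unblocked Householder QR (illustrative sketch, not the paper's algorithm).

    `a` is an m x n matrix given as a list of rows. Returns (q, r) where q is
    m x m orthogonal, r is m x n upper triangular, and q times r reproduces a
    up to rounding error.
    """
    m, n = len(a), len(a[0])
    r = [[float(t) for t in row] for row in a]  # working copy, becomes R
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

    for k in range(min(m - 1, n)):
        # Column k, rows k..m-1: the part to be reduced to a multiple of e_1.
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # nothing to eliminate in this column

        # alpha takes the sign opposite to x[0] to avoid cancellation in v[0].
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        vtv = sum(t * t for t in v)

        # R <- H R on rows k.., with H = I - 2 v v^T / (v^T v).
        for j in range(k, n):
            s = 2.0 * sum(v[i] * r[k + i][j] for i in range(len(v))) / vtv
            for i in range(len(v)):
                r[k + i][j] -= s * v[i]

        # Q <- Q H on columns k.., so that Q accumulates H_0 H_1 ... H_k.
        for i in range(m):
            s = 2.0 * sum(q[i][k + j] * v[j] for j in range(len(v))) / vtv
            for j in range(len(v)):
                q[i][k + j] -= s * v[j]

    return q, r
```

Each reflection in this sketch sweeps the whole trailing submatrix once, column by column, so every step re-reads data that has likely left the cache on a large matrix. That memory-traffic pattern is the kind of cost the abstract says the parameterized algorithms are designed to reduce.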