Cache efficient bidiagonalization using BLAS 2.5 operators

  • Authors: Gary W. Howell; James W. Demmel; Charles T. Fulton; Sven Hammarling; Karen Marmol

  • Affiliations: North Carolina State University, Raleigh, NC; University of California, Berkeley, CA; Florida Institute of Technology, Melbourne, FL; University of Manchester, UK; Harris Corporation, Melbourne, FL

  • Venue: ACM Transactions on Mathematical Software (TOMS)
  • Year: 2008

Abstract

On cache-based computer architectures, Householder bidiagonalization with current standard algorithms accounts for a significant portion of the execution time for computing matrix singular values and vectors. In this paper we reorganize the sequence of operations for Householder bidiagonalization of a general m × n matrix so that two matrix-vector multiplications (_GEMV) can be done with one pass of the unreduced trailing part of the matrix through cache. Two new BLAS operations cut the transfer of data from main memory to cache approximately in half, reducing execution times by up to 25 percent. We give detailed algorithm descriptions and compare timings with the current LAPACK bidiagonalization algorithm.
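
The abstract does not spell out the two new operations, so the following is only a minimal sketch of the general idea it describes: performing two matrix-vector products while reading the matrix once. It assumes the pair of products has the form x = beta * A^T y + z followed by w = alpha * A x; the function names, argument names, and the list-of-rows layout are hypothetical and chosen for readability, not taken from the paper. Each column of A is used for a dot product and then immediately reused for an update of w, so the column only needs to be brought into cache once.

```python
def fused_gemvt(A, y, z, alpha=1.0, beta=1.0):
    """Return (x, w) with x = beta * A^T y + z and w = alpha * A x.

    Hypothetical illustration: A is a list of m rows of n floats.
    Both products are formed in a single sweep over the columns of A.
    """
    m = len(A)
    n = len(A[0]) if m else 0
    x = [0.0] * n
    w = [0.0] * m
    for j in range(n):
        # First use of column j: dot product with y gives entry j of A^T y.
        dot = 0.0
        for i in range(m):
            dot += A[i][j] * y[i]
        x[j] = beta * dot + z[j]
        # Second use of the same column: accumulate its contribution to w.
        scale = alpha * x[j]
        for i in range(m):
            w[i] += scale * A[i][j]
    return x, w


def two_pass_reference(A, y, z, alpha=1.0, beta=1.0):
    """Same result computed with two separate sweeps over A, for comparison."""
    m = len(A)
    n = len(A[0]) if m else 0
    # Pass 1: x = beta * A^T y + z
    x = [beta * sum(A[i][j] * y[i] for i in range(m)) + z[j] for j in range(n)]
    # Pass 2: w = alpha * A x
    w = [alpha * sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    return x, w


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    y = [1.0, 0.0, -1.0]
    z = [0.5, -0.5]
    print(fused_gemvt(A, y, z))
    print(two_pass_reference(A, y, z))
```

The two functions perform the same arithmetic; the difference is that the reference version traverses A twice, while the fused version traverses it once. A tuned implementation would presumably work on column-major storage with blocking, which this sketch does not attempt.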