Minimal-storage high-performance Cholesky factorization via blocking and recursion

  • Authors:
  • F. G. Gustavson;I. Jonsson

  • Affiliations:
  • IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, New York;Department of Computing Science, Umeå University, Umeå, Sweden

  • Venue:
  • IBM Journal of Research and Development
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present a novel practical algorithm for Cholesky factorization when the matrix is stored in packed format by combining blocking and recursion. The algorithm simultaneously obtains Level 3 performance, conserves about half the storage, and avoids the production of Level 3 BLAS for packed format. We use recursive packed format, which was first described by Andersen et al. [1]. Our algorithm uses only DGEMM and Level 3 kernel routines; it first transforms standard packed format to packed recursive lower row format. Our new algorithm outperforms the Level 3 LAPACK routine DPOTRF even when we include the cost of data transformation. (This is true for three IBM platforms--the POWER3, the POWER2, and the PowerPC 604e.) For large matrices, blocking is not required for acceptable Level 3 performance. However, for small matrices the overhead of pure recursion and/or data transformation is too high. We analyze these costs analytically and provide detailed cost estimates. We show that blocking combined with recursion reduces all overheads to a tiny, acceptable level. However, a new problem of nonlinear addressing arises. We use two-dimensional mappings (tables) or data copying to overcome the high costs of directly computing addresses that are nonlinear functions of i and j.