High Performance Cholesky Factorization via Blocking and Recursion That Uses Minimal Storage

  • Authors:
  • Fred G. Gustavson; Isak Jonsson

  • Venue:
  • PARA '00 Proceedings of the 5th International Workshop on Applied Parallel Computing, New Paradigms for HPC in Industry and Academia
  • Year:
  • 2000

Abstract

We present a high-performance Cholesky factorization algorithm, called BPC (Blocked Packed Cholesky), that performs as well as or better than the LAPACK DPOTRF subroutine while requiring about the same amount of memory as the LAPACK DPPTRF subroutine, which runs only at level 2 BLAS speed. Algorithm BPC calls only DGEMM and level 3 kernel routines. It combines a recursive algorithm with blocking and a recursive packed data format. We give a full analysis of how to overcome the non-linear addressing overhead imposed by recursion. Finally, because BPC relies heavily on GEMM, it readily obtains a considerable amount of SMP parallelism from an SMP GEMM.
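
To illustrate the general idea of combining recursion with a blocked base case, the following is a minimal sketch in pure Python. It is not the authors' BPC implementation: it works on an ordinary list-of-lists matrix rather than the recursive packed format, and it uses plain loops where BPC would call DGEMM and level 3 kernels. The names `packed_size`, `chol_unblocked`, `chol_recursive`, `cholesky`, and the `block` parameter are all hypothetical and chosen only for this illustration.

```python
import math


def packed_size(n):
    """Elements stored by a packed triangular format of order n (vs. n*n for full)."""
    return n * (n + 1) // 2


def chol_unblocked(a, lo, hi):
    """In-place lower Cholesky of the diagonal block a[lo:hi][lo:hi] (base-case kernel)."""
    for j in range(lo, hi):
        d = a[j][j] - sum(a[j][k] ** 2 for k in range(lo, j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[j][j] = math.sqrt(d)
        for i in range(j + 1, hi):
            s = sum(a[i][k] * a[j][k] for k in range(lo, j))
            a[i][j] = (a[i][j] - s) / a[j][j]


def chol_recursive(a, lo, hi, block):
    """Recursive lower Cholesky on rows/columns lo..hi-1; only the lower triangle is used."""
    n = hi - lo
    if n <= block:
        # Small enough: stop recursing and use the unblocked kernel.
        chol_unblocked(a, lo, hi)
        return
    mid = lo + n // 2
    # Step 1: factor the leading block, A11 = L11 * L11^T.
    chol_recursive(a, lo, mid, block)
    # Step 2: triangular solve, L21 = A21 * L11^{-T}.
    for i in range(mid, hi):
        for j in range(lo, mid):
            s = sum(a[i][k] * a[j][k] for k in range(lo, j))
            a[i][j] = (a[i][j] - s) / a[j][j]
    # Step 3: symmetric update, A22 := A22 - L21 * L21^T (the GEMM-like part).
    for i in range(mid, hi):
        for j in range(mid, i + 1):
            a[i][j] -= sum(a[i][k] * a[j][k] for k in range(lo, mid))
    # Step 4: factor the updated trailing block.
    chol_recursive(a, mid, hi, block)


def cholesky(a, block=2):
    """Return the lower Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    work = [row[:] for row in a]  # copy so the caller's matrix is left untouched
    chol_recursive(work, 0, n, block)
    # Zero the strictly upper triangle, which the factorization never touched.
    return [[work[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
```

In this sketch, almost all of the arithmetic for large matrices lands in steps 2 and 3, which is why an algorithm organized this way can lean on a fast matrix-multiply kernel. The `packed_size` helper only shows the storage count a packed triangular format needs, roughly half of the `n * n` used by full storage.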