Three algorithms for Cholesky factorization on distributed memory using packed storage

  • Authors:
  • Fred G. Gustavson; Lars Karlsson; Bo Kågström

  • Affiliations:
  • IBM T. J. Watson Research Center, Yorktown Heights, NY and Department of Computing Science and HPC2N, Umeå University, Umeå, Sweden; Department of Computing Science and HPC2N, Umeå University, Umeå, Sweden; Department of Computing Science and HPC2N, Umeå University, Umeå, Sweden

  • Venue:
  • PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
  • Year:
  • 2006

Abstract

We present three algorithms for Cholesky factorization using minimum block storage in a distributed memory (DM) environment. One of the distributed square block packed (SBP) format algorithms performs similarly to ScaLAPACK PDPOTRF, and our algorithm with iteration overlapping typically outperforms it by 15-50% for small and medium-sized matrices. Because the blocks are stored contiguously, the BLAS operations perform better. Our DM algorithms are not sensitive to cache conflicts and thus give smooth and predictable performance. We also investigate the intricacies of using the rectangular full packed (RFP) format with ScaLAPACK routines and point out some advantages and drawbacks.
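
To make the idea of square block packed storage concrete, the following is a minimal single-process sketch. It keeps only the blocks on or below the block diagonal of a symmetric matrix and stores each block contiguously. The abstract does not specify the exact layout, so the block-column-major ordering, the row-major order inside each block, the zero padding, and the names `sbp_storage`, `sbp_block_offset`, and `pack_lower` are all assumptions made for illustration; this is not the authors' distributed implementation.

```python
def sbp_storage(n, nb):
    """Number of elements needed to store the lower triangle of an
    n-by-n matrix as nb-by-nb blocks (assumed layout, edge blocks padded)."""
    num_blocks = -(-n // nb)  # ceiling division
    return num_blocks * (num_blocks + 1) // 2 * nb * nb


def sbp_block_offset(i, j, num_blocks, nb):
    """Offset of block (i, j), with i >= j, assuming the blocks are laid
    out block column by block column, each block contiguous."""
    if not (0 <= j <= i < num_blocks):
        raise ValueError("only blocks on or below the block diagonal are stored")
    # Block columns 0..j-1 hold num_blocks, num_blocks - 1, ... blocks.
    blocks_before = j * num_blocks - j * (j - 1) // 2
    return (blocks_before + (i - j)) * nb * nb


def pack_lower(a, nb):
    """Copy the lower triangle of the square matrix a (list of rows)
    into the assumed packed layout, row-major inside each block."""
    n = len(a)
    num_blocks = -(-n // nb)
    packed = [0.0] * sbp_storage(n, nb)
    for j in range(num_blocks):
        for i in range(j, num_blocks):
            base = sbp_block_offset(i, j, num_blocks, nb)
            for r in range(nb):
                for c in range(nb):
                    gr, gc = i * nb + r, j * nb + c
                    # Skip padding and the strictly upper part of diagonal blocks.
                    if gr < n and gc <= gr:
                        packed[base + r * nb + c] = a[gr][gc]
    return packed


if __name__ == "__main__":
    n, nb = 1000, 100
    print(sbp_storage(n, nb), "elements packed versus", n * n, "full")
```

With this layout, every block that a BLAS kernel would touch occupies one contiguous run of `nb * nb` elements, and the total storage is roughly half of the full `n * n` array plus the unused upper halves of the diagonal blocks.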