We present three algorithms for Cholesky factorization using minimum block storage for a distributed memory (DM) environment. One of the distributed square block packed (SBP) format algorithms performs similarly to ScaLAPACK PDPOTRF, and our algorithm with iteration overlapping typically outperforms it by 15-50% for small and medium-sized matrices. By storing the blocks contiguously, we obtain better-performing BLAS operations. Our DM algorithms are not sensitive to cache conflicts and thus give smooth and predictable performance. We also investigate the intricacies of using rectangular full packed (RFP) format with ScaLAPACK routines and point out some advantages and drawbacks.
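To make the SBP idea concrete, the following is a minimal single-node sketch (not the paper's distributed implementation): only the lower-triangular blocks of a symmetric matrix are kept, each stored as its own contiguous block, and a standard right-looking blocked Cholesky runs over them. The function names, the block size `NB`, and the dictionary-of-blocks layout are illustrative assumptions, not the authors' data structures.

```python
# Illustrative sketch of square block packed (SBP) storage: keep only the
# lower-triangular nb-by-nb blocks, each contiguous in memory, and run a
# right-looking blocked Cholesky (potrf / trsm / syrk-gemm steps) on them.
import numpy as np

NB = 2  # block size (hypothetical choice for the example)

def pack_sbp(A, nb=NB):
    """Pack the lower triangle of A into contiguous nb-by-nb blocks,
    indexed by block coordinates (i, j) with i >= j."""
    n = A.shape[0]
    p = n // nb  # assume n is a multiple of nb for simplicity
    blocks = {}
    for j in range(p):
        for i in range(j, p):
            blocks[(i, j)] = A[i*nb:(i+1)*nb, j*nb:(j+1)*nb].copy()
    return blocks, p

def cholesky_sbp(blocks, p):
    """In-place blocked Cholesky L @ L.T = A on the SBP blocks."""
    for k in range(p):
        # Factor the diagonal block (LAPACK potrf step).
        blocks[(k, k)] = np.linalg.cholesky(blocks[(k, k)])
        Lkk = blocks[(k, k)]
        # Triangular solve for the subdiagonal blocks (BLAS trsm step):
        # L_ik = A_ik * Lkk^{-T}, computed via Lkk * L_ik^T = A_ik^T.
        for i in range(k + 1, p):
            blocks[(i, k)] = np.linalg.solve(Lkk, blocks[(i, k)].T).T
        # Symmetric rank-nb update of the trailing blocks (syrk/gemm step).
        for j in range(k + 1, p):
            for i in range(j, p):
                blocks[(i, j)] -= blocks[(i, k)] @ blocks[(j, k)].T
    return blocks

# Usage: factor a small SPD matrix and compare with the full factorization.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)          # symmetric positive definite test matrix
blocks, p = pack_sbp(A)
cholesky_sbp(blocks, p)
L = np.linalg.cholesky(A)
assert np.allclose(blocks[(1, 0)], L[2:4, 0:2])
```

Because each block is contiguous, the trsm and gemm calls on the trailing submatrix operate on unit-stride data, which is the effect the abstract attributes to contiguous block storage.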