Communication-optimal parallel and sequential Cholesky decomposition: extended abstract
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Communication-optimal Parallel and Sequential Cholesky Decomposition
SIAM Journal on Scientific Computing
ACM Transactions on Mathematical Software (TOMS)
We compare, within a common framework, out-of-core implementations of the Cholesky factorization algorithm. The candidate implementations are the classical blocked left-looking variant and a more recent recursive formulation. Both have been implemented for real positive definite matrices: the former in the Parallel Out-of-Core Linear Algebra Package (POOCLAPACK) library and the latter in the Scalable Out-of-Core Linear Algebra Computations (SOLAR) library. We perform a theoretical analysis of the input/output (I/O) volume required by each variant. We consider two alternatives for the left-looking algorithm: the one-tile and two-tile approaches. We show that when main memory is restricted, the one-tile approach yields a smaller I/O volume. We then show that the left-looking implementation requires a smaller I/O volume than the recursive variant. We have implemented all of these variants for complex matrices, and we report on numerical experiments.
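To make the two algorithmic shapes being compared more concrete, here is a minimal in-core sketch in Python/NumPy, assuming real symmetric positive definite input (the abstract's experiments use complex matrices). The function names `left_looking_cholesky` and `recursive_cholesky` and the block size `nb` are hypothetical, not POOCLAPACK or SOLAR APIs. The tile read/write tally is a deliberately simple, hypothetical model with no caching. It does not reproduce the one-tile or two-tile schemes analysed in the work, and it says nothing about which variant wins.

```python
import numpy as np


def left_looking_cholesky(A, nb):
    """Blocked left-looking Cholesky (A = L @ L.T) with block size nb.

    Also tallies tile reads/writes under a toy, hypothetical out-of-core model:
    to produce block column J, every tile of each earlier block column at block
    rows >= J is read, plus the tiles of block column J itself, and block
    column J is then written back once. No caching is modelled.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    nt = -(-n // nb)  # number of block rows/columns (ceiling division)
    L = A.copy()      # factor in place; only the lower triangle is kept
    reads = writes = 0
    for J in range(nt):
        j, je = J * nb, min((J + 1) * nb, n)
        reads += nt - J                      # load block column J (rows j..n-1)
        # Left-looking: apply updates from all block columns to the left.
        for K in range(J):
            k, ke = K * nb, min((K + 1) * nb, n)
            reads += nt - J                  # tiles L[j:n, k:ke]
            L[j:, j:je] -= L[j:, k:ke] @ L[j:je, k:ke].T
        # Factor the diagonal block, then solve for the panel below it.
        Ljj = np.linalg.cholesky(L[j:je, j:je])
        L[j:je, j:je] = Ljj
        if je < n:
            # L21 = A21 @ inv(Ljj).T, computed as solve(Ljj, A21.T).T
            L[je:, j:je] = np.linalg.solve(Ljj, L[je:, j:je].T).T
        writes += nt - J                     # store block column J
    return np.tril(L), reads, writes


def recursive_cholesky(A, nb):
    """Recursive Cholesky: split A into 2x2 blocks and recurse (assumes nb >= 1)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n <= nb:
        return np.linalg.cholesky(A)         # base case: dense in-core factorization
    h = n // 2
    L11 = recursive_cholesky(A[:h, :h], nb)
    L21 = np.linalg.solve(L11, A[h:, :h].T).T            # L21 = A21 @ inv(L11).T
    L22 = recursive_cholesky(A[h:, h:] - L21 @ L21.T, nb)  # Schur complement
    L = np.zeros_like(A)
    L[:h, :h], L[h:, :h], L[h:, h:] = L11, L21, L22
    return L


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((8, 8))
    A = M @ M.T + 8 * np.eye(8)              # symmetric positive definite
    L1, reads, writes = left_looking_cholesky(A, nb=2)
    L2 = recursive_cholesky(A, nb=2)
    assert np.allclose(L1 @ L1.T, A) and np.allclose(L1, L2)
    print(reads, writes)                     # 20 10 under the toy model
```

The left-looking loop touches one block column at a time and re-reads everything to its left, which is why its I/O depends so strongly on how many tiles fit in memory. The recursive version instead halves the problem and updates a whole trailing Schur complement at each level.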