Communication avoiding Gaussian elimination

Authors:
Laura Grigori;James W. Demmel;Hua Xiang
Affiliations:
Université Paris-Sud, Orsay, France;UC Berkeley, CA;Université Paris-Sud, Orsay, France
Venue:
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Year:
2008

Citing 6
Cited 12

Average-case stability of Gaussian elimination
SIAM Journal on Matrix Analysis and Applications

Locality of Reference in LU Decomposition with Partial Pivoting
SIAM Journal on Matrix Analysis and Applications

Recursion leads to automatic variable blocking for dense linear-algebra algorithms
IBM Journal of Research and Development

Highly Latency Tolerant Gaussian Elimination
GRID '05 Proceedings of the 6th IEEE/ACM International Workshop on Grid Computing

Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines
Scientific Programming

Benchmarking GPUs to tune dense linear algebra
Proceedings of the 2008 ACM/IEEE conference on Supercomputing

Communication-optimal parallel and sequential Cholesky decomposition: extended abstract
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures

Brief announcement: Lower bounds on communication for sparse Cholesky factorization of a model problem
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures

Managing the complexity of lookahead for LU factorization with pivoting
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures

Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II

Improving communication performance in dense linear algebra via topology aware collectives
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis

Hypergraph-Based Unsymmetric Nested Dissection Ordering for Sparse LU Factorization
SIAM Journal on Scientific Computing

Communication-optimal Parallel and Sequential Cholesky Decomposition
SIAM Journal on Scientific Computing

Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems
Proceedings of the 26th ACM international conference on Supercomputing

Communication-optimal Parallel and Sequential QR and LU Factorizations
SIAM Journal on Scientific Computing

CALU: A Communication Optimal LU Factorization Algorithm
SIAM Journal on Matrix Analysis and Applications

Avoiding communication through a multilevel LU factorization
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing

Accelerating Linear System Solutions Using Randomization Techniques
ACM Transactions on Mathematical Software (TOMS)

Abstract

We present CALU, a Communication Avoiding algorithm for the LU factorization of dense matrices distributed in a two-dimensional cyclic layout. The algorithm is based on a new pivoting strategy, which is stable in practice. The new algorithm is optimal (up to polylogarithmic factors) in the amount of communication it performs. Our experiments show that CALU reduces the parallel run time, in particular when latency accounts for a significant share of the overall time. The factorization of a block-column, a subroutine of CALU, outperforms the corresponding routine PDGETF2 from ScaLAPACK by up to a factor of 4.37 on an IBM POWER 5 system and by up to a factor of 5.58 on a Cray XT4 system. On square matrices of order 10^4, CALU outperforms the corresponding routine PDGETRF from ScaLAPACK by a factor of 1.24 on IBM POWER 5 and by a factor of 1.31 on Cray XT4.
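
The abstract does not spell out the new pivoting strategy. As a rough illustration only, the sketch below assumes a tournament-style selection of pivot rows for one block-column: each block of rows nominates b candidate rows by ordinary partial pivoting, and candidates are then merged pairwise up a binary reduction tree. The function names (select_pivots, tournament_pivoting) and the serial, pure-Python setting are assumptions made for this example; it is not the authors' distributed implementation.

```python
def select_pivots(candidates, b):
    """Pick up to b pivot rows from (row_id, row) pairs by Gaussian
    elimination with partial pivoting; return them with original values."""
    # Each entry: (row id, working copy that gets eliminated, untouched original)
    work = [(rid, list(row), row) for rid, row in candidates]
    chosen = []
    for k in range(min(b, len(work))):
        # Partial pivoting: largest magnitude in column k among remaining rows
        p = max(range(k, len(work)), key=lambda i: abs(work[i][1][k]))
        work[k], work[p] = work[p], work[k]
        pivot = work[k][1]
        if pivot[k] != 0.0:
            for i in range(k + 1, len(work)):
                factor = work[i][1][k] / pivot[k]
                for j in range(k, len(pivot)):
                    work[i][1][j] -= factor * pivot[j]
        chosen.append((work[k][0], work[k][2]))
    return chosen


def tournament_pivoting(panel, b, num_blocks):
    """Return the ids of b pivot rows for a panel (list of rows of length b).

    In a parallel setting each block would live on a different processor,
    and each merge step would correspond to one message exchange.
    """
    m = len(panel)
    size = -(-m // num_blocks)  # ceiling division: rows per block
    # Leaves of the tree: every block nominates its local candidates
    groups = [
        select_pivots([(i, panel[i]) for i in range(s, min(s + size, m))], b)
        for s in range(0, m, size)
    ]
    # Binary reduction tree: merge neighbouring candidate sets pairwise
    while len(groups) > 1:
        merged = []
        for g in range(0, len(groups), 2):
            pair = groups[g] + (groups[g + 1] if g + 1 < len(groups) else [])
            merged.append(select_pivots(pair, b))
        groups = merged
    return [rid for rid, _ in groups[0]]
```

Under these assumptions the number of merge rounds grows with the logarithm of the number of blocks, whereas classical partial pivoting needs one reduction per column of the block-column. That difference is the kind of latency saving the abstract refers to.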