Average-case stability of Gaussian elimination
SIAM Journal on Matrix Analysis and Applications
Locality of Reference in LU Decomposition with Partial Pivoting
SIAM Journal on Matrix Analysis and Applications
Recursion leads to automatic variable blocking for dense linear-algebra algorithms
IBM Journal of Research and Development
Highly Latency Tolerant Gaussian Elimination
GRID '05 Proceedings of the 6th IEEE/ACM International Workshop on Grid Computing
Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines
Scientific Programming
Benchmarking GPUs to tune dense linear algebra
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Communication-optimal parallel and sequential Cholesky decomposition: extended abstract
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Managing the complexity of lookahead for LU factorization with pivoting
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Improving communication performance in dense linear algebra via topology aware collectives
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Hypergraph-Based Unsymmetric Nested Dissection Ordering for Sparse LU Factorization
SIAM Journal on Scientific Computing
Communication-optimal Parallel and Sequential Cholesky Decomposition
SIAM Journal on Scientific Computing
Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems
Proceedings of the 26th ACM international conference on Supercomputing
Communication-optimal Parallel and Sequential QR and LU Factorizations
SIAM Journal on Scientific Computing
CALU: A Communication Optimal LU Factorization Algorithm
SIAM Journal on Matrix Analysis and Applications
Avoiding communication through a multilevel LU factorization
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Accelerating Linear System Solutions Using Randomization Techniques
ACM Transactions on Mathematical Software (TOMS)
Hi-index | 0.00 |
We present CALU, a Communication Avoiding algorithm for the LU factorization of dense matrices distributed in a two-dimensional cyclic layout. The algorithm is based on a new pivoting strategy, which is stable in practice. The new algorithm is optimal (up to polylogarithmic factors) in the amount of communication it performs. Our experiments show that CALU leads to a reduction in the parallel time, in particular when the latency time is an important factor of the overall time. The factorization of a block-column, a subroutime of CALU, outperforms the corresponding routine PDGETF2 from ScaLAPACK up to a factor of 4.37 on an IBM POWER 5 system and up to a factor of 5.58 on a Cray XT4 system. On square matrices of order 104, CALU outperforms the corresponding routine PDGETRF from ScaLAPACK by a factor of 1.24 on IBM POWER 5 and by a factor of 1.31 on Cray XT4.