Since the cost of communication (moving data) greatly exceeds the cost of arithmetic on current and future computing platforms, we are motivated to devise algorithms that communicate as little as possible, even at the price of slightly more arithmetic, as long as they still get the right answer. This paper is about getting the right answer for such an algorithm. It discusses CALU, a communication-avoiding LU factorization algorithm based on a new pivoting strategy that we refer to as tournament pivoting. The reason to consider CALU is that it performs an optimal amount of communication, asymptotically less than Gaussian elimination with partial pivoting (GEPP), and so will be much faster on platforms where communication is expensive, as shown in previous work. We show that the Schur complement obtained after each step of performing CALU on a matrix $A$ is the same as the Schur complement obtained after performing GEPP on a larger matrix whose entries are the entries of $A$ (sometimes slightly perturbed) and zeros. More generally, the entire CALU process is equivalent to GEPP on a large but very sparse matrix formed by entries of $A$ and zeros. Hence we expect CALU to behave like GEPP and to be very stable in practice. In addition, extensive experiments on random matrices and a set of special matrices show that CALU is stable in practice. The upper bound on the growth factor of CALU is worse than that of GEPP. However, there are Wilkinson-like matrices for which GEPP has exponential growth factor but CALU does not, and vice versa.
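To make the tournament-pivoting idea concrete, the following is a minimal NumPy sketch (not the paper's implementation; function names and the block layout are our own assumptions). Each leaf of a binary reduction tree runs GEPP on a subset of the panel's rows to nominate $b$ candidate pivot rows; candidates are then merged pairwise, with GEPP reselecting $b$ winners at each tree node, until $b$ final pivot rows remain for the whole panel. One reduction over the tree replaces the column-by-column pivot searches of GEPP, which is the source of the communication savings.

```python
import numpy as np

def gepp_pivot_rows(B, b):
    """Return the indices (into B's rows) of the b pivot rows that
    Gaussian elimination with partial pivoting would select on block B."""
    B = B.astype(float).copy()
    idx = np.arange(B.shape[0])
    for k in range(b):
        # pick the largest entry in column k on or below the diagonal
        p = k + np.argmax(np.abs(B[k:, k]))
        B[[k, p]] = B[[p, k]]
        idx[[k, p]] = idx[[p, k]]
        if B[k, k] != 0.0:
            # eliminate entries below the pivot
            B[k + 1:, k:] -= np.outer(B[k + 1:, k] / B[k, k], B[k, k:])
    return idx[:b]

def tournament_pivot(panel, b):
    """Select b pivot rows for an (m x b) panel via a binary reduction
    tree of local GEPP selections (a sketch of tournament pivoting)."""
    m = panel.shape[0]
    # leaves: consecutive row blocks, each nominating up to b candidates
    groups = [np.arange(i, min(i + 2 * b, m)) for i in range(0, m, 2 * b)]
    cands = [g[gepp_pivot_rows(panel[g], min(b, len(g)))] for g in groups]
    # tournament: merge candidate sets pairwise, reselect b winners
    while len(cands) > 1:
        merged = []
        for i in range(0, len(cands), 2):
            g = np.concatenate(cands[i:i + 2])
            merged.append(g[gepp_pivot_rows(panel[g], min(b, len(g)))])
        cands = merged
    return cands[0]
```

In a distributed setting each leaf lives on a different processor, so selecting the panel's pivots costs only $O(\log P)$ messages along the tree; the rows chosen this way are generally not the ones GEPP itself would pick, which is why the stability analysis in the abstract is needed.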