We present an algorithm for matrix inversion that combines the practical requirement of an optimal number of arithmetic operations with the theoretical goal of a polylogarithmic critical path length. The algorithm reduces inversion to matrix multiplication: it uses Strassen's recursion scheme, but on the critical path it breaks the recursion early, switching to an asymptotically inefficient yet fast application of Newton's method. We also show that the algorithm is numerically stable. Overall, we obtain a candidate for a massively parallel algorithm that scales to exascale systems even on relatively small inputs. Preliminary experiments on multicore machines yield two surprising results: even on such moderately parallel machines the algorithm outperforms Intel's Math Kernel Library, and Strassen's algorithm appears to be numerically more stable than one might expect.
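The two ingredients named in the abstract can be sketched in a few lines of NumPy. This is an illustrative serial sketch only, not the authors' implementation: it shows the classical Newton–Schulz iteration for inversion and a Strassen-style recursive 2×2 block inversion via the Schur complement that switches to Newton below a cutoff. The `cutoff` parameter and the choice of where to switch are assumptions for illustration; the paper makes the switch on the critical path to shorten it, which a serial sketch cannot exhibit. The recursion as written assumes the leading blocks and Schur complements are invertible (e.g., A symmetric positive definite).

```python
import numpy as np

def newton_inverse(A, iters=32):
    # Newton-Schulz iteration X_{k+1} = X_k (2I - A X_k); converges
    # quadratically from the classical start X0 = A^T / (||A||_1 ||A||_inf).
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    I = np.eye(A.shape[0])
    for _ in range(iters):
        X = X @ (2 * I - A @ X)
    return X

def block_inverse(A, cutoff=64):
    # Strassen-style recursive 2x2 block inversion via the Schur
    # complement; below `cutoff` we switch to Newton's method.
    n = A.shape[0]
    if n <= cutoff:
        return newton_inverse(A)
    m = n // 2
    A11, A12 = A[:m, :m], A[:m, m:]
    A21, A22 = A[m:, :m], A[m:, m:]
    R1 = block_inverse(A11, cutoff)        # A11^{-1}
    S = A22 - A21 @ R1 @ A12               # Schur complement of A11
    R2 = block_inverse(S, cutoff)          # S^{-1}
    top = np.hstack([R1 + R1 @ A12 @ R2 @ A21 @ R1, -R1 @ A12 @ R2])
    bot = np.hstack([-R2 @ A21 @ R1, R2])
    return np.vstack([top, bot])
```

The recursion replaces inversion by matrix multiplications (the abstract's reduction), which is what makes Strassen-type fast multiplication and parallel scheduling applicable; Newton's iteration at the base costs more arithmetic but has short, highly parallel dependency chains.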