How can we speed up matrix multiplication? SIAM Review.
Computational complexity of sequential and parallel algorithms.
Efficient parallel solution of linear systems. STOC '85: Proceedings of the Seventeenth Annual ACM Symposium on Theory of Computing.
Matrix multiplication via arithmetic progressions. STOC '87: Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing.
Extra high speed matrix multiplication on the Cray-2. SIAM Journal on Scientific and Statistical Computing.
Some complexity results for matrix computations on parallel processors. Journal of the ACM (JACM).
HPCASIA '05: Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region.
Adaptive Strassen's matrix multiplication. Proceedings of the 21st Annual International Conference on Supercomputing.
A fine-grained pipelined implementation for large-scale matrix inversion on FPGA. APPT '09: Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies.
Using recursion to boost ATLAS's performance. ISHPC'05/ALPS'06: Proceedings of the 6th International Symposium on High-Performance Computing and 1st International Conference on Advanced Low Power Systems.
Work-efficient matrix inversion in polylogarithmic time. Proceedings of the Twenty-Fifth Annual ACM Symposium on Parallelism in Algorithms and Architectures.
Technical Note: A fast parallel Gauss–Jordan algorithm for matrix inversion using CUDA. Computers and Structures.
This paper describes techniques for computing matrix inverses by means of algorithms that are highly suited to massively parallel computation. By contrast, conventional techniques such as pivoted Gaussian elimination and LU decomposition are efficient only on vector computers or fairly low-level parallel systems.

These techniques are based on an algorithm suggested by Strassen in 1969. Variations of this scheme employ matrix Newton iterations and other methods to improve numerical stability while preserving a very high level of parallelism. One-processor Cray-2 implementations of these schemes range from one that is up to 55% faster than a conventional library routine to one that, while slower than a library routine, achieves excellent numerical stability.

The problem of computing the solution to a single set of linear equations is also discussed, and it is shown that this problem, too, can be solved efficiently using these techniques.
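The abstract builds on Strassen's 1969 multiplication scheme. As a point of reference, here is a minimal pure-Python sketch of that recursion for square matrices whose dimension is a power of two; this is the textbook algorithm, not the paper's Cray-2 implementation, and the helper names are illustrative:

```python
def add(A, B):
    # Elementwise sum of two equally sized matrices (lists of rows).
    return [[x + y for x, y in zip(r, s)] for r, s in zip(A, B)]

def sub(A, B):
    # Elementwise difference of two equally sized matrices.
    return [[x - y for x, y in zip(r, s)] for r, s in zip(A, B)]

def strassen(A, B):
    # Strassen's recursion: 7 half-size products instead of 8,
    # giving O(n^2.807) arithmetic. Assumes n is a power of two.
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    A11 = [row[:h] for row in A[:h]]; A12 = [row[h:] for row in A[:h]]
    A21 = [row[:h] for row in A[h:]]; A22 = [row[h:] for row in A[h:]]
    B11 = [row[:h] for row in B[:h]]; B12 = [row[h:] for row in B[:h]]
    B21 = [row[:h] for row in B[h:]]; B22 = [row[h:] for row in B[h:]]
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bot
```

Practical implementations (including the adaptive ATLAS-based variants cited above) recurse only down to a crossover size and switch to a conventional blocked multiply below it, since the constant factors of the recursion dominate for small blocks.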
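The matrix Newton iteration mentioned in the abstract can be sketched as the classical Newton–Schulz scheme, X_{k+1} = X_k (2I − A X_k), which converges quadratically to A⁻¹ when the initial residual ‖I − A X_0‖ < 1. The starting guess X_0 = Aᵀ / (‖A‖₁ ‖A‖∞) below is a standard choice that guarantees this, not a detail taken from the paper:

```python
def matmul(A, B):
    # Plain O(n^3) product; in a Strassen-based scheme this call
    # would be replaced by the fast multiply.
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def newton_inverse(A, iters=50):
    # Newton-Schulz iteration: X <- X (2I - A X).
    n = len(A)
    norm1 = max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))
    norminf = max(sum(abs(v) for v in row) for row in A)
    # X0 = A^T / (||A||_1 * ||A||_inf) ensures ||I - A X0|| < 1.
    X = [[A[j][i] / (norm1 * norminf) for j in range(n)] for i in range(n)]
    for _ in range(iters):
        AX = matmul(A, X)
        # R = 2I - A X
        R = [[(2.0 if i == j else 0.0) - AX[i][j] for j in range(n)]
             for i in range(n)]
        X = matmul(X, R)
    return X
```

Each step costs two matrix products, so the iteration inherits whatever parallelism and speed the underlying multiply provides; this is presumably why the abstract pairs Newton iterations with Strassen-style multiplication while improving numerical stability.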