This paper presents parallel algorithms for matrix-matrix multiplication that combine several algorithms in a multi-level structure. The upper level consists of Strassen's algorithm, which is applied for a predefined number of recursion levels; this number can be adapted to the specific execution platform. The intermediate level is performed by a parallel non-hierarchical algorithm, and the lower level uses efficient single-processor implementations of matrix-matrix multiplication such as BLAS or ATLAS. Both the recursion depth of Strassen's algorithm and the specific algorithms used at the intermediate and lower levels can be chosen freely, which yields a variety of different multi-level algorithms. Each level of a multi-level algorithm is associated with a hierarchical partition of the set of available processors into disjoint subsets, so that deeper levels of the algorithm employ smaller groups of processors in parallel. The algorithms are expressed in the multiprocessor task programming model and are coded with the runtime library Tlib. Performance experiments on several parallel platforms show that the multi-level algorithms can lead to significant performance gains.
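The multi-level structure described above can be illustrated with a small sequential sketch. It is not the paper's implementation and does not use Tlib: the processor groups are only passed along as lists of ids, nothing runs in parallel, and the intermediate and lower levels are collapsed into one plain triple-loop multiply that stands in for a parallel algorithm calling BLAS or ATLAS. All names (`multilevel_matmul`, `split_group`, `matmul_base`) are hypothetical, and splitting each processor group into seven subgroups, one per Strassen subproduct, is an assumption made for illustration. The sketch also assumes square matrices whose size is even at every recursion level that is actually taken.

```python
def matmul_base(A, B):
    """Stand-in for the intermediate and lower levels (assumption: plain loops)."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Split a square matrix of even size into its four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def split_group(procs, parts=7):
    """Partition a processor list into `parts` disjoint, nearly equal subgroups."""
    q, r = divmod(len(procs), parts)
    groups, start = [], 0
    for i in range(parts):
        size = q + (1 if i < r else 0)
        groups.append(procs[start:start + size])
        start += size
    return groups


def multilevel_matmul(A, B, depth, procs):
    """Apply `depth` levels of Strassen, then hand over to matmul_base.

    `procs` is bookkeeping only: each level gives every subproduct a smaller,
    disjoint processor group, mirroring the hierarchical partition.
    """
    n = len(A)
    # Leave the Strassen level when the depth is used up, the matrix cannot
    # be halved, or the group is too small to split (assumed cutoff rule).
    if depth == 0 or n % 2 != 0 or len(procs) < 7:
        return matmul_base(A, B)

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    operands = [
        (add(A11, A22), add(B11, B22)),  # M1
        (add(A21, A22), B11),            # M2
        (A11, sub(B12, B22)),            # M3
        (A22, sub(B21, B11)),            # M4
        (add(A11, A12), B22),            # M5
        (sub(A21, A11), add(B11, B12)),  # M6
        (sub(A12, A22), add(B21, B22)),  # M7
    ]
    groups = split_group(procs)
    M1, M2, M3, M4, M5, M6, M7 = [
        multilevel_matmul(X, Y, depth - 1, group)
        for (X, Y), group in zip(operands, groups)
    ]

    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

Calling `multilevel_matmul(A, B, 2, list(range(49)))` on two 4x4 matrices performs two Strassen levels, with 7 processor ids per subproduct at the first level and 1 at the second, before switching to the base multiply. Changing `depth` or swapping `matmul_base` for another routine corresponds to selecting a different member of the family of multi-level algorithms.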