Extra high speed matrix multiplication on the Cray-2
SIAM Journal on Scientific and Statistical Computing
Communication complexity of PRAMs
Theoretical Computer Science - Special issue: Fifteenth international colloquium on automata, languages and programming, Tampere, Finland, July 1988
GEMMW: a portable level 3 BLAS Winograd variant of Strassen's matrix-matrix multiply algorithm
Journal of Computational Physics
Using MPI: portable parallel programming with the message-passing interface
Using MPI: portable parallel programming with the message-passing interface
IBM Journal of Research and Development
Matrix computations (3rd ed.)
SUMMA: Scalable Universal Matrix Multiplication Algorithm
SUMMA: Scalable Universal Matrix Multiplication Algorithm
A High Performance Parallel Strassen Implementation
A High Performance Parallel Strassen Implementation
Problem space promotion and its evaluation as a technique for efficient parallel computation
ICS '99 Proceedings of the 13th international conference on Supercomputing
Parallel Complexity of Matrix Multiplication
The Journal of Supercomputing
A Flexible Class of Parallel Matrix Multiplication Algorithms
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Communication lower bounds for distributed-memory matrix multiplication
Journal of Parallel and Distributed Computing
Combining building blocks for parallel multi-level matrix multiplication
Parallel Computing
ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Improving communication performance in dense linear algebra via topology aware collectives
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Communication-optimal parallel algorithm for strassen's matrix multiplication
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Optimizing linpack benchmark on GPU-accelerated petascale supercomputer
Journal of Computer Science and Technology - Special issue on Community Analysis and Information Recommendation
Communication avoiding and overlapping for numerical linear algebra
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Communication-avoiding parallel strassen: implementation and performance
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Graph expansion and communication costs of fast matrix multiplication
Journal of the ACM (JACM)
Communication optimal parallel multiplication of sparse random matrices
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Communication costs of Strassen's matrix multiplication
Communications of the ACM
Hi-index | 0.02 |