Solving problems on concurrent processors. Vol. 1: General techniques and regular problems
Hypercube algorithms: with applications to image processing and pattern recognition
Communication efficient matrix multiplication on hypercubes
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
An approach to communication-efficient data redistribution
ICS '94 Proceedings of the 8th international conference on Supercomputing
IBM Journal of Research and Development
ScaLAPACK user's guide
Efficient Methods for Multi-Dimensional Array Redistribution
The Journal of Supercomputing
One-Sided Communication on Clusters with Myrinet
Cluster Computing
A Fast Scalable Universal Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Protocols and Strategies for Optimizing Performance of Remote Memory Operations on Clusters
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Proceedings of the 11 IPPS/SPDP'99 Workshops Held in Conjunction with the 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing
Toward data distribution independent parallel matrix multiplication
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
High performance RDMA-based MPI implementation over InfiniBand
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Mixed Mode Matrix Multiplication
CLUSTER '02 Proceedings of the IEEE International Conference on Cluster Computing
A Proposal for a Set of Parallel Basic Linear Algebra Subprograms
Parallel Matrix Distributions: Have we been doing it all wrong?
A cellular computer to implement the kalman filter algorithm
Optimizing Parallel Multiplication Operation for Rectangular and Transposed Matrices
ICPADS '04 Proceedings of the Parallel and Distributed Systems, Tenth International Conference
Optimal solution to matrix parenthesization problem employing parallel processing approach
EC'07 Proceedings of the 8th Conference on 8th WSEAS International Conference on Evolutionary Computing - Volume 8
Regular distributions for storing dense matrices on parallel systems are not always used in practice. In many scientific applicati [...] RUMMA) [1] to handle irregularly distributed matrices. Our approach relies on a distribution-independent algorithm that provides dynamic load balancing by exploiting data locality. It performs as well as the traditional approach, which handles the irregular case by combining temporary arrays with regular distribution, data redistribution, and matrix multiplication for regular matrices. The proposed algorithm is memory-efficient because temporary matrices are not needed. This feature is critical for systems like the IBM Blue Gene/L that offer a very limited amount of memory per node. The experimental results demonstrate very good performance across the range of matrix distributions and problem sizes motivated by real applications.