Data distributions for sparse matrix vector multiplication
Parallel Computing
Efficient address generation for block-cyclic distributions
ICS '95 Proceedings of the 9th international conference on Supercomputing
Handling block-cyclic distributed arrays in Vienna Fortran 90
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Compile and Run-Time Support for the Parallelization of Sparse Matrix Updating Algorithms
The Journal of Supercomputing
Local Enumeration Techniques for Sparse Algorithms
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Hi-index | 0.00 |
Several methods have been proposed in the literature for the distribution of data on distributed memory machines, either oriented to dense or sparse structures. Many of the real applications, however, deal with both kind of data jointly. This paper presents techniques for integrating dense and sparse array accesses in a way that optimizes locality and further allows an efficient loop partitioning within a data-parallel compiler.Our approach is evaluated through an experimental survey with several compilers and parallel platforms. The results prove the benefits of the BRS sparse distribution when combined with CYCLIC in mixed algorithms and the poor efficiency achieved by well-known distribution schemes when sparse elements arise in the source code.