Data Distribution Schemes of Sparse Arrays on Distributed Memory Multicomputers
ICPPW '02 Proceedings of the 2002 International Conference on Parallel Processing Workshops
A data distribution scheme for sparse arrays on a distributed memory multicomputer is, in general, composed of three phases: data partition, data distribution, and data compression. Many methods proposed in the literature implement such a scheme by performing the data partition phase first, then the data distribution phase, followed by the data compression phase. We call a data distribution scheme with this ordering the Send Followed Compress (SFC) scheme. In this paper, we propose two other data distribution schemes for sparse array distribution: Compress Followed Send (CFS) and Encoding-Decoding (ED). In the CFS scheme, the data compression phase is performed before the data distribution phase. In the ED scheme, the data compression phase is divided into two steps, encoding and decoding, which are performed before and after the data distribution phase, respectively. To evaluate the CFS and the ED schemes, we compare them with the SFC scheme. In the data partition phase, the row partition, the column partition, and the 2D mesh partition, each with and without load balancing, are used for all three schemes. In the compression phase, the CRS/CCS methods are used to compress sparse local arrays for the SFC and the CFS schemes, while the encoding/decoding steps are used for the ED scheme. Both theoretical analysis and experimental tests were conducted. In the theoretical analysis, we analyze the SFC, the CFS, and the ED schemes in terms of the data distribution time and the data compression time. In the experimental tests, we implemented the three schemes on an IBM SP2 parallel machine. The experimental results show that, in most test cases, the CFS and the ED schemes outperform the SFC scheme, and the ED scheme outperforms the CFS scheme in all test cases.
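To make the partition and compression phases concrete, the following is a minimal Python sketch (an illustration under assumed conventions, not the authors' implementation): a greedy load-balanced row partition that assigns contiguous row ranges so each processor receives roughly the same number of nonzeros, and a CRS compression routine of the kind applied to each local block before sending under the CFS scheme.

```python
def crs_compress(block):
    """Compress a dense 2D block into CRS arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in block:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i+1] marks where row i ends in the values array
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def balanced_row_partition(array, nprocs):
    """Greedy row partition: split the array into nprocs contiguous
    row ranges so each range holds roughly the same number of nonzeros."""
    nnz_per_row = [sum(1 for v in row if v != 0) for row in array]
    total = sum(nnz_per_row)
    target = total / nprocs  # ideal nonzeros per processor
    parts, start, acc = [], 0, 0
    for i, n in enumerate(nnz_per_row):
        acc += n
        # cut a new partition once the running nonzero count reaches
        # the next multiple of the per-processor target
        if acc >= target * (len(parts) + 1) and len(parts) < nprocs - 1:
            parts.append(array[start:i + 1])
            start = i + 1
    parts.append(array[start:])
    return parts
```

Under the SFC ordering, each dense partition would be sent first and compressed at the receiver; under the CFS ordering, `crs_compress` runs on each partition before it is sent, so only the nonzero payload crosses the network.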