Generating local memory access sequences and communication sets efficiently is an important issue in compiling a data-parallel language into single-program, multiple-data (SPMD) code for distributed-memory machines. Many methods have been developed for generating local memory access sequences. In this paper, we focus on the problem of generating communication sets. We investigate the local block distance between two active elements in a processor that share the same offset and the same destination (or source). We develop one algorithm for the sending phase and one for the receive-execute phase. Our algorithms do not need to compute send and receive patterns, and they lose no communication sets even when there are incomplete blocks that cannot constitute a send or receive pattern. Experimental results show that our method outperforms previous work.
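To make the notion of a communication set concrete, the following is a minimal brute-force sketch, not the algorithm developed in the paper. It assumes a block-cyclic, cyclic(k), distribution over p processors and an array statement of the form A[dst(j)] = B[src(j)]; the abstract does not specify either. The function names `owner`, `local_address`, and `send_sets` are hypothetical and chosen only for illustration. The sketch enumerates every active element, which is exactly the cost that pattern-based and distance-based methods aim to avoid.

```python
from collections import defaultdict


def owner(i, k, p):
    """Processor that owns global index i under an assumed cyclic(k) layout on p processors."""
    return (i // k) % p


def local_address(i, k, p):
    """Local memory address of global index i on its owning processor."""
    # (i // (k * p)) is the local block number; (i % k) is the offset within the block.
    return (i // (k * p)) * k + i % k


def send_sets(n, src, dst, k_src, k_dst, p):
    """Brute-force send sets for A[dst(j)] = B[src(j)], j = 0..n-1.

    Returns {(sender, receiver): [(local address in B, local address in A), ...]}.
    Element pairs whose source and destination live on the same processor
    need no communication and are skipped.
    """
    sets = defaultdict(list)
    for j in range(n):
        b, a = src(j), dst(j)
        s, r = owner(b, k_src, p), owner(a, k_dst, p)
        if s != r:
            sets[(s, r)].append(
                (local_address(b, k_src, p), local_address(a, k_dst, p))
            )
    return dict(sets)


# Example: A[j] = B[2*j] for j = 0..7, both arrays cyclic(2) on 2 processors.
example = send_sets(8, lambda j: 2 * j, lambda j: j, k_src=2, k_dst=2, p=2)
```

In this example, processor 1 sends its local elements 0 and 4 of B to processor 0, and processor 0 sends its local elements 2 and 6 to processor 1. The receive sets follow by reading the same table from the receiver's side.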