Exploiting task and data parallelism on a multicomputer. PPOPP '93: Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.
An approach to communication-efficient data redistribution. ICS '94: Proceedings of the 8th International Conference on Supercomputing.
Handling block-cyclic distributed arrays in Vienna Fortran 90. PACT '95: Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques.
Fast message assembly using compact address relations. Proceedings of the 1996 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems.
An implementation framework for HPF distributed arrays on message-passing parallel computer systems. IEEE Transactions on Parallel and Distributed Systems.
Runtime performance of parallel array assignment: an empirical study. Supercomputing '96: Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
Integer lattice based methods for local address generation for block-cyclic distributions. IEEE Transactions on Parallel and Distributed Systems.
Compiler optimizations for scalable parallel systems. IEEE Transactions on Parallel and Distributed Systems.
Generating efficient local memory access sequences for coupled subscripts in data-parallel programs. Information Sciences—Informatics and Computer Science: An International Journal.
Efficient communication sets generation for block-cyclic distribution on distributed-memory machines. Journal of Systems Architecture: the EUROMICRO Journal.
Code composition as an implementation language for compilers. DSL '97: Proceedings of the Conference on Domain-Specific Languages, 1997.
One of the core constructs of High Performance Fortran (HPF) is the array-slice assignment statement, combined with the rich set of data-distribution options available to the programmer. On a private-memory multicomputer, the HPF compiler writer faces the difficult task of automatically generating the communication required by assignment statements involving arrays with arbitrary block-cyclic data distributions. In this paper we present a framework for representing array slices and block-cyclic distributions, and we derive efficient algorithms for sending and receiving the data needed by array-slice assignment statements. The algorithms include a memory-efficient method for managing the layout of the distributed arrays in each processor's local memory. We also provide a means of converting the user's TEMPLATE, ALIGN, and DISTRIBUTE statements into a convenient 'array ownership descriptor'. In addition, we present several optimizations for common distributions and easily recognized communication patterns. The work makes minimal assumptions about the processor architecture, the communication architecture, or the underlying language being compiled.
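To make the problem concrete, the following is a minimal sketch (not the paper's algorithm) of the mappings a compiler must reason about for a one-dimensional CYCLIC(b) distribution over P processors: which processor owns a global index, where that element lives in the owner's local memory, and which slice elements a given processor holds. The function names and the brute-force slice enumeration are illustrative assumptions; the paper derives efficient closed-form algorithms instead of scanning every index.

```python
def owner(i, b, P):
    # Processor that owns global index i under CYCLIC(b) over P processors:
    # blocks of size b are dealt out round-robin, so block i // b goes to
    # processor (i // b) mod P.
    return (i // b) % P

def local_index(i, b, P):
    # Position of global index i in its owner's packed local storage:
    # the owner has already stored i // (b * P) full blocks before this one,
    # plus the offset i % b within the current block.
    return (i // (b * P)) * b + i % b

def owned_in_slice(p, lo, hi, step, b, P):
    # Brute-force reference: global indices of slice lo:hi:step owned by
    # processor p. A real compiler would enumerate these without scanning.
    return [i for i in range(lo, hi, step) if owner(i, b, P) == p]
```

For example, with b = 2 and P = 3, global index 7 sits in block 3, which wraps back to processor 0, and it is the fourth element (local index 3) stored there. Communication-set generation for an assignment then amounts to intersecting such ownership sets for the source and destination arrays.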