In data-parallel languages such as High Performance Fortran and Fortran D, arrays are mapped to processors in two steps: alignment followed by distribution. A compiler that generates code for each processor must compute the sequence of local memory addresses that the processor accesses, as well as the sequence of sends and receives it needs to access non-local data. In this chapter, we present a novel approach to the address sequence generation problem based on integer lattices. When the alignment stride is one, the mapping is called a one-level mapping. For a one-level mapping, the set of referenced elements can be generated by integer linear combinations of basis vectors. Using the basis vectors, we derive a loop nest that enumerates the addresses, which are points in the lattice generated by those vectors. Both the basis determination and the lattice enumeration algorithms run in linear time. For the two-level mapping problem (non-unit alignment stride), we present a fast, novel solution that wastes no memory, incurs little overhead, and relies on two applications of the one-level solution followed by a fix-up phase. Experimental results demonstrate that our solutions to the address generation problem are significantly faster than other solutions to this problem. In addition, we present a brief overview of our work on related problems such as communication generation, basis vector derivation, code generation for complex subscripts, and array redistribution.
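To make the one-level address sequence problem concrete, the following is a minimal sketch of a naive baseline, not the lattice-based algorithm described in this chapter. It assumes a block-cyclic distribution with block size k over p processors, 0-based indexing, and an array section lo:hi:stride with alignment stride one. The function names (owner, local_address, local_address_sequence) and the layout convention are illustrative assumptions. The sketch scans every element of the section, which is the cost a lattice enumeration is meant to avoid.

```python
def owner(i, p, k):
    """Processor that holds global index i under a block-cyclic (block size k) layout."""
    return (i // k) % p


def local_address(i, p, k):
    """Local memory offset of global index i on its owning processor.

    Assumes each processor stores its blocks contiguously, in order.
    """
    return (i // (p * k)) * k + (i % k)


def local_address_sequence(lo, hi, stride, p, k, m):
    """Local addresses touched by processor m for the section lo:hi:stride.

    Brute force: visits every section element and keeps those owned by m.
    """
    return [
        local_address(i, p, k)
        for i in range(lo, hi + 1, stride)
        if owner(i, p, k) == m
    ]


# Section 0:39:3 on 4 processors with block size 4, as seen by processor 0.
# Processor 0 owns global indices 0, 3, 18 and 33 of this section.
print(local_address_sequence(0, 39, 3, p=4, k=4, m=0))  # -> [0, 3, 6, 9]
```

The scan costs time proportional to the size of the whole section, even though processor m owns only a fraction of it. Enumerating only the owned points, as the loop nest derived from the basis vectors does, removes that dependence.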