Integer lattice based methods for local address generation for block-cyclic distributions

Authors:
J. Ramanujam
Venue:
Compiler optimizations for scalable parallel systems
Year:
2001

Abstract

In data-parallel languages such as High Performance Fortran and Fortran D, arrays are mapped to processors in two steps: alignment followed by distribution. A compiler that generates code for each processor must compute the sequence of local memory addresses that the processor accesses, as well as the sequence of sends and receives it needs to access non-local data. In this chapter, we present a novel approach to the address sequence generation problem based on integer lattices. When the alignment stride is one, the mapping is called a one-level mapping. For a one-level mapping, the set of elements referenced can be generated by integer linear combinations of basis vectors. Using the basis vectors, we derive a loop nest that enumerates the addresses, which are points in the lattice generated by the basis vectors. Both the basis determination and the lattice enumeration algorithms run in linear time. For the two-level mapping problem (non-unit alignment stride), we present a fast, novel solution that incurs no memory wastage and little overhead; it relies on two applications of the solution to the one-level mapping problem, followed by a fix-up phase. Experimental results demonstrate that our solutions to the address generation problem are significantly faster than other solutions to this problem. In addition, we present a brief overview of our work on related problems such as communication generation, basis vector derivation, code generation for complex subscripts, and array redistribution.
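To make the address generation problem concrete, the sketch below enumerates, by brute force, the local addresses that one processor touches for a strided array section under a block-cyclic distribution with unit alignment stride (the one-level mapping). It is not the chapter's lattice algorithm: it scans every element of the section, which is exactly the cost that a lattice-based enumeration is meant to avoid. The function name `local_addresses`, its parameter names, zero-based indexing, and the row-major local layout are all assumptions made for illustration.

```python
def local_addresses(p, k, l, u, s, m):
    """Brute-force reference (assumed helper, not the chapter's method).

    Returns the local addresses on processor m for the section
    A(l:u:s) of an array distributed cyclic(k) over p processors,
    assuming zero-based global indices and identity alignment.
    """
    addrs = []
    for i in range(l, u + 1, s):
        # Each "row" of the layout holds p blocks of k elements.
        row, rem = divmod(i, p * k)
        # Within a row, find the owning processor and the block offset.
        owner, offset = divmod(rem, k)
        if owner == m:
            # Processor m stores one block of k elements per row.
            addrs.append(row * k + offset)
    return addrs


if __name__ == "__main__":
    # 4 processors, block size 4, section A(0:39:3), processor 0.
    print(local_addresses(4, 4, 0, 39, 3, 0))
```

Each accessed element corresponds to a (row, offset) pair, and the differences between such pairs repeat with a regular pattern. That regularity is what allows the accessed elements to be described as integer combinations of a small set of basis vectors instead of being found by scanning.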