Compilation techniques for block-cyclic distributions

Authors:
Seema Hiranandani;Ken Kennedy;John Mellor-Crummey;Ajay Sethi
Affiliations:
Department of Computer Science, Rice University, Houston, TX;Department of Computer Science, Rice University, Houston, TX;Department of Computer Science, Rice University, Houston, TX;Department of Computer Science, Rice University, Houston, TX
Venue:
ICS '94 Proceedings of the 8th international conference on Supercomputing
Year:
1994

Citing 11
Cited 25

Process decomposition through locality of reference

PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Run-time scheduling and execution of loops on message passing machines

Journal of Parallel and Distributed Computing - Special issue: algorithms for hypercube computers
Updating distributed variables in local computations

Concurrency: Practice and Experience
Analysis and transformation in the ParaScope editor

ICS '91 Proceedings of the 5th international conference on Supercomputing
Compiler optimizations for Fortran D on MIMD distributed-memory machines

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Compile-time generation of regular communications patterns

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Compiling Fortran D for MIMD distributed-memory machines

Communications of the ACM
Computer support for machine-independent parallel programming in Fortran D

Languages, compilers and run-time environments for distributed memory machines
Generating local addresses and communication sets for data-parallel programs

PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
The high performance Fortran handbook

The high performance Fortran handbook
An Implementation of Interprocedural Bounded Regular Section Analysis

IEEE Transactions on Parallel and Distributed Systems

A linear-time algorithm for computing the memory access sequence in data-parallel programs

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Efficient address generation for block-cyclic distributions

ICS '95 Proceedings of the 9th international conference on Supercomputing
An Implementation Framework for HPF Distributed Arrays on Message-Passing Parallel Computer Systems

IEEE Transactions on Parallel and Distributed Systems
Iteration space slicing and its application to communication optimization

ICS '97 Proceedings of the 11th international conference on Supercomputing
A Basic-Cycle Calculation Technique for Efficient Dynamic Data Redistribution

IEEE Transactions on Parallel and Distributed Systems
Efficient Methods for kr → r and r → kr Array Redistribution1

The Journal of Supercomputing
Communication Generation for Aligned and Cyclic(K) Distributions Using Integer Lattice

IEEE Transactions on Parallel and Distributed Systems
Efficient Algorithms for Block-Cyclic Array Redistribution Between Processor Sets

IEEE Transactions on Parallel and Distributed Systems
Efficient Methods for Multi-Dimensional Array Redistribution

The Journal of Supercomputing
A Generalized Basic-Cycle Calculation Method for Efficient Array Redistribution

IEEE Transactions on Parallel and Distributed Systems
Efficient Address Generation for Affine Subscripts in Data-Parallel Programs

The Journal of Supercomputing
A Generalized Processor Mapping Technique for Array Redistribution

IEEE Transactions on Parallel and Distributed Systems
Efficient algorithms for block-cyclic array redistribution between processor sets

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Improving the performance of DSM systems via compiler involvement

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Generating communication sets of array assignment statements for block-cyclic distribution on distributed memory parallel computers

Parallel Computing
Message Encoding Techniques for Efficient Arrary Redistribution

ICPP '97 Proceedings of the international Conference on Parallel Processing
Efficient Algorithms for Multi-dimensional Block-Cyclic Redistribution of Arrays

ICPP '97 Proceedings of the international Conference on Parallel Processing
Efficient Method for kr-r and r-kr Arrary Redistribution

COMPSAC '97 Proceedings of the 21st International Computer Software and Applications Conference
An Expression-Rewriting Framework to Generate Communication Sets for HPF Programs with Block-Cyclic Distribution

IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
A Compressed Diagonals Remapping Technique for Dynamic Data Redistribution on Banded Sparse Matrix

The Journal of Supercomputing
An efficient algorithm for communication set generation of data parallel programs with block-cyclic distribution

Parallel Computing
An Efficient Communication Scheduling Method for the Processor Mapping Technique Applied Data Redistribution

The Journal of Supercomputing
Optimizing Communications of Dynamic Data Redistribution on Symmetrical Matrices in Parallelizing Compilers

IEEE Transactions on Parallel and Distributed Systems
A flexible processor mapping technique toward data localization for block-cyclic data redistribution

The Journal of Supercomputing
A compressed diagonals remapping technique for dynamic data redistribution on banded sparse matrix

ISPA'03 Proceedings of the 2003 international conference on Parallel and distributed processing and applications

Quantified Score

Hi-index	0.00

Visualization

Abstract

Compilers for data-parallel languages such as Fortran D and High-Performance Fortran use data alignment and distribution specifications as the basis for translating programs for execution on MIMD distributed-memory machines. This paper describes techniques for generating efficient code for programs that use block-cyclic distributions. These techniques can be applied to programs with symbolic loop bounds, symbolic array dimensions, and loops with non-unit strides. We present algorithms for computing the data elements that need to be communicated among processors both for loops with unit and non-unit strides, a linear-time algorithm for computing the memory access sequence for loops with non-unit strides, and experimental results for a hand-compiled test case using block-cyclic distributions