This paper concerns the design of efficient algorithms for generating global name-space communication sets for the execution of array assignment statements with arbitrary strides and block sizes on distributed-memory parallel computers. We present a hybrid approach that combines the advantages of the set-theoretic method and the integer-lattice method for generating communication sets. When block sizes are very small or very large, a cyclic-based or a row-wise set-theoretic method is used. For the remaining cases, where block sizes are moderate, we propose a new integer-lattice method in which the data in each local block are treated as a unit. The first virtual referenced element in each virtual referenced local block can be generated efficiently by an integer-lattice construction in which the left boundary of the index domain on each processing element is extended for this purpose. The physical referenced elements in each physical referenced local block can then be determined by intersecting two closed forms, whose result is again a closed form. Because generating the indices for packing and unpacking messages at the sending and receiving ends can be expensive in certain cases, we also study the conventional communication model and the deposit communication model. Since each of the proposed algorithms and communication models is best suited to particular cases, we identify rules of thumb for selecting the most suitable algorithm in the general case.
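To make the notion of a communication set concrete, the sketch below enumerates, by brute force, the (sender, receiver) sets induced by an array assignment A(la:ha:sa) = B(lb:hb:sb) when both arrays use a CYCLIC(b) distribution over P processors. This is a minimal illustration of what the paper's algorithms compute, not the paper's set-theoretic or integer-lattice method itself; the function and variable names (owner, local_index, comm_sets, and the example parameters) are assumptions introduced here for illustration.

```python
# Minimal sketch: per-element enumeration of communication sets for
# A(la + i*sa) = B(lb + i*sb), i = 0..count-1, under CYCLIC(b) over P
# processors (0-based global indices). Illustrative only.

from collections import defaultdict

def owner(g, b, P):
    """Processor that owns global index g under a CYCLIC(b) distribution."""
    return (g // b) % P

def local_index(g, b, P):
    """Local position of global index g on its owning processor:
    local block number times block size, plus offset within the block."""
    return (g // (b * P)) * b + (g % b)

def comm_sets(la, sa, lb, sb, count, b, P):
    """Group the (local source, local destination) index pairs of the
    assignment by (sending processor, receiving processor)."""
    sets = defaultdict(list)
    for i in range(count):
        g_dst, g_src = la + i * sa, lb + i * sb
        sender, receiver = owner(g_src, b, P), owner(g_dst, b, P)
        sets[(sender, receiver)].append(
            (local_index(g_src, b, P), local_index(g_dst, b, P)))
    return sets

# Example: A(0:30:2) = B(1:61:4), 16 elements, P = 4 processors, b = 3.
for (s, r), pairs in sorted(comm_sets(0, 2, 1, 4, 16, 3, 4).items()):
    print(f"P{s} -> P{r}: {pairs}")
```

This enumeration costs time proportional to the number of array elements per statement; the closed-form set-theoretic and integer-lattice methods described in the abstract aim to avoid exactly this per-element cost by generating the same sets directly from the stride, block size, and processor count.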