This paper concerns the design of efficient algorithms for generating global name-space communication sets for the execution of array assignment statements with arbitrary strides and block sizes on distributed-memory parallel computers. We present a hybrid approach that combines the advantages of the set-theoretic method and the integer-lattice method for generating communication sets. When block sizes are very small or very large, a cyclic-based or a row-wise set-theoretic method is used. For the remaining cases, where block sizes are moderate, we propose a new integer-lattice method in which the data in each local block are treated as a unit. The first virtual referenced element in each virtual referenced local block can be generated efficiently by an integer-lattice construction in which the left boundary of the index domain on each processing element is extended for this purpose. The physical referenced elements in each physical referenced local block can then be determined by intersecting two closed forms, whose result is again a closed form. Because generating the indices for packing and unpacking messages at the sending and receiving ends can be expensive in certain cases, we also study the conventional communication model and the deposit communication model. Since each of the proposed algorithms and communication models is best suited to particular cases, we identify rules of thumb for selecting the most suitable algorithm in the general case.
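To make the notion of a communication set concrete, the sketch below enumerates, by brute force, the (sender, receiver) sets induced by an array assignment A(la:ha:sa) = B(lb:hb:sb) when both arrays use a CYCLIC(b) distribution over P processors. This is a minimal illustration of what the paper's algorithms compute, not the paper's set-theoretic or integer-lattice method itself; the function and variable names (owner, local_index, comm_sets, and the example parameters) are assumptions introduced here for illustration.

```python
# Minimal sketch: per-element enumeration of communication sets for
# A(la + i*sa) = B(lb + i*sb), i = 0..count-1, under CYCLIC(b) over P
# processors (0-based global indices). Illustrative only.

from collections import defaultdict

def owner(g, b, P):
    """Processor that owns global index g under a CYCLIC(b) distribution."""
    return (g // b) % P

def local_index(g, b, P):
    """Local position of global index g on its owning processor:
    local block number times block size, plus offset within the block."""
    return (g // (b * P)) * b + (g % b)

def comm_sets(la, sa, lb, sb, count, b, P):
    """Group the (local source, local destination) index pairs of the
    assignment by (sending processor, receiving processor)."""
    sets = defaultdict(list)
    for i in range(count):
        g_dst, g_src = la + i * sa, lb + i * sb
        sender, receiver = owner(g_src, b, P), owner(g_dst, b, P)
        sets[(sender, receiver)].append(
            (local_index(g_src, b, P), local_index(g_dst, b, P)))
    return sets

# Example: A(0:30:2) = B(1:61:4), 16 elements, P = 4 processors, b = 3.
for (s, r), pairs in sorted(comm_sets(0, 2, 1, 4, 16, 3, 4).items()):
    print(f"P{s} -> P{r}: {pairs}")
```

This enumeration costs time proportional to the number of array elements per statement; the closed-form set-theoretic and integer-lattice methods described in the abstract aim to avoid exactly this per-element cost by generating the same sets directly from the stride, block size, and processor count.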