Scheduling Block-Cyclic Array Redistribution

Authors:
Frédéric Desprez;Cyril Randriamaro;Jack Dongarra;Antoine Petitet;Yves Robert
Affiliations:
Ecole Normale Supérieure de Lyon, Lyon, France;Ecole Normale Supérieure de Lyon, Lyon, France;Univ. of Tennessee, Knoxville and Oak Ridge National Lab, Oak Ridge, TN;Univ. of Tennessee, Knoxville;Univ. of Tennessee, Knoxville
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1998

Abstract

This article is devoted to the run-time redistribution of one-dimensional arrays that are distributed in a block-cyclic fashion over a processor grid. While previous studies have concentrated on efficiently generating the communication messages to be exchanged by the processors involved in the redistribution, we focus on the scheduling of those messages: how to organize the message exchanges into "structured" communication steps that minimize contention. We build upon results of Walker and Otto, who solved a particular instance of the problem, and we derive an optimal scheduling for the most general case, namely, moving from a CYCLIC(r) distribution on a P-processor grid to a CYCLIC(s) distribution on a Q-processor grid, for arbitrary values of the redistribution parameters P, Q, r, and s.
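
To make the redistribution parameters concrete, the following minimal sketch counts how many array elements each source processor must send to each target processor when moving from CYCLIC(r) on P processors to CYCLIC(s) on Q processors. It assumes the standard block-cyclic ownership rule (global index i lives on processor (i div block) mod nprocs) and scans one period of lcm(P*r, Q*s) elements. The function names owner and comm_matrix are illustrative and do not come from the article. The sketch only builds the message-size table that a scheduler would take as input; it is not the article's scheduling algorithm.

```python
from math import gcd


def owner(i, block, nprocs):
    """Processor holding global index i under CYCLIC(block) on nprocs processors.

    Assumes the usual block-cyclic rule: blocks of `block` consecutive
    elements are dealt out to processors in round-robin order.
    """
    return (i // block) % nprocs


def comm_matrix(P, r, Q, s):
    """Count elements moving from each source to each target processor.

    Returns a P-by-Q table where entry [p][q] is the number of elements that
    source processor p sends to target processor q within one period of the
    redistribution pattern.
    """
    a, b = P * r, Q * s
    period = a * b // gcd(a, b)  # lcm(P*r, Q*s): both layouts repeat after this many elements
    counts = [[0] * Q for _ in range(P)]
    for i in range(period):
        counts[owner(i, r, P)][owner(i, s, Q)] += 1
    return counts


if __name__ == "__main__":
    # CYCLIC(1) on 2 processors to CYCLIC(2) on 2 processors.
    print(comm_matrix(2, 1, 2, 2))  # prints [[1, 1], [1, 1]]
```

Every nonzero entry of the table is one message to be exchanged. Grouping those messages into communication steps in which no processor sends or receives more than one message at a time is the scheduling question the abstract describes.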