Efficient Algorithms for Block-Cyclic Redistribution of Arrays

Authors:
Young Won Lim;Prashanth B. Bhat;Viktor K. Prasanna
Venue:
SPDP '96 Proceedings of the 8th IEEE Symposium on Parallel and Distributed Processing (SPDP '96)
Year:
1996

Cited by (22):

A Basic-Cycle Calculation Technique for Efficient Dynamic Data Redistribution

IEEE Transactions on Parallel and Distributed Systems
Efficient Methods for kr → r and r → kr Array Redistribution

The Journal of Supercomputing
Efficient Algorithms for Block-Cyclic Array Redistribution Between Processor Sets

IEEE Transactions on Parallel and Distributed Systems
Efficient Methods for Multi-Dimensional Array Redistribution

The Journal of Supercomputing
A Generalized Basic-Cycle Calculation Method for Efficient Array Redistribution

IEEE Transactions on Parallel and Distributed Systems
Processor reordering algorithms toward efficient GEN_BLOCK redistribution

Proceedings of the 2001 ACM symposium on Applied computing
A Generalized Processor Mapping Technique for Array Redistribution

IEEE Transactions on Parallel and Distributed Systems
Efficient algorithms for block-cyclic array redistribution between processor sets

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
A Framework for Efficient Data Redistribution on Distributed Memory Multicomputers

The Journal of Supercomputing
Scheduling GEN_BLOCK Array Redistribution

The Journal of Supercomputing
Portable and scalable algorithm for irregular all-to-all communication

Journal of Parallel and Distributed Computing
Message Encoding Techniques for Efficient Array Redistribution

ICPP '97 Proceedings of the international Conference on Parallel Processing
Efficient Algorithms for Multi-dimensional Block-Cyclic Redistribution of Arrays

ICPP '97 Proceedings of the international Conference on Parallel Processing
Efficient Method for kr-r and r-kr Array Redistribution

COMPSAC '97 Proceedings of the 21st International Computer Software and Applications Conference
Suboptimal Communication Schedule for GEN_BLOCK Redistribution (Best Student Paper Award: Honourable Mention)

VECPAR '00 Selected Papers and Invited Talks from the 4th International Conference on Vector and Parallel Processing
Optimizing Data Scheduling on Processor-In-Memory Arrays

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
A Compressed Diagonals Remapping Technique for Dynamic Data Redistribution on Banded Sparse Matrix

The Journal of Supercomputing
A Divide-and-Conquer Algorithm for Irregular Redistribution in Parallelizing Compilers

The Journal of Supercomputing
Improving communication scheduling for array redistribution

Journal of Parallel and Distributed Computing
Optimizing Communications of Dynamic Data Redistribution on Symmetrical Matrices in Parallelizing Compilers

IEEE Transactions on Parallel and Distributed Systems
Message scheduling for array re-decomposition on distributed memory systems

Future Generation Computer Systems
Efficient multidimensional data redistribution for resizable parallel computations

ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications

Abstract

We present new algorithmic techniques for a classical research problem: runtime redistribution of an array from one block-cyclic layout to another. Our methodology for reducing communication overheads is based on a generalized circulant matrix formalism. Using this formalism, we derive direct, indirect, and hybrid communication schedules for the cyclic redistribution problem when the block size changes by an integer factor K. We have also developed formulae to estimate the timing performance of each of these schedules for a given parallel machine and redistribution problem. In our indirect communication schedule, blocks are moved from a source processor to a destination processor through intermediate "relay" processors. This reduces the number of communication steps by an order of magnitude, in comparison with previous approaches. This algorithm performs cyclic(x) to cyclic(Kx) redistribution on P processors in $\lceil \log_2 K \rceil + 2$ steps. Implementations of these algorithms on the Cray T3D and on the IBM SP-2 show superior performance over previous approaches. Since our algorithms are developed using MPI, they can be easily ported to different application environments. Our techniques can be used in the design of scalable redistribution libraries, in efficient implementations of the REDISTRIBUTE directive of HPF, and in developing parallel algorithms for various HPC applications.
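
To make the problem statement concrete, the following is a minimal sketch of the cyclic(x) to cyclic(Kx) setup, assuming the usual HPF convention that element i of a cyclic(b) layout on P processors lives on processor (i div b) mod P. The function names (owner, send_counts, indirect_steps) are hypothetical and chosen for illustration only. The sketch tabulates which processor pairs must exchange data and evaluates the step count quoted in the abstract; it does not reproduce the paper's circulant-matrix schedules.

```python
def owner(i, block, nprocs):
    """Processor owning global element i under a cyclic(block) layout
    (assumed convention: blocks are dealt round-robin to processors)."""
    return (i // block) % nprocs


def send_counts(n, x, k, nprocs):
    """Return counts[s][d]: how many elements source processor s holds
    that destination processor d owns after cyclic(x) -> cyclic(k*x)."""
    counts = [[0] * nprocs for _ in range(nprocs)]
    for i in range(n):
        src = owner(i, x, nprocs)        # owner in the source layout
        dst = owner(i, k * x, nprocs)    # owner in the target layout
        counts[src][dst] += 1
    return counts


def indirect_steps(k):
    """Step count stated in the abstract for the indirect schedule:
    ceil(log2(k)) + 2, computed with integer arithmetic only."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return (k - 1).bit_length() + 2


if __name__ == "__main__":
    # Illustrative parameters (not taken from the paper).
    for row in send_counts(n=16, x=1, k=2, nprocs=4):
        print(row)
    print("indirect steps for K=8:", indirect_steps(8))
```

Each row of the printed matrix is one source processor and each column one destination; a nonzero entry means that pair must communicate, directly or through relays.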