Efficient Methods for Multi-Dimensional Array Redistribution

Authors:
Ching-Hsien Hsu;Yeh-Ching Chung;Chyi-Ren Dow
Affiliations:
Department of Information Engineering, Feng Chia University, Taichung, Taiwan 407, ROC chhsu@pine.iecs.fcu.edu.tw;Department of Information Engineering, Feng Chia University, Taichung, Taiwan 407, ROC ychung@pine.iecs.fcu.edu.tw;Department of Information Engineering, Feng Chia University, Taichung, Taiwan 407, ROC
Venue:
The Journal of Supercomputing
Year:
2000

Abstract

In many scientific applications, array redistribution is required to enhance data locality and reduce remote memory access on distributed memory multicomputers. Since the redistribution is performed at run time, there is a performance tradeoff between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present efficient methods for multi-dimensional array redistribution. Building on previous work, the basic-cycle calculation technique, we present two techniques: basic-block calculation (BBC) and complete-dimension calculation (CDC). We also develop a theoretical model to analyze the computation costs of these two techniques. The model shows that the BBC method has smaller indexing costs and performs well when the array size is small, whereas the CDC method has smaller packing/unpacking costs and performs well when the array size is large. We implemented these two techniques on an IBM SP2 parallel machine along with the PITFALLS method and Prylli's method. The experimental results show that, of these four algorithms, the BBC method has the smallest execution time when the array size is small and the CDC method has the smallest execution time when the array size is large.
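
To make the indexing and packing costs mentioned above concrete, the following is a minimal brute-force sketch of what any redistribution routine must ultimately compute: for each pair of source and destination processors, the set of array elements to transfer. It is not the BBC or CDC method, only a naive element-by-element baseline. It assumes block-cyclic (CYCLIC(k)) distributions, which the abstract does not name explicitly, and the function names `owner`, `send_sets`, and `send_sets_2d` are hypothetical.

```python
from collections import defaultdict


def owner(index, block_size, num_procs):
    """Processor that owns global element `index` under CYCLIC(block_size)."""
    return (index // block_size) % num_procs


def send_sets(n, src_block, dst_block, num_procs):
    """Map (source proc, destination proc) -> global indices to transfer.

    Naive scan over all n elements (assumed baseline, not BBC or CDC).
    Elements that stay on the same processor need no message and are skipped.
    """
    sets = defaultdict(list)
    for i in range(n):
        src = owner(i, src_block, num_procs)
        dst = owner(i, dst_block, num_procs)
        if src != dst:
            sets[(src, dst)].append(i)
    return dict(sets)


def send_sets_2d(shape, src_blocks, dst_blocks, grid):
    """Same idea for a 2-D array distributed per dimension over a processor grid."""
    rows, cols = shape
    sets = defaultdict(list)
    for i in range(rows):
        for j in range(cols):
            src = (owner(i, src_blocks[0], grid[0]),
                   owner(j, src_blocks[1], grid[1]))
            dst = (owner(i, dst_blocks[0], grid[0]),
                   owner(j, dst_blocks[1], grid[1]))
            if src != dst:
                sets[(src, dst)].append((i, j))
    return dict(sets)


if __name__ == "__main__":
    # 8 elements on 2 processors, CYCLIC(2) -> CYCLIC(1)
    print(send_sets(8, 2, 1, 2))
    # 4x4 array on a 2x2 grid, CYCLIC(2) -> CYCLIC(1) in both dimensions
    print(send_sets_2d((4, 4), (2, 2), (1, 1), (2, 2)))
```

The scan visits every element once per redistribution, so its indexing cost grows with the array size. The techniques in the paper aim to reduce exactly this kind of per-element index computation and the associated packing and unpacking work.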