Efficient Algorithms for Block-Cyclic Array Redistribution Between Processor Sets

Authors:
Neungsoo Park; Viktor K. Prasanna; Cauligi S. Raghavendra
Affiliations:
Univ. of Southern California, Los Angeles; Univ. of Southern California, Los Angeles; Aerospace Corp., Los Angeles, CA
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1999

Abstract

Run-time array redistribution is necessary to enhance the performance of parallel programs on distributed-memory supercomputers. In this paper, we present an efficient algorithm for redistributing an array from cyclic($x$) on $P$ processors to cyclic($Kx$) on $Q$ processors. The algorithm reduces the overall communication time by accounting for the data transfer, communication schedule, and index computation costs. The proposed algorithm is based on a generalized circulant matrix formalism. It generates a schedule that minimizes the number of communication steps and eliminates node contention in each communication step. The network bandwidth is fully utilized by ensuring that equal-sized messages are transferred in each communication step. Furthermore, computing the schedule and the index sets is significantly cheaper than in previous schemes: it takes $O(\max(P, Q))$ time, which is less than 1 percent of the data transfer time. In comparison, the schedule computation time of the state-of-the-art scheme, which is based on bipartite matching, is 10 to 50 percent of the data transfer time for similar problem sizes. Therefore, our algorithm is suitable for run-time array redistribution. To evaluate its performance, we implemented the algorithm in C with MPI on an IBM SP2. The results show that our algorithm outperforms previous algorithms in total redistribution time, which includes the time for data transfer, schedule computation, and index computation.
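To make the cyclic($x$) to cyclic($Kx$) communication pattern concrete, here is a minimal Python sketch. It is not the authors' circulant-matrix algorithm. It assumes the array is split into blocks of $x$ elements numbered globally from 0, that block $b$ lives on source processor $b \bmod P$ under cyclic($x$), and that it lives on destination processor $\lfloor b/K \rfloor \bmod Q$ under cyclic($Kx$). Because the source owner repeats every $P$ blocks and the destination owner every $KQ$ blocks, the pattern repeats every $\mathrm{lcm}(P, KQ)$ blocks. The function names `comm_table` and `min_contention_free_steps` are hypothetical.

```python
from math import gcd


def lcm(a: int, b: int) -> int:
    return a * b // gcd(a, b)


def comm_table(P: int, Q: int, K: int) -> list[list[int]]:
    """Count x-element blocks sent from each source p to each destination q
    over one period of a cyclic(x) -> cyclic(Kx) redistribution (assumed model)."""
    period = lcm(P, K * Q)  # owner pair (src, dst) repeats every `period` blocks
    table = [[0] * Q for _ in range(P)]
    for b in range(period):
        src = b % P          # owner under cyclic(x) on P processors
        dst = (b // K) % Q   # owner under cyclic(Kx) on Q processors
        table[src][dst] += 1
    return table


def min_contention_free_steps(table: list[list[int]]) -> int:
    """Lower bound on contention-free steps when each (p, q) message is sent
    in one step: a processor sends or receives at most one message per step,
    so the bound is the largest number of distinct partners of any processor."""
    out_deg = max(sum(1 for c in row if c) for row in table)
    in_deg = max(sum(1 for row in table if row[q]) for q in range(len(table[0])))
    return max(out_deg, in_deg)


table = comm_table(3, 2, 2)
print(table)                             # [[2, 2], [2, 2], [2, 2]]
print(min_contention_free_steps(table))  # 3
```

By König's edge-coloring theorem, a bipartite communication graph can always be scheduled in exactly this many contention-free steps. The brute-force enumeration above costs $O(\mathrm{lcm}(P, KQ))$ time, however, which is far more than the $O(\max(P, Q))$ schedule computation reported in the abstract. It is meant only to show the pattern being scheduled.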