Hybrid Algorithms for Complete Exchange in 2D Meshes

Authors:
N. S. Sundar;D. N. Jayasimha;Dhabaleswar K. Panda;P. Sadayappan
Affiliations:
Hewlett-Packard Co., Cupertino, CA;Intel Corp., Santa Clara, CA;Ohio State Univ., Columbus;Ohio State Univ., Columbus
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2001



Abstract

Parallel algorithms for several common problems, such as sorting and the FFT, involve a personalized exchange of data among all the processors. Past work on complete exchange has taken one of two broad approaches: direct exchange or indirect message combining. While combining approaches reduce the number of message startups, direct exchange minimizes the volume of data transmitted. This paper presents a family of hybrid algorithms for wormhole-routed 2D meshes that can effectively utilize the complementary strengths of these two approaches. The performance of hybrid algorithms using Cyclic Exchange and Scott's Direct Exchange is studied using analytical models, simulation, and implementation on a Cray T3D system. The results show that hybrids achieve lower completion times than either pure algorithm for a range of mesh sizes, data block sizes, and message startup costs. It is also demonstrated that barriers may be used to enhance performance by reducing message contention, whether or not the target system provides hardware support for barrier synchronization. The analytical models are shown to be useful in selecting the optimum hybrid for any given combination of system parameters (mesh size, message startup time, flit transfer time, and barrier cost) and the problem parameter (data block size).
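
The startup-versus-volume tradeoff described in the abstract can be illustrated with a generic linear cost model. This is a minimal sketch and not the paper's analytical model: it assumes a power-of-two processor count, a textbook log-step combining scheme rather than Cyclic Exchange on a mesh, and it ignores link contention and barrier cost. The function names and the parameters `t_s` (per-message startup time) and `t_c` (per-byte transfer time) are hypothetical, chosen only for illustration.

```python
def direct_time(p, m, t_s, t_c):
    """Direct exchange: each node sends p-1 separate messages of m bytes."""
    return (p - 1) * (t_s + m * t_c)


def combining_time(p, m, t_s, t_c):
    """Log-step combining: log2(p) messages, each carrying p/2 blocks of m bytes.

    Assumes p is a power of two (an illustrative simplification).
    """
    steps = p.bit_length() - 1  # log2(p) for a power of two
    return steps * (t_s + (p // 2) * m * t_c)


def cheaper(p, m, t_s, t_c):
    """Return which pure approach the toy model predicts to be faster."""
    d = direct_time(p, m, t_s, t_c)
    c = combining_time(p, m, t_s, t_c)
    return "direct" if d <= c else "combining"


if __name__ == "__main__":
    # Hypothetical parameters: 64 nodes, startup 100 time units, 1 unit per byte.
    for block_size in (1, 10, 100, 1000):
        print(block_size, cheaper(64, block_size, 100.0, 1.0))
```

Under these assumed parameters the model favors combining for small blocks, where startups dominate, and direct exchange for large blocks, where transmitted volume dominates. That crossover is the regime in which a hybrid of the two approaches could plausibly pay off.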