Parallel algorithms for several common problems, such as sorting and the FFT, involve a personalized exchange of data among all the processors. Past work on complete exchange has taken one of two broad approaches: direct exchange or indirect message combining. Message-combining approaches reduce the number of message startups, whereas direct exchange minimizes the volume of data transmitted. This paper presents a family of hybrid algorithms for wormhole-routed 2D meshes that exploit the complementary strengths of these two approaches to complete exchange. The performance of hybrids built from Cyclic Exchange and Scott's Direct Exchange is studied using analytical models, simulation, and an implementation on a Cray T3D system. The results show that the hybrids achieve lower completion times than either pure algorithm over a range of mesh sizes, data block sizes, and message startup costs. It is also demonstrated that barriers can improve performance by reducing message contention, whether or not the target system provides hardware support for barrier synchronization. Finally, the analytical models are shown to be useful for selecting the optimum hybrid for any given combination of system parameters (mesh size, message startup time, flit transfer time, and barrier cost) and the problem parameter (data block size).
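To make the startup-versus-volume trade-off concrete, here is a minimal Python sketch of one way a family of hybrids could be parameterized and costed. It is an illustration under stated assumptions, not the paper's analytical model: it borrows the generic multiphase idea of factoring the processor count as p = d_1 * d_2 * ... * d_k, where phase i is a d_i-way exchange taking d_i - 1 steps, each step sending one combined message of p/d_i blocks. A single factor (d_1 = p) is a pure direct exchange (p - 1 small messages), all factors equal to 2 give a fully combining exchange in log2 p phases (few startups, p/2 blocks per message), and mixed factorizations sit in between. Each step is assumed to cost t_s + (message size) * t_c + t_b, with t_c a per-byte stand-in for the flit transfer time and t_b a barrier charged once per step; link contention and the actual mesh topology are ignored. The names Params, phase_cost, factorizations, total_cost and best_schedule are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class Params:
    """Hypothetical system/problem parameters for a toy cost model (not the paper's)."""
    p: int            # number of processors taking part in the exchange
    m: float          # data block size (bytes sent from each node to each other node)
    t_s: float        # message startup time
    t_c: float        # per-byte transfer time (stand-in for flit transfer time)
    t_b: float = 0.0  # barrier cost, charged once per communication step


def phase_cost(d: int, prm: Params) -> float:
    """One d-way exchange phase: d-1 steps, each sending one message of p/d blocks."""
    msg_bytes = (prm.p // d) * prm.m
    return (d - 1) * (prm.t_s + msg_bytes * prm.t_c + prm.t_b)


def factorizations(n: int, min_factor: int = 2) -> Iterator[Tuple[int, ...]]:
    """Yield every non-decreasing tuple of factors >= min_factor whose product is n."""
    if n == 1:
        yield ()
        return
    for f in range(min_factor, n + 1):
        if n % f == 0:
            for rest in factorizations(n // f, f):
                yield (f,) + rest


def total_cost(factors: Tuple[int, ...], prm: Params) -> float:
    # Phases run one after another, so their costs simply add (order is irrelevant here).
    return sum(phase_cost(d, prm) for d in factors)


def best_schedule(prm: Params) -> Tuple[int, ...]:
    """Pick the factorization (pure direct, pure combining, or hybrid) with least predicted time."""
    return min(factorizations(prm.p), key=lambda fs: total_cost(fs, prm))


if __name__ == "__main__":
    prm = Params(p=16, m=1, t_s=2, t_c=1)
    for fs in factorizations(prm.p):
        print(fs, total_cost(fs, prm))
    print("best:", best_schedule(prm))
```

In this toy model, with p = 16, m = 1, t_s = 2 and t_c = 1, the hybrid (4, 4) has a predicted cost of 36, beating both the direct schedule (16,) at 45 and the fully combining (2, 2, 2, 2) at 40. Raising t_s relative to m * t_c pushes the choice toward more combining, while larger blocks push it toward direct exchange, which mirrors the trade-off described above.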