Multiphase Complete Exchange on Paragon, SP2, and CS-2

Authors:
Shahid H. Bokhari
Affiliations:
-
Venue:
IEEE Parallel & Distributed Technology: Systems & Technology
Year:
1996

Abstract

The overhead of interprocessor communication is a major factor limiting the performance of parallel computer systems. The complete exchange is the most demanding communication pattern, because it requires each processor to send a distinct message to every other processor. This pattern is at the heart of many important parallel applications. There are three main algorithms for complete exchange, all designed for hypercubes: the direct exchange, the standard exchange, and the multiphase exchange. Most contemporary commercial multicomputer systems are not hypercubes. However, through special-purpose hardware and dedicated communication processors, these systems can achieve very high communication performance and can emulate hypercubes quite well. Multiphase complete exchange, which is actually a family of algorithms with standard and direct exchange as its extreme cases, performs optimally for varying message sizes. The author has implemented multiphase complete exchange on three contemporary parallel architectures: the Intel Paragon, the IBM SP2, and the Meiko CS-2. He describes the essential features of these machines and discusses their basic interprocessor communication overheads, then evaluates the performance of multiphase complete exchange on each machine. He finds that the Paragon executes the multiphase algorithm well and yields smooth performance plots, with the cyclic variations in these plots stemming from memory access patterns; that the SP2 exhibits enormous fluctuations in its plots because of interference from other jobs; and that the CS-2 exhibits small fluctuations and the largest differences between predicted and observed timings. The author concludes that the theoretical ideas developed for hypercubes also apply to these machines and that multiphase complete exchange can lead to major savings in execution time over traditional solutions.
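
To make the relationship between the three algorithms concrete, the following is a minimal simulation sketch, not the author's implementation. It assumes the usual hypercube formulation, in which the cube dimension d is split into a partition and each phase runs a direct exchange inside the subcubes spanned by one part. The function name `multiphase_exchange`, its `partition` parameter, and the block bookkeeping are hypothetical and introduced only for illustration. No timing model is included; the sketch only counts steps and blocks per transmission.

```python
def multiphase_exchange(d, partition):
    """Simulate a complete exchange on a 2**d-node hypercube.

    `partition` is a tuple of positive integers summing to d (an assumed
    parameterization): (d,) gives the direct exchange, (1,)*d gives the
    standard exchange, anything else is an intermediate multiphase variant.
    Returns (held, steps, sizes), where held[p] is the list of
    (source, destination) blocks at node p after the exchange, steps is the
    number of pairwise exchange steps, and sizes[i] is the number of blocks
    node 0 sends in step i.
    """
    if sum(partition) != d or any(w < 1 for w in partition):
        raise ValueError("partition must be positive integers summing to d")
    n = 1 << d
    # Initially node p holds one distinct block for every destination q.
    held = [[(p, q) for q in range(n)] for p in range(n)]
    steps, sizes, low = 0, [], 0
    for width in partition:
        # Bits low .. low+width-1 identify a node's position in its subcube.
        mask = ((1 << width) - 1) << low
        for k in range(1, 1 << width):
            offset = k << low
            new_held = [[] for _ in range(n)]
            sent_by_node0 = 0
            for p in range(n):
                partner = p ^ offset
                for src, dst in held[p]:
                    # Forward a block when the partner matches its
                    # destination on this phase's bits; otherwise keep it.
                    if (dst & mask) == (partner & mask):
                        new_held[partner].append((src, dst))
                        if p == 0:
                            sent_by_node0 += 1
                    else:
                        new_held[p].append((src, dst))
            held = new_held
            steps += 1
            sizes.append(sent_by_node0)
        low += width
    return held, steps, sizes


def is_complete(held):
    """Check every node ends with exactly one block from each source."""
    n = len(held)
    return all(
        sorted(src for src, _ in blocks) == list(range(n))
        and all(dst == p for _, dst in blocks)
        for p, blocks in enumerate(held)
    )


if __name__ == "__main__":
    for part in [(3,), (1, 1, 1), (2, 1)]:
        held, steps, sizes = multiphase_exchange(3, part)
        print(part, steps, sizes, is_complete(held))
```

On an 8-node cube, the direct exchange `(3,)` takes 7 steps of one block each, the standard exchange `(1, 1, 1)` takes 3 steps of 4 blocks each, and the partition `(2, 1)` takes 4 steps with 2, 2, 2, and 4 blocks. This is the trade-off between the number of message start-ups and the volume moved per step that the multiphase family lets one tune against message size.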