An optimal broadcast algorithm adapted to SMP clusters

Authors:
Jesper Larsson Träff;Andreas Ripke
Affiliations:
C&C Research Laboratories, NEC Europe Ltd., Sankt Augustin, Germany;C&C Research Laboratories, NEC Europe Ltd., Sankt Augustin, Germany
Venue:
PVM/MPI'05 Proceedings of the 12th European PVM/MPI users' group conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Year:
2005

Abstract

We describe and evaluate the adaptation of a new, optimal broadcast algorithm for “flat”, fully connected networks to clusters of SMP nodes. The optimal broadcast algorithm improves over other commonly used broadcast algorithms (pipelined binary trees, recursive halving) by up to a factor of two for the non-hierarchical (non-SMP) case. The algorithm is well suited to clusters of SMP nodes, since intra-node broadcast of relatively small blocks can take place concurrently with inter-node communication over the network. The new algorithm has been incorporated into a state-of-the-art MPI library. On a 32-node dual-processor AMD cluster with Myrinet interconnect, improvements of a factor of 1.5 over, for instance, a pipelined binary tree algorithm have been achieved, both with one and with two MPI processes per node.
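
The "up to a factor of two" claim can be illustrated with a simplified round-counting sketch. It is not taken from the paper: it assumes a single-port model in which each process sends at most one block per round, the well-known lower bound of m - 1 + ceil(log2 p) rounds for broadcasting m blocks to p processes, and a pipelined binary tree in which every internal node needs two rounds per block (one per child). The function names are hypothetical, and latency, block size and SMP effects are ignored.

```python
def optimal_rounds(p: int, m: int) -> int:
    """Rounds at the lower bound m - 1 + ceil(log2 p) (assumed single-port model)."""
    if p < 2 or m < 1:
        raise ValueError("need p >= 2 processes and m >= 1 blocks")
    # (p - 1).bit_length() equals ceil(log2 p) for p >= 1, without floating point.
    return m - 1 + (p - 1).bit_length()


def pipelined_binary_tree_rounds(p: int, m: int) -> int:
    """Estimated rounds for a pipelined binary tree in the same assumed model.

    The root needs two rounds per block, and the last block then takes up to
    two rounds per remaining tree level, giving 2 * (m - 1) + 2 * depth.
    """
    if p < 2 or m < 1:
        raise ValueError("need p >= 2 processes and m >= 1 blocks")
    depth = p.bit_length() - 1  # floor(log2 p), the assumed tree depth
    return 2 * (m - 1) + 2 * depth


if __name__ == "__main__":
    p = 33  # hypothetical process count, chosen so that the two logarithms differ
    for m in (1, 10, 100, 1000):
        opt = optimal_rounds(p, m)
        tree = pipelined_binary_tree_rounds(p, m)
        print(f"m={m:5d}  optimal={opt:5d}  binary tree={tree:5d}  ratio={tree / opt:.2f}")
```

Under these assumptions the ratio never exceeds two, which matches the upper limit quoted in the abstract. The measured factor of 1.5 on the Myrinet cluster reflects real costs that this model leaves out.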