Optimal broadcast for fully connected networks

Authors:
Jesper Larsson Träff;Andreas Ripke
Affiliations:
C&C Research Laboratories, NEC Europe Ltd., Sankt Augustin, Germany;C&C Research Laboratories, NEC Europe Ltd., Sankt Augustin, Germany
Venue:
HPCC'05 Proceedings of the First international conference on High Performance Computing and Communications
Year:
2005

Abstract

We develop and implement a new optimal broadcast algorithm for fully connected, bidirectional, one-ported networks under a linear communication cost model. For any number of processors p, the number of communication rounds required to broadcast N blocks of data is $\lceil \log p \rceil - 1 + N$. For data of size m, assuming that sending and receiving m data units takes time $\alpha + \beta m$, the best running time that can be achieved is $(\sqrt{(\lceil \log p \rceil - 1)\alpha} + \sqrt{\beta m})^{2}$, meeting the lower bound under the assumption that the m units are sent as N blocks. This is better than previously known (and implemented) results, which achieve this bound only when p is a power of two (or in other special cases). In particular, the algorithm is (theoretically) a factor of two better than the commonly used pipelined binary tree algorithm. The algorithm has a regular communication pattern based on simultaneous binomial-like trees, and when the number of blocks to be broadcast is one, it degenerates into a binomial tree broadcast. Thus the same algorithm can be used for all message sizes m. The algorithm has been incorporated into a state-of-the-art MPI (Message Passing Interface) library. We demonstrate significant practical improvements of up to a factor of 1.5 over several other commonly used broadcast algorithms.
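The link between the round count and the running-time bound can be checked numerically. If m units are split into N equal blocks, each round costs α + βm/N, so the total time is (⌈log p⌉ − 1 + N)(α + βm/N). Minimizing this over N gives N = sqrt((⌈log p⌉ − 1)βm/α) and the closed form quoted in the abstract. The sketch below is illustrative only: the function names are hypothetical, the logarithm is assumed to be base 2, and the parameter values are made up rather than measured.

```python
import math


def ceil_log2(p: int) -> int:
    """Return ceil(log2(p)) for p >= 1 (assumes the abstract's log is base 2)."""
    return (p - 1).bit_length()


def rounds(p: int, n_blocks: int) -> int:
    """Communication rounds to broadcast n_blocks blocks among p processors."""
    return ceil_log2(p) - 1 + n_blocks


def time_for_blocks(p: int, m: float, alpha: float, beta: float, n_blocks: int) -> float:
    """Total time when m units are sent as n_blocks equal blocks.

    Each round moves one block of size m / n_blocks and costs
    alpha + beta * m / n_blocks under the linear cost model.
    """
    return rounds(p, n_blocks) * (alpha + beta * m / n_blocks)


def best_block_count(p: int, m: float, alpha: float, beta: float) -> float:
    """Real-valued block count that minimizes time_for_blocks."""
    k = ceil_log2(p) - 1
    return math.sqrt(k * beta * m / alpha)


def optimal_time(p: int, m: float, alpha: float, beta: float) -> float:
    """Closed-form bound (sqrt((ceil(log p) - 1) * alpha) + sqrt(beta * m))^2."""
    k = ceil_log2(p) - 1
    return (math.sqrt(k * alpha) + math.sqrt(beta * m)) ** 2


if __name__ == "__main__":
    # Hypothetical parameters chosen so the optimal block count is an integer.
    p, m, alpha, beta = 16, 12.0, 1.0, 1.0
    n_star = best_block_count(p, m, alpha, beta)
    print("best block count:", n_star)
    print("time at that count:", time_for_blocks(p, m, alpha, beta, round(n_star)))
    print("closed-form bound:", optimal_time(p, m, alpha, beta))
```

With a single block, `rounds(p, 1)` reduces to ⌈log p⌉, which matches the round count of a binomial tree broadcast and is consistent with the degenerate case described in the abstract. In practice the block count must be a positive integer, so the closed form is reached exactly only when the minimizing N happens to be integral.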