Optimal broadcast for fully connected networks

  • Authors:
  • Jesper Larsson Träff;Andreas Ripke

  • Affiliations:
  • C&C Research Laboratories, NEC Europe Ltd., Sankt Augustin, Germany;C&C Research Laboratories, NEC Europe Ltd., Sankt Augustin, Germany

  • Venue:
  • HPCC'05 Proceedings of the First International Conference on High Performance Computing and Communications
  • Year:
  • 2005

Abstract

We develop and implement a new optimal broadcast algorithm for fully connected, bidirectional, one-ported networks under a linear communication cost model. For any number of processors p, the number of communication rounds required to broadcast N blocks of data is ⌈log p⌉−1+N. For data of size m, assuming that sending and receiving m data units takes time α+βm, the best running time that can be achieved is $(\sqrt{(\lceil{\rm log} p\rceil - 1)\alpha} + \sqrt{{\beta}m})^{2}$, meeting the lower bound under the assumption that the m units are sent as N blocks. This is better than previously known (and implemented) results, which achieve this bound only when p is a power of two (or in other special cases). In particular, the algorithm is (theoretically) a factor of two better than the commonly used pipelined binary tree algorithm. The algorithm has a regular communication pattern based on simultaneous binomial-like trees, and when the number of blocks to be broadcast is one, it degenerates into a binomial tree broadcast. Thus the same algorithm can be used for all message sizes m. The algorithm has been incorporated into a state-of-the-art MPI (Message Passing Interface) library. We demonstrate significant practical improvements of up to a factor of 1.5 over several other commonly used broadcast algorithms.
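The relation between the round count and the running-time bound can be checked numerically. The sketch below is not the authors' implementation; it only evaluates the cost model from the abstract. It assumes that the logarithm is base 2 and that each round transfers one block of m/N units at cost α+β(m/N), so that N blocks take (⌈log p⌉−1+N)(α+βm/N). Minimizing over N gives N* = √((⌈log p⌉−1)βm/α), at which point the cost equals the closed form stated above. All function names are hypothetical.

```python
import math


def ceil_log2(p: int) -> int:
    """Exact ceil(log2(p)) for an integer p >= 1 (base 2 is an assumption)."""
    return (p - 1).bit_length()


def rounds(p: int, n_blocks: int) -> int:
    """Communication rounds to broadcast n_blocks blocks: ceil(log p) - 1 + N."""
    if p < 2 or n_blocks < 1:
        raise ValueError("need p >= 2 and n_blocks >= 1")
    return ceil_log2(p) - 1 + n_blocks


def broadcast_time(p: int, m: float, n_blocks: int, alpha: float, beta: float) -> float:
    """Assumed model: every round moves one block of m/N units at cost alpha + beta*m/N."""
    return rounds(p, n_blocks) * (alpha + beta * m / n_blocks)


def optimal_time(p: int, m: float, alpha: float, beta: float) -> float:
    """Closed-form bound from the abstract: (sqrt((ceil(log p)-1)*alpha) + sqrt(beta*m))^2."""
    levels = ceil_log2(p) - 1
    return (math.sqrt(levels * alpha) + math.sqrt(beta * m)) ** 2


def best_integer_blocks(p: int, m: float, alpha: float, beta: float) -> int:
    """Integer block count nearest the continuous optimum N* = sqrt(levels*beta*m/alpha)."""
    levels = ceil_log2(p) - 1
    n_star = math.sqrt(levels * beta * m / alpha)
    candidates = {max(1, math.floor(n_star)), max(1, math.ceil(n_star))}
    return min(candidates, key=lambda n: broadcast_time(p, m, n, alpha, beta))


if __name__ == "__main__":
    p, m, alpha, beta = 17, 4.0, 1.0, 1.0
    n = best_integer_blocks(p, m, alpha, beta)
    print("blocks:", n)
    print("modelled time:", broadcast_time(p, m, n, alpha, beta))
    print("closed-form bound:", optimal_time(p, m, alpha, beta))
```

With a single block, `rounds(p, 1)` reduces to ⌈log p⌉, the round count of a binomial tree broadcast, which matches the degenerate case described in the abstract. Because N must be an integer, the modelled time can sit slightly above the closed-form bound when N* is not integral.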