Optimal broadcast for fully connected processor-node networks

  • Authors: Jesper Larsson Träff; Andreas Ripke
  • Affiliations: NEC Laboratories Europe, NEC Europe Ltd., Rathausallee 10, D-53757 Sankt Augustin, Germany (both authors)
  • Venue: Journal of Parallel and Distributed Computing
  • Year: 2008

Abstract

We develop and implement an optimal broadcast algorithm for fully connected processor networks under a bidirectional communication model in which each processor can simultaneously send a message to one processor and receive a message from another, possibly different, processor. For any number of processors $p$, the algorithm requires $N - 1 + \lceil \log p \rceil$ communication rounds to broadcast $N$ blocks of data from a root processor to the remaining processors, meeting the lower bound in the model. For data of size $m$, assuming that sending and receiving data of size $m'$ takes time $\alpha + \beta m'$, the best running time that can be achieved by dividing $m$ into equal-sized blocks is $\left(\sqrt{(\lceil \log p \rceil - 1)\alpha} + \sqrt{\beta m}\right)^2$. The algorithm uses a regular, circulant graph communication pattern and degenerates into a binomial tree broadcast when the number of blocks to be broadcast is one. The algorithm is furthermore well suited to fully connected clusters of SMP (Symmetric Multi-Processor) nodes. The algorithm is implemented as part of an MPI (Message Passing Interface) library. We demonstrate significant practical bandwidth improvements of up to a factor of 1.5 over several other commonly used broadcast algorithms on both a small SMP cluster and a 72-node NEC SX vector supercomputer.
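The round count and the block-size trade-off stated in the abstract can be illustrated with a short Python sketch. It only evaluates the cost model; it is not the authors' broadcast algorithm or their MPI implementation. It assumes that the logarithm is base 2 and that every round costs $\alpha + \beta m/N$ when $m$ is split into $N$ equal blocks. All function names and the numeric values used below are hypothetical, chosen for illustration. Under these assumptions the total time is $(N - 1 + \lceil \log p \rceil)(\alpha + \beta m/N)$, which is convex in $N$ and is minimized over real $N$ at $N = \sqrt{(\lceil \log p \rceil - 1)\beta m/\alpha}$, giving the closed form quoted in the abstract.

```python
import math


def ceil_log2(p: int) -> int:
    """Smallest q with 2**q >= p (assumption: the abstract's log is base 2)."""
    if p < 2:
        raise ValueError("the model needs at least two processors")
    return (p - 1).bit_length()


def rounds(num_blocks: int, p: int) -> int:
    """Communication rounds from the abstract: N - 1 + ceil(log p)."""
    return num_blocks - 1 + ceil_log2(p)


def broadcast_time(m: float, num_blocks: int, p: int,
                   alpha: float, beta: float) -> float:
    """Total time if every round moves one block of size m / N
    at cost alpha + beta * (m / N) (assumed per-round cost)."""
    return rounds(num_blocks, p) * (alpha + beta * m / num_blocks)


def closed_form_bound(m: float, p: int, alpha: float, beta: float) -> float:
    """(sqrt((ceil(log p) - 1) * alpha) + sqrt(beta * m)) ** 2."""
    q = ceil_log2(p)
    return (math.sqrt((q - 1) * alpha) + math.sqrt(beta * m)) ** 2


def best_block_count(m: float, p: int, alpha: float, beta: float) -> int:
    """Integer N minimizing broadcast_time. The cost is convex in N, so it is
    enough to compare the two integers around the real-valued optimum."""
    q = ceil_log2(p)
    if q <= 1:
        return 1  # with p == 2 extra blocks only add latency
    x = math.sqrt((q - 1) * beta * m / alpha)
    candidates = {max(1, math.floor(x)), max(1, math.ceil(x))}
    return min(candidates, key=lambda n: broadcast_time(m, n, p, alpha, beta))
```

For example, with the illustrative values $p = 72$, $m = 100$, $\alpha = 6$ and $\beta = 1$, `best_block_count(100, 72, 6.0, 1.0)` returns 10, the broadcast then takes `rounds(10, 72) == 16` rounds, and both `broadcast_time` and `closed_form_bound` evaluate to 256. When the real-valued optimum is not an integer, the achievable time lies slightly above the closed form. The sketch also ignores any limit on how finely the data can be divided.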