Two-tree algorithms for full bandwidth broadcast, reduction and scan

  • Authors: Peter Sanders; Jochen Speck; Jesper Larsson Träff

  • Affiliations: Universität Karlsruhe, D-76128 Karlsruhe, Germany; Universität Karlsruhe, D-76128 Karlsruhe, Germany; NEC Laboratories Europe, NEC Europe Ltd., Rathausallee 10, D-53757 Sankt Augustin, Germany

  • Venue: Parallel Computing
  • Year: 2009

Abstract

We present a new, simple algorithmic idea for the collective communication operations broadcast, reduction, and scan (prefix sums). The algorithms concurrently communicate over two binary trees which both span the entire network. By careful layout and communication scheduling, each tree communicates as efficiently as a single tree with exclusive use of the network. Our algorithms thus achieve up to twice the bandwidth of most previous algorithms. In particular, our approach beats all previous algorithms for reduction and scan. Experiments on clusters with Myrinet and InfiniBand interconnects show significant reductions in running time for all three operations, sometimes even close to the best possible factor of two.
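
The layout idea can be pictured with a small sketch. It is an illustration under stated assumptions, not the construction or schedule given in the paper: it assumes p = 2^h - 1 processors, builds one in-order numbered binary tree, and derives the second tree by shifting every rank by one. The function names (inorder_tree, shifted, interior, max_send_volume) are hypothetical. The sketch checks that no rank is an interior node in both trees, and compares the largest per-rank send volume when a message of size m goes down one tree versus half of it down each tree.

```python
from collections import Counter


def inorder_tree(n):
    """Parent list of a complete binary tree on ranks 0..n-1, numbered in-order.

    Assumption of this sketch: n = 2**h - 1, so the tree is perfectly balanced.
    """
    assert n >= 1 and (n + 1) & n == 0, "n must be of the form 2**h - 1"
    parent = [None] * n

    def build(lo, hi, par):
        if lo > hi:
            return
        mid = (lo + hi) // 2
        parent[mid] = par          # mid is the root of the subtree lo..hi
        build(lo, mid - 1, mid)
        build(mid + 1, hi, mid)

    build(0, n - 1, None)
    return parent


def shifted(parent, shift=1):
    """Second tree: same shape, every rank relabelled to (rank + shift) mod n."""
    n = len(parent)
    out = [None] * n
    for node, par in enumerate(parent):
        out[(node + shift) % n] = None if par is None else (par + shift) % n
    return out


def interior(parent):
    """Ranks that have at least one child."""
    return {par for par in parent if par is not None}


def max_send_volume(trees, m):
    """Largest amount any rank sends when m is split evenly over the trees.

    Simplified model (an assumption): every child receives the full share
    carried by its tree, and latency and scheduling are ignored.
    """
    share = m / len(trees)
    sent = Counter()
    for parent in trees:
        for par in parent:
            if par is not None:
                sent[par] += share
    return max(sent.values(), default=0.0)


if __name__ == "__main__":
    t1 = inorder_tree(7)
    t2 = shifted(t1)
    print("interior in tree 1:", sorted(interior(t1)))
    print("interior in tree 2:", sorted(interior(t2)))
    print("one tree, max sent:", max_send_volume([t1], 1.0))
    print("two trees, max sent:", max_send_volume([t1, t2], 1.0))
```

In this simplified model an interior rank of a single binary tree has to send the whole message twice, whereas with two trees each rank forwards only half of the message to two children in one tree and is a leaf in the other. That matches the factor of two named in the abstract, though the real algorithms also depend on the communication schedule, which the sketch does not model.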