Efficient Implementation of Allreduce on BlueGene/L Collective Network

  • Authors:
  • George Almási; Gábor Dózsa; C. Chris Erway; Burkhardt Steinmacher-Burow

  • Affiliations:
  • IBM T. J. Watson Research Center, Yorktown Heights, NY; IBM T. J. Watson Research Center, Yorktown Heights, NY; Dept. of Computer Science, Brown University, Providence, RI; IBM Germany, Boeblingen, Germany

  • Venue:
  • PVM/MPI '05: Proceedings of the 12th European PVM/MPI Users' Group Conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
  • Year:
  • 2005

Abstract

BlueGene/L currently holds the pole position on the Top500 list [4]. In its full configuration the system will comprise 65,536 compute nodes. Application scalability is a crucial issue for a system of this size, and on BlueGene/L it is made possible through efficient exploitation of the machine's special-purpose communication networks. In addition to the general-purpose MPICH2 implementation, the BlueGene/L system software provides its own optimized versions of the collective communication routines. The collective network is a natural platform for reduction operations because of its built-in arithmetic units. Unfortunately, the ALUs of the collective network can handle only fixed-point operands, so exploiting that network efficiently for floating-point reductions is a challenging task. In this paper we present our experiences implementing an efficient collective network algorithm for Allreduce sums of floating-point numbers.
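The abstract only outlines the underlying idea, so a minimal sketch may help: a floating-point sum can be assembled from two fixed-point reductions, a max-reduction over the exponents to establish a common scale, followed by an integer sum of the mantissas rescaled to that scale. The C/MPI code below illustrates this two-pass scheme under stated assumptions; it uses ordinary MPI integer reductions as a stand-in for the collective network's fixed-point ALU, and the function name fixed_point_allreduce_sum and the constant MBITS are invented for this example. It is not the authors' actual implementation, which drives the BlueGene/L hardware directly.

    /* Hypothetical sketch: floating-point Allreduce sum built from
     * fixed-point (integer) reductions only. Plain MPI integer
     * reductions stand in for the collective network's ALU.
     * Requires MPI-2.2 or later for MPI_INT64_T. */
    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <mpi.h>

    /* Sum one double across all ranks using only integer reductions. */
    static double fixed_point_allreduce_sum(double x, MPI_Comm comm)
    {
        /* Pass 1: find the largest exponent so every contribution can
           be expressed as fixed point with a common scale. */
        int local_exp;
        (void)frexp(x, &local_exp);      /* x == mantissa * 2^local_exp */
        int max_exp;
        MPI_Allreduce(&local_exp, &max_exp, 1, MPI_INT, MPI_MAX, comm);

        /* Rescale to a 64-bit fixed-point value at the common scale;
           contributions with smaller exponents lose low-order bits.
           MBITS = 52 keeps double precision and leaves ~11 bits of
           headroom, enough to sum across ~2000 ranks without overflow. */
        const int MBITS = 52;
        int64_t local_fix = (int64_t)llround(ldexp(x, MBITS - max_exp));

        /* Pass 2: integer sum -- the operation the collective
           network's fixed-point ALU supports natively. */
        int64_t global_fix;
        MPI_Allreduce(&local_fix, &global_fix, 1, MPI_INT64_T,
                      MPI_SUM, comm);

        /* Convert the fixed-point result back to floating point. */
        return ldexp((double)global_fix, max_exp - MBITS);
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double contribution = 1.0 / (rank + 1);  /* arbitrary test value */
        double sum = fixed_point_allreduce_sum(contribution,
                                               MPI_COMM_WORLD);
        if (rank == 0)
            printf("fixed-point allreduce sum = %.17g\n", sum);

        MPI_Finalize();
        return 0;
    }

One design trade-off this sketch exposes: rescaling every contribution to the largest exponent discards low-order bits of the small values, which is exactly the kind of precision issue a fixed-point floating-point reduction must manage. In exchange, the integer sum is independent of summation order, unlike a tree of floating-point additions.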