Composable, non-blocking collective operations on Power7 IH

  • Authors:
  • Gabriel Ilie Tanase;Gheorghe Almási;Hanhong Xue;Charles Archer

  • Affiliations:
  • IBM TJ Watson Research Center, Yorktown Heights, NY, USA;IBM TJ Watson Research Center, Yorktown Heights, NY, USA;IBM Systems and Technology Group, Poughkeepsie, NY, USA;IBM Systems and Technology Group, Rochester, MN, USA

  • Venue:
  • Proceedings of the 26th ACM international conference on Supercomputing
  • Year:
  • 2012

Abstract

The Power7 IH (P7IH) belongs to IBM's latest generation of supercomputers. Like most modern parallel machines, it is organized hierarchically: simultaneous multithreading (SMT) within a core, multiple cores per processor, multiple processors per node (SMP), and multiple SMP nodes per cluster. A low-latency, high-bandwidth network with specialized accelerators interconnects the SMP nodes, and the system software is tuned to exploit this hierarchy. In this paper we present a novel set of collective operations that take advantage of the P7IH hardware. We discuss non-blocking collective operations implemented using point-to-point messages, shared memory, and accelerator hardware. We show how collectives can be composed to exploit the hierarchical organization of the P7IH and provide low-latency, high-bandwidth operations. We demonstrate the scalability of these collectives with experimental results on a P7IH system with up to 4096 cores.
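
The idea of composing collectives along the machine hierarchy can be illustrated with a small, purely sequential simulation. The sketch below is not the authors' implementation and uses no P7IH or MPI API; the function name `hierarchical_allreduce` and the parameter `cores_per_node` are hypothetical, chosen only for illustration. It models a sum-allreduce as three smaller collectives: a reduce within each node (standing in for a shared-memory phase), an allreduce among one leader per node (standing in for a network or accelerator phase), and a broadcast back within each node. Non-blocking behavior is not modeled.

```python
from typing import List


def hierarchical_allreduce(values: List[int], cores_per_node: int) -> List[int]:
    """Simulate a sum-allreduce composed from three smaller collectives.

    values[i] is the contribution of core i; cores are assumed to be
    grouped into nodes of `cores_per_node` consecutive cores (an
    assumption of this sketch, not something stated in the abstract).
    """
    if cores_per_node <= 0 or len(values) % cores_per_node != 0:
        raise ValueError("values must divide evenly into nodes")

    # Step 1: intra-node reduce. Each node combines its cores' values,
    # standing in for a shared-memory reduction to a node leader.
    nodes = [
        values[i:i + cores_per_node]
        for i in range(0, len(values), cores_per_node)
    ]
    node_sums = [sum(node) for node in nodes]

    # Step 2: inter-node allreduce among the node leaders, standing in
    # for the network (or accelerator) phase.
    total = sum(node_sums)

    # Step 3: intra-node broadcast. Every core receives the global result.
    return [total] * len(values)


# Eight cores arranged as two nodes of four cores each.
result = hierarchical_allreduce(list(range(8)), cores_per_node=4)
print(result)
```

In this toy model only one value per node crosses the simulated network in step 2, which is the usual motivation for composing a collective hierarchically instead of treating all cores as a flat group.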