Composable, non-blocking collective operations on Power7 IH

Authors:
Gabriel Ilie Tanase;Gheorghe Almási;Hanhong Xue;Charles Archer
Affiliations:
IBM TJ Watson Research Center, Yorktown Heights, NY, USA;IBM TJ Watson Research Center, Yorktown Heights, NY, USA;IBM Systems and Technology Group, Poughkeepsie, NY, USA;IBM Systems and Technology Group, Rochester, MN, USA
Venue:
Proceedings of the 26th ACM international conference on Supercomputing
Year:
2012

Abstract

The Power7 IH (P7IH) is one of IBM's latest-generation supercomputers. Like most modern parallel machines, it is organized hierarchically: simultaneous multithreading (SMT) within a core, multiple cores per processor, multiple processors per node (SMP), and multiple SMP nodes per cluster. A low-latency, high-bandwidth network with specialized accelerators interconnects the SMP nodes, and the system software is tuned to exploit this hierarchy. In this paper we present a novel set of collective operations that take advantage of the P7IH hardware. We discuss non-blocking collective operations implemented using point-to-point messages, shared memory, and accelerator hardware. We show how collectives can be composed to exploit the hierarchical organization of the P7IH and provide low-latency, high-bandwidth operations. We demonstrate the scalability of our collectives with experimental results on a P7IH system with up to 4096 cores.
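
To make the idea of composing collectives along the machine hierarchy concrete, the following is a minimal sketch of one common pattern: an allreduce built from a per-node reduction, a cross-node exchange among one leader per node, and a per-node broadcast. It is an illustration under stated assumptions, not the paper's implementation. The function names (`local_reduce`, `leader_allreduce`, `local_broadcast`, `hierarchical_allreduce`) are hypothetical, the cluster is simulated with nested Python lists, and the stages run sequentially, so the non-blocking behavior and the actual transports (shared memory, point-to-point messages, accelerator hardware) are not modeled.

```python
from functools import reduce
import operator


def local_reduce(node_values, op):
    """Stage 1: combine the contributions of all tasks on one node.

    Stands in for an intra-node (for example, shared-memory) reduction.
    """
    return reduce(op, node_values)


def leader_allreduce(partials, op):
    """Stage 2: combine one partial result per node across all nodes.

    Stands in for an inter-node collective among node leaders; every
    leader ends up holding the global result.
    """
    total = reduce(op, partials)
    return [total] * len(partials)


def local_broadcast(value, ntasks):
    """Stage 3: hand the leader's result to every task on the node."""
    return [value] * ntasks


def hierarchical_allreduce(cluster, op=operator.add):
    """Compose the three stages into a single allreduce.

    `cluster` is a list of nodes; each node is a non-empty list holding
    one value per task (hypothetical layout used only for this sketch).
    """
    partials = [local_reduce(node, op) for node in cluster]
    totals = leader_allreduce(partials, op)
    return [local_broadcast(t, len(node)) for t, node in zip(totals, cluster)]


# Three simulated nodes with four tasks each.
cluster = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
result = hierarchical_allreduce(cluster)
```

Because each stage is an ordinary collective over a smaller group, any stage could in principle be swapped for a different implementation without changing the others, which is the sense of "composable" this sketch is meant to convey.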