The Power7 IH (P7IH) is one of IBM's latest-generation supercomputers. Like most modern parallel machines, it has a hierarchical organization: simultaneous multithreading (SMT) within a core, multiple cores per processor, multiple processors per node (SMP), and multiple SMPs per cluster. The SMP nodes are interconnected by a low-latency, high-bandwidth network with specialized accelerators, and the system software is tuned to exploit this hierarchical organization. In this paper we present a novel set of collective operations that take advantage of the P7IH hardware. We discuss non-blocking collective operations implemented using point-to-point messages, shared memory, and accelerator hardware. We show how collectives can be composed to exploit the hierarchical organization of the P7IH to provide low-latency, high-bandwidth operations. We demonstrate the scalability of our collectives with experimental results on a P7IH system with up to 4096 cores.
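The abstract describes composing collectives along the machine hierarchy and making them non-blocking. The Python sketch below illustrates that general idea only; it is not the paper's P7IH implementation. It simulates a two-level allreduce in a single process: a reduce to a per-node leader (standing in for shared memory), a recursive-doubling exchange among leaders (standing in for the network), and a broadcast back to the cores of each node. Each `yield` marks a point where a non-blocking collective could return control to the caller. The names `hierarchical_allreduce` and `run_to_completion`, the choice of core 0 as leader, recursive doubling for the inter-node step, the power-of-two node count, and the requirement that `op` be associative and commutative are all assumptions made for illustration.

```python
import operator
from typing import Callable, Generator, List

Op = Callable[[float, float], float]


def hierarchical_allreduce(
    nodes: List[List[float]], op: Op
) -> Generator[str, None, List[List[float]]]:
    """Hypothetical two-level allreduce, simulated in one process.

    nodes[i][j] is the contribution of core j on SMP node i. Each `yield`
    is a progress point; the final per-core results are returned via
    StopIteration.value. Assumes `op` is associative and commutative.
    """
    n = len(nodes)
    # Assumption of this sketch: recursive doubling needs a power-of-two node count.
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two node count"

    # Phase 1 (intra-node; stands in for a shared-memory reduce):
    # the leader (core 0) of each node combines all of its cores' values.
    partial = []
    for cores in nodes:
        acc = cores[0]
        for v in cores[1:]:
            acc = op(acc, v)
        partial.append(acc)
    yield "intra-node reduce done"

    # Phase 2 (inter-node; stands in for network messages between leaders):
    # in each round, leader r combines its value with that of leader r XOR dist.
    dist = 1
    while dist < n:
        partial = [op(partial[r], partial[r ^ dist]) for r in range(n)]
        yield f"inter-node exchange at distance {dist} done"
        dist *= 2

    # Phase 3 (intra-node; stands in for a shared-memory broadcast):
    # every core on node i receives its leader's fully reduced value.
    return [[partial[i]] * len(cores) for i, cores in enumerate(nodes)]


def run_to_completion(request: Generator[str, None, List[List[float]]]) -> List[List[float]]:
    """Poll the request until it finishes. A real caller could do
    unrelated computation between polls instead of spinning."""
    while True:
        try:
            next(request)
        except StopIteration as done:
            return done.value


if __name__ == "__main__":
    # 4 nodes x 2 cores, contributing the values 1..8.
    nodes = [[1, 2], [3, 4], [5, 6], [7, 8]]
    print(run_to_completion(hierarchical_allreduce(nodes, operator.add)))
    # [[36, 36], [36, 36], [36, 36], [36, 36]]
```

In this sketch only the node leaders take part in the inter-node phase, so each exchange round sends one network message per node rather than one per core, and a node count of 4 needs two such rounds.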