The Power 775 architecture at scale

Authors:
Ramakrishnan Rajamony; Mark W. Stephenson; William Evan Speight
Affiliations:
IBM Research, Austin, TX, USA; IBM Research, Austin, TX, USA; IBM Research, Austin, TX, USA
Venue:
Proceedings of the 27th International ACM Conference on Supercomputing
Year:
2013


Abstract

We describe the IBM Power 775, a supercomputing system designed to provide high performance at very large scales. The system recently attained world-record performance on three important, communication-heavy supercomputing benchmarks: RandomAccess, PTRANS, and Global FFT. (Although the Power 775 currently holds the number two spot in Global FFT, its efficiency when computing the FFT exceeds that of the number one system by more than 3.5 times.) At the heart of the Power 775's performance is the "hub module", a high-radix router containing forty-seven copper and optical links with a switching capacity of over 1.1 Tbyte/second. This level of bandwidth is unprecedented for typical systems at the scale we discuss in this paper. As a result, we had to develop a complete software stack to fully leverage the system's communication capabilities. In this paper, we evaluate the Power 775 server at scales up to 2 Petaflops (63,360 POWER7 cores), discuss the hardware and software tradeoffs considered during the design process, and present some lessons learned.