Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
We describe the IBM Power 775, a supercomputing system designed to provide high performance at very large scales. The system recently attained world-record performance on three important, communication-heavy supercomputing benchmarks: RandomAccess, PTRANS, and Global FFT. (The Power 775 currently holds the number two spot in Global FFT, but its efficiency when computing the FFT exceeds that of the number one system by more than 3.5 times.) At the heart of the Power 775's performance is the "hub module", a high-radix router containing forty-seven copper and optical links with a switching capacity of over 1.1 Tbyte/second. This level of bandwidth is unprecedented for typical systems of the scale we discuss in this paper. As a result, we had to develop a complete software stack to fully leverage the communication capabilities of the system. In this paper we evaluate the Power 775 server at scales up to 2 Petaflops (63,360 POWER7 cores), discuss hardware and software tradeoffs considered during the design process, and present some lessons learned.