A performance comparison through benchmarking and modeling of three leading supercomputers: Blue Gene/L, Red Storm, and Purple

Authors:
Adolfy Hoisie; Greg Johnson; Darren J. Kerbyson; Michael Lang; Scott Pakin
Affiliations:
Performance and Architecture Lab (PAL), Los Alamos National Laboratory (all authors)
Venue:
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Year:
2006

Abstract

This work provides a performance analysis of three leading supercomputers that have recently been deployed: Purple, Red Storm, and Blue Gene/L. These machines are architecturally diverse and have very different performance characteristics. Each contains over 10,000 processors and has a system peak of over 40 teraflops. We analyze each system using a range of micro-benchmarks that measure communication performance and quantify the impact of the operating system. We then compare achievable application performance across the systems and confirm these results with detailed application models parameterized by the performance characteristics measured by the micro-benchmarks. We also compare the machines in a realistic production scenario in which the applications run in weak-scaling mode, with each machine's memory used as fully as possible. The results also illustrate that achievable performance is not directly related to peak performance.