Understanding Application Performance via Micro-benchmarks on Three Large Supercomputers: Intrepid, Ranger and Jaguar

Authors:
Abhinav Bhatelé;Lukasz Wesolowski;Eric Bohm;Edgar Solomonik;Laxmikant V. Kalé
Affiliations:
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA;Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA;Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA;Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA;Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
Venue:
International Journal of High Performance Computing Applications
Year:
2010

Abstract

The emergence of new parallel architectures presents new challenges for application developers. Supercomputers vary in processor speed, network topology, interconnect communication characteristics and memory subsystems. This paper presents a performance comparison of three of the fastest machines in the world: IBM's Blue Gene/P installation at ANL (Intrepid), the SUN-Infiniband cluster at TACC (Ranger) and Cray's XT4 installation at ORNL (Jaguar). Comparisons are based on three applications selected by NSF for the Track 1 proposal to benchmark the Blue Waters system: NAMD, MILC and a turbulence code, DNS. We present a comprehensive overview of the architectural details of each of these machines and a comparison of their basic performance parameters. Application performance is presented for multiple problem sizes and the relative performance on the selected machines is explained through micro-benchmarking results. We hope that insights from this work will be useful to managers making buying decisions for supercomputers and application users trying to decide on a machine to run on. Based on the performance analysis techniques used in the paper, we also suggest a step-by-step procedure for estimating the suitability of a given architecture for a highly parallel application.