Understanding Application Performance via Micro-benchmarks on Three Large Supercomputers: Intrepid, Ranger and Jaguar

  • Authors:
  • Abhinav Bhatelé;Lukasz Wesolowski;Eric Bohm;Edgar Solomonik;Laxmikant V. Kalé

  • Affiliations:
  • Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA (all authors)

  • Venue:
  • International Journal of High Performance Computing Applications
  • Year:
  • 2010

Abstract

The emergence of new parallel architectures presents new challenges for application developers. Supercomputers vary in processor speed, network topology, interconnect communication characteristics and memory subsystems. This paper presents a performance comparison of three of the fastest machines in the world: IBM's Blue Gene/P installation at ANL (Intrepid), the SUN-Infiniband cluster at TACC (Ranger) and Cray's XT4 installation at ORNL (Jaguar). Comparisons are based on three applications selected by NSF for the Track 1 proposal to benchmark the Blue Waters system: NAMD, MILC and a turbulence code, DNS. We present a comprehensive overview of the architectural details of each of these machines and a comparison of their basic performance parameters. Application performance is presented for multiple problem sizes, and the relative performance on the selected machines is explained through micro-benchmarking results. We hope that insights from this work will be useful to managers making buying decisions for supercomputers and to application users trying to decide which machine to run on. Based on the performance analysis techniques used in the paper, we also suggest a step-by-step procedure for estimating the suitability of a given architecture for a highly parallel application.