Entering the petaflop era: the architecture and performance of Roadrunner

  • Authors: Kevin J. Barker; Kei Davis; Adolfy Hoisie; Darren J. Kerbyson; Mike Lang; Scott Pakin; Jose C. Sancho
  • Affiliations: Los Alamos National Laboratory, Los Alamos (all authors)
  • Venue: Proceedings of the 2008 ACM/IEEE conference on Supercomputing
  • Year: 2008

Abstract

Roadrunner is a 1.38 Pflop/s-peak (double precision) hybrid-architecture supercomputer developed by LANL and IBM. It contains 12,240 IBM PowerXCell 8i processors and 12,240 AMD Opteron cores in 3,060 compute nodes. Roadrunner is the first supercomputer to run Linpack at a sustained speed in excess of 1 Pflop/s. In this paper we present a detailed architectural description of Roadrunner and a detailed performance analysis of the system. A case study of optimizing the MPI-based application Sweep3D to exploit Roadrunner's hybrid architecture is also included. The performance of Sweep3D is compared to that of the code on a previous implementation of the Cell Broadband Engine architecture (the Cell BE) and on multi-core processors. Using validated performance models combined with Roadrunner-specific microbenchmarks we identify performance issues in the early pre-delivery system and infer how well the final Roadrunner configuration will perform once the system software stack has matured.
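
The system-wide figures in the abstract imply a few per-node quantities, which the following minimal sketch works out. It uses only the numbers quoted above and assumes all 3,060 compute nodes are identical, which the abstract does not state explicitly. The constant and function names are illustrative, and the Linpack figure is only a lower bound derived from the "in excess of 1 Pflop/s" claim, not a measured result.

```python
# Figures quoted in the abstract.
PEAK_FLOPS = 1.38e15        # 1.38 Pflop/s peak, double precision
CELL_PROCESSORS = 12_240    # IBM PowerXCell 8i processors
OPTERON_CORES = 12_240      # AMD Opteron cores
COMPUTE_NODES = 3_060       # compute nodes


def per_node(total, nodes=COMPUTE_NODES):
    """Split a system-wide total evenly across nodes (assumes identical nodes)."""
    return total / nodes


# Processor counts per compute node.
cells_per_node = per_node(CELL_PROCESSORS)
opteron_cores_per_node = per_node(OPTERON_CORES)

# Peak double-precision rate per node, in Gflop/s.
peak_gflops_per_node = per_node(PEAK_FLOPS) / 1e9

# Lower bound on the fraction of peak reached by Linpack,
# given only that the sustained rate exceeded 1 Pflop/s.
min_linpack_fraction = 1e15 / PEAK_FLOPS

print(f"PowerXCell 8i per node: {cells_per_node:.0f}")
print(f"Opteron cores per node: {opteron_cores_per_node:.0f}")
print(f"Peak per node: {peak_gflops_per_node:.0f} Gflop/s")
print(f"Linpack fraction of peak: > {min_linpack_fraction:.1%}")
```

Under these assumptions each node pairs four PowerXCell 8i processors with four Opteron cores, which is the one-to-one ratio that the hybrid design described in the abstract relies on.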