Verifying large-scale system performance during installation using modelling

  • Authors:
  • Darren J. Kerbyson;Adolfy Hoisie;Harvey J. Wasserman

  • Affiliations:
  • Los Alamos National Laboratory, Performance and Architectures Laboratory (PAL), CCS-3, P.O. Box 1663, Los Alamos, NM;Los Alamos National Laboratory, Performance and Architectures Laboratory (PAL), CCS-3, P.O. Box 1663, Los Alamos, NM;Los Alamos National Laboratory, Performance and Architectures Laboratory (PAL), CCS-3, P.O. Box 1663, Los Alamos, NM

  • Venue:
  • High performance scientific and engineering computing
  • Year:
  • 2004

Abstract

In this paper we describe an important use of predictive application performance modelling: the validation of measured performance during a new large-scale system installation. Using a previously developed and validated performance model for SAGE, a multidimensional, 3D, multi-material hydrodynamics code with adaptive mesh refinement, we were able to help guide the stabilization of the Los Alamos ASCI Q supercomputer. This system was installed in several stages and has a peak processing rate of 20 Teraflops. We review the salient features of an analytical model for SAGE that has been applied to predict its performance on a large class of Tera-scale parallel systems. We describe the methodology applied during system installation and upgrades to establish a baseline for the achievable "real" performance of the system. We also show the effect on overall application performance of certain key subsystems such as PCI bus speed and processor speed. We show that predictive performance models can be a powerful system debugging tool.
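
The idea of checking measured performance against a model's prediction can be illustrated with a minimal sketch. It is not the SAGE model or the authors' tooling: the `Observation` type, the 10% tolerance, and every number in `SAMPLE` are hypothetical placeholders chosen only to show the comparison, not measurements from ASCI Q.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    processors: int      # processor count for the run
    predicted_s: float   # model-predicted time per cycle, seconds (hypothetical)
    measured_s: float    # measured time per cycle, seconds (hypothetical)


def relative_error(obs: Observation) -> float:
    """Fraction by which the measured time exceeds the model prediction."""
    return (obs.measured_s - obs.predicted_s) / obs.predicted_s


def flag_anomalies(observations, tolerance=0.10):
    """Return runs that are slower than predicted by more than `tolerance`.

    The tolerance is an assumed threshold; a real study would derive it
    from the model's known validation error.
    """
    return [o for o in observations if relative_error(o) > tolerance]


# Made-up values for illustration only.
SAMPLE = [
    Observation(processors=64, predicted_s=1.00, measured_s=1.03),
    Observation(processors=256, predicted_s=1.20, measured_s=1.25),
    Observation(processors=1024, predicted_s=1.50, measured_s=2.10),
]

if __name__ == "__main__":
    for o in flag_anomalies(SAMPLE):
        print(f"{o.processors} processors: measured {o.measured_s:.2f} s, "
              f"predicted {o.predicted_s:.2f} s, "
              f"{relative_error(o):.0%} slower than the model")
```

A run flagged this way does not by itself say what is wrong; it only marks a configuration where the installed system falls short of the modelled baseline and is worth investigating.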