A Holistic Approach for Performance Measurement and Analysis for Petascale Applications

Authors:
Heike Jagode;Jack Dongarra;Sadaf Alam;Jeffrey Vetter;Wyatt Spear;Allen D. Malony
Affiliations:
The University of Tennessee and Oak Ridge National Laboratory;The University of Tennessee and Oak Ridge National Laboratory;Oak Ridge National Laboratory;Oak Ridge National Laboratory;University of Oregon;University of Oregon
Venue:
ICCS 2009 Proceedings of the 9th International Conference on Computational Science
Year:
2009

Citing 2

Low-storage, explicit Runge-Kutta schemes for the compressible Navier-Stokes equations
Applied Numerical Mathematics

Introducing the open trace format (OTF)
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II

Cited 4

Trace-based performance analysis for the petascale simulation code FLASH
International Journal of High Performance Computing Applications

An approach to creating performance visualizations in a parallel profile analysis tool
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing - Volume 2

Enabling event tracing at leadership-class scale through I/O forwarding middleware
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing

Optimizing I/O forwarding techniques for extreme-scale event tracing
Cluster Computing

Abstract

Contemporary high-end Terascale and Petascale systems are composed of hundreds of thousands of commodity multi-core processors interconnected with high-speed custom networks. The performance characteristics of applications executing on these systems are a function of system hardware and software as well as workload parameters. As a result, it has become increasingly challenging to measure, analyze, and project performance on these systems using a single tool. To address these issues, we propose a methodology for performance measurement and analysis that is aware of both the application and the underlying system hierarchies. At the application level, we measure the cost distribution and runtime-dependent values for different components of the underlying programming model. At the system level, we measure and analyze information gathered for unique system features, particularly shared components in the multi-core processors. We demonstrate our approach with S3D, a Petascale combustion application, on two high-end Teraflops systems, Cray XT4 and IBM Blue Gene/P, using a combination of hardware performance monitoring, profiling, and tracing tools.