Trace-based performance analysis for the petascale simulation code FLASH

Authors:
Heike Jagode;Andreas Knüpfer;Jack Dongarra;Matthias Jurenz;Matthias S Müller;Wolfgang E Nagel
Affiliations:
The University of Tennessee, USA;Technische Universität Dresden, Germany;The University of Tennessee, USA;Technische Universität Dresden, Germany;Technische Universität Dresden, Germany;Technische Universität Dresden, Germany
Venue:
International Journal of High Performance Computing Applications
Year:
2011

Abstract

Performance analysis of applications on modern high-end petascale systems is increasingly challenging due to the rising complexity and number of computing units. This paper presents a performance-analysis study using the Vampir performance-analysis tool suite, which examines both application behavior and fundamental system properties. The study was carried out on the Jaguar system at Oak Ridge National Laboratory, the fastest computer on the November 2009 Top500 list. We analyzed the FLASH simulation code, which is designed to scale to tens of thousands of CPU cores; at that scale, applying existing performance-analysis tools is itself very complex. The study reveals two classes of performance problems that become relevant at very high CPU counts: MPI communication and scalable I/O. For both, solutions are presented and verified. Finally, the paper proposes improvements and extensions to event-tracing tools that would allow them to scale to higher degrees of parallelism.