PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications

Authors:
Martin Burtscher; Byoung-Do Kim; Jeff Diamond; John McCalpin; Lars Koesterke; James Browne
Venue:
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Year:
2010



Abstract

HPC systems are notorious for operating at a small fraction of their peak performance, and the ongoing migration to multi-core and multi-socket compute nodes further complicates performance optimization. The readily available performance evaluation tools require considerable effort to learn and use. Hence, most HPC application writers do not use them. As a remedy, we have developed PerfExpert, a tool that combines a simple user interface with a sophisticated analysis engine to detect probable core-, socket-, and node-level performance bottlenecks in each important procedure and loop of an application. For each bottleneck, PerfExpert provides a concise performance assessment and suggests steps that the programmer can take to improve performance. These steps include compiler switches and optimization strategies with code examples. We have applied PerfExpert to several HPC production codes on the Ranger supercomputer. In all cases, it correctly identified the critical code sections and provided accurate assessments of their performance.