Knowledge support and automation for performance analysis with PerfExplorer 2.0

Authors:
Kevin A. Huck; Allen D. Malony; Sameer Shende; Alan Morris
Affiliations:
Corresponding author: Kevin A. Huck, Performance Research Laboratory, Computer and Information Science Department, University of Oregon, Eugene, OR 97403, USA. Tel.: +1 (541) 346 4409; Fax: +1 (54 ...
Performance Research Laboratory, Computer and Information Science Department, University of Oregon, Eugene, OR 97403, USA
Venue:
Scientific Programming - Large-Scale Programming Tools and Environments
Year:
2008

Abstract

Integrating scalable performance analysis into parallel development tools is difficult. The potential size of data sets and the need to compare results from multiple experiments make the information challenging to manage and process. Simply characterizing the performance of parallel applications running on potentially hundreds of thousands of processor cores requires new scalable analysis techniques. Furthermore, many exploratory analysis processes are repeatable and could be automated, but are currently carried out as manual procedures. In this paper, we discuss the current version of PerfExplorer, a performance analysis framework that provides dimension reduction, clustering, and correlation analysis of individual trials of large dimensions, and that can perform relative performance analysis between multiple application executions. PerfExplorer analysis processes can be captured in the form of Python scripts, automating what would otherwise be time-consuming tasks. We give examples of large-scale analysis results and discuss the future development of the framework, including the encoding and processing of expert performance rules and the increasing use of performance metadata.
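To make two of the analyses named in the abstract concrete, the following is a minimal sketch in plain Python of (a) correlation between per-thread metrics within one trial and (b) relative efficiency between two executions. It does not use PerfExplorer's scripting interface. The function names, metric names, and sample numbers are all hypothetical and chosen only for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant metric carries no correlation signal
    return cov / sqrt(var_x * var_y)

def relative_efficiency(baseline, trial):
    """Parallel efficiency of `trial` relative to `baseline`.

    Each argument is a (core_count, elapsed_seconds) pair.
    A value of 1.0 means ideal scaling from the baseline.
    """
    base_cores, base_time = baseline
    cores, elapsed = trial
    speedup = base_time / elapsed
    return speedup / (cores / base_cores)

# Hypothetical per-thread measurements from one trial (illustrative only).
PER_THREAD = {
    "time": [10.0, 12.0, 14.0, 16.0],
    "cache_misses": [100.0, 120.0, 140.0, 160.0],
    "flops": [8.0, 6.0, 4.0, 2.0],
}

# Hypothetical executions of the same application at two core counts.
BASELINE = (16, 100.0)  # 16 cores, 100 s
SCALED = (64, 40.0)     # 64 cores, 40 s

if __name__ == "__main__":
    for metric in ("cache_misses", "flops"):
        r = pearson(PER_THREAD["time"], PER_THREAD[metric])
        print(f"time vs {metric}: r = {r:+.2f}")
    eff = relative_efficiency(BASELINE, SCALED)
    print(f"relative efficiency: {eff:.3f}")
```

In this made-up data, time rises with cache misses and falls as floating-point throughput rises. Reducing many threads and metrics to a few such signals is the kind of step that a script can then repeat across trials.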