ADP: automated diagnosis of performance pathologies using hardware events

Authors:
Wucherl Yoo; Kevin Larson; Lee Baugh; Sangkyum Kim; Roy H. Campbell
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL, USA (Wucherl Yoo, Kevin Larson, Sangkyum Kim, Roy H. Campbell); Intel, Champaign, IL, USA (Lee Baugh)
Venue:
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
Year:
2012

Abstract

Performance characterization of applications' hardware behavior is essential for making the best use of available hardware resources. Modern architectures expose many hardware events that can reveal architectural performance bottlenecks throughout the core and memory hierarchy. These events can give programmers unique and powerful insights into the causes of resource bottlenecks in their applications. However, interpreting these events has been a significant challenge. We present an automated system that uses machine learning to identify an application's performance problems. Our system gives programmers insight into the performance of their applications while shielding them from the onerous task of digesting hardware events. It applies random forests, a decision-tree-based learning algorithm, to our micro-benchmarks to fingerprint the performance problems. Our system divides a profiled application into functions and automatically classifies each function by its dominant hardware resource bottlenecks. Using the classifications of the hotspot functions, we achieved an average speedup of 1.73 on three applications from the PARSEC benchmark suite. Our system provides programmers with guidelines on where, what, and how to fix the detected performance problems, which would otherwise have required considerable architectural knowledge.
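The abstract describes two stages: random forests trained on labeled micro-benchmark profiles, then per-function classification of a profiled application. It gives no implementation details. The following is a minimal, stdlib-only Python sketch of that idea under loudly stated assumptions. The three event features, the bottleneck labels, the training values, and every `hypothetical_*` function name are invented for illustration and are not the paper's event set, classes, or data. The forest is a generic hand-written random forest (bootstrap sampling, a random feature subset at each split, Gini impurity, majority vote), not the authors' implementation.

```python
import random
from collections import Counter

# ASSUMPTION: hypothetical event features (per 1,000 instructions), not the paper's set.
FEATURES = ("llc_misses", "branch_mispredicts", "dtlb_misses")

# ASSUMPTION: synthetic micro-benchmark profiles, each labeled with the bottleneck
# it is assumed to induce. Values are illustrative only, not measured data.
TRAIN = [
    ((30.0, 1.0, 0.5), "memory_bound"), ((28.0, 2.0, 0.4), "memory_bound"),
    ((35.0, 1.5, 0.6), "memory_bound"), ((32.0, 0.8, 0.5), "memory_bound"),
    ((2.0, 25.0, 0.3), "branch_bound"), ((3.0, 22.0, 0.5), "branch_bound"),
    ((1.5, 30.0, 0.4), "branch_bound"), ((2.5, 27.0, 0.2), "branch_bound"),
    ((3.0, 1.0, 12.0), "tlb_bound"), ((2.0, 2.0, 15.0), "tlb_bound"),
    ((4.0, 1.2, 10.0), "tlb_bound"), ((3.5, 0.9, 14.0), "tlb_bound"),
]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Return (weighted_gini, feature, threshold) of the best split, or None."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, rng, depth, n_sub):
    """Grow a tree; leaves are labels, internal nodes are (f, t, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    # Random-forest step: consider only a random subset of features at this node.
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_sub))
    if split is None:
        return majority
    _, f, t = split
    lo = [i for i, r in enumerate(rows) if r[f] <= t]
    hi = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in lo], [labels[i] for i in lo], rng, depth - 1, n_sub),
            build_tree([rows[i] for i in hi], [labels[i] for i in hi], rng, depth - 1, n_sub))


def predict_tree(node, row):
    """Walk from the root to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(rows, labels, n_trees=25, depth=4, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the training rows."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(rows[0]) ** 0.5))  # about sqrt(#features) per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 rng, depth, n_sub))
    return forest


def classify(forest, row):
    """Label a profile by majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


forest = train_forest([r for r, _ in TRAIN], [y for _, y in TRAIN])

# ASSUMPTION: hypothetical per-function event profiles of an application under diagnosis.
profile = {
    "hypothetical_hash_probe": (33.0, 1.0, 0.5),
    "hypothetical_parse_tokens": (2.0, 28.0, 0.3),
    "hypothetical_scatter_update": (3.0, 1.0, 13.0),
}
for name, events in profile.items():
    print(f"{name}: {classify(forest, events)}")
```

In practice, each function's feature vector would come from sampled hardware counters. A library implementation such as scikit-learn's `RandomForestClassifier` would normally replace the hand-written forest. It is spelled out here only to show the bootstrap and per-split feature sampling that make it a random forest rather than a single decision tree.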