Performance characterization of applications' hardware behavior is essential for making the best use of available hardware resources. Modern architectures expose many hardware events that can reveal architectural performance bottlenecks throughout the core and memory hierarchy. These events can give programmers unique and powerful insights into the causes of resource bottlenecks in their applications. However, interpreting these events has been a significant challenge. We present an automated system that uses machine learning to identify an application's performance problems. Our system gives programmers insight into the performance of their applications while shielding them from the onerous task of digesting hardware events. It trains random forests, a decision-tree-based algorithm, on our micro-benchmarks to fingerprint performance problems. Our system divides a profiled application into functions and automatically classifies each function by its dominant hardware resource bottleneck. Using the classifications of the hotspot functions, we achieved an average speedup of 1.73x on three applications in the PARSEC benchmark suite. Our system gives programmers a guideline on where, what, and how to fix the detected performance problems in their applications, a task that would otherwise require considerable architectural knowledge.
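The classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it is a simplified random forest (bootstrap samples plus a random feature subset at each split) written in plain Python. The three event rates, the bottleneck labels, the sample values, and the function names are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, rng, n_feat, depth=0, max_depth=4):
    """Grow one tree; leaves are labels, inner nodes are (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    # Random feature subset at each node: this is what makes the forest "random".
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_feat))
    if split is None:
        return majority
    f, t = split
    lo = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in lo], [y for _, y in lo], rng, n_feat, depth + 1, max_depth),
            build_tree([r for r, _ in hi], [y for _, y in hi], rng, n_feat, depth + 1, max_depth))

def predict_tree(node, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample of the micro-benchmark data."""
    rng = random.Random(seed)
    n_feat = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng, n_feat))
    return forest

def classify(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Hypothetical micro-benchmark samples (made-up values, not measured data):
# [L1 misses per instr., LLC misses per instr., branch mispredictions per instr.]
SAMPLES = [
    [0.30, 0.010, 0.010], [0.28, 0.012, 0.008], [0.33, 0.009, 0.011],
    [0.02, 0.200, 0.009], [0.03, 0.220, 0.012], [0.02, 0.180, 0.010],
    [0.02, 0.011, 0.150], [0.01, 0.008, 0.170], [0.03, 0.010, 0.140],
]
LABELS = ["l1_bound"] * 3 + ["memory_bound"] * 3 + ["branch_bound"] * 3

forest = train_forest(SAMPLES, LABELS)

# Hypothetical per-function event rates from a profiled application.
profile = {"func_a": [0.29, 0.011, 0.010], "func_b": [0.02, 0.210, 0.010]}
diagnosis = {name: classify(forest, rates) for name, rates in profile.items()}
print(diagnosis)
```

Each profiled function receives the label that most trees vote for, which mirrors the idea of classifying functions by their dominant bottleneck. A production system would use far more events and training samples, and most likely an established library rather than a hand-written forest.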