A case for machine learning to optimize multicore performance

Authors:
Archana Ganapathi;Kaushik Datta;Armando Fox;David Patterson
Affiliations:
University of California at Berkeley;University of California at Berkeley;University of California at Berkeley;University of California at Berkeley
Venue:
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Year:
2009



Abstract

Multicore architectures have become so complex and diverse that there is no obvious path to achieving good performance. Hundreds of code transformations, compiler flags, architectural features, and optimization parameters result in a search space that can take many machine-months to explore exhaustively. Inspired by successes in the systems community, we apply state-of-the-art machine learning techniques to explore this space more intelligently. On 7-point and 27-point stencil codes, our technique takes about two hours to discover a configuration whose performance is within 1% of, and up to 18% better than, that achieved by a human expert. This factor of 2000 speedup over manual exploration of the auto-tuning parameter space enables us to explore optimizations that were previously off-limits. We believe the opportunity for using machine learning in multicore auto-tuning is even more promising than the successes to date in the systems literature.
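
To make the idea of exploring the space "more intelligently" concrete, here is a minimal sketch of model-guided auto-tuning in Python. It is not the authors' method: the abstract does not name the learning technique, so a simple k-nearest-neighbour surrogate is used as a stand-in. The knob names in SPACE, their value ranges, and the synthetic measure function (which replaces compiling and timing a real stencil kernel) are all hypothetical. The sketch measures a small random sample, predicts the cost of the untried configurations, and verifies only the most promising ones.

```python
import itertools
import random

# Hypothetical tuning knobs; names and value ranges are illustrative only.
SPACE = {
    "block_x": [8, 16, 32, 64, 128],
    "block_y": [8, 16, 32, 64, 128],
    "unroll": [1, 2, 4, 8],
    "prefetch": [0, 1, 2, 4],
}


def all_configs(space):
    """Enumerate every point in the Cartesian product of the knob values."""
    keys = list(space)
    return [dict(zip(keys, vals)) for vals in itertools.product(*space.values())]


def measure(cfg):
    """Stand-in for compiling and timing a kernel (lower is better).

    This is a synthetic cost surface, NOT real hardware data.
    """
    return (
        1.0
        + abs(cfg["block_x"] - 32) / 32
        + abs(cfg["block_y"] - 16) / 16
        + abs(cfg["unroll"] - 4) / 4
        + abs(cfg["prefetch"] - 2) / 2
    )


def to_index(cfg, space):
    """Map a configuration to the positions of its values within each knob's list."""
    return tuple(space[key].index(cfg[key]) for key in space)


def predict(cfg, measured, space, k=3):
    """k-nearest-neighbour surrogate: average cost of the k closest measured points."""
    target = to_index(cfg, space)

    def sq_dist(other):
        return sum((a - b) ** 2 for a, b in zip(target, to_index(other, space)))

    nearest = sorted(measured, key=lambda pair: sq_dist(pair[0]))[:k]
    return sum(cost for _, cost in nearest) / len(nearest)


def tune(space, n_train=40, n_verify=10, seed=0):
    """Measure a random sample, rank the rest by predicted cost, verify the top few."""
    rng = random.Random(seed)
    configs = all_configs(space)
    train = rng.sample(configs, n_train)
    measured = [(cfg, measure(cfg)) for cfg in train]
    untried = [cfg for cfg in configs if cfg not in train]
    ranked = sorted(untried, key=lambda cfg: predict(cfg, measured, space))
    measured += [(cfg, measure(cfg)) for cfg in ranked[:n_verify]]
    best_cfg, best_cost = min(measured, key=lambda pair: pair[1])
    return best_cfg, best_cost, len(measured), len(configs)


if __name__ == "__main__":
    best_cfg, best_cost, n_measured, n_total = tune(SPACE)
    print(f"measured {n_measured} of {n_total} configurations")
    print(f"best found: {best_cfg} (synthetic cost {best_cost:.3f})")
```

With the default arguments, the sketch measures 50 of the 400 configurations in this toy space. The savings in a real tuning problem depend on how well the surrogate captures the true performance surface, which this synthetic example does not attempt to demonstrate.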