Rapidly Selecting Good Compiler Optimizations using Performance Counters

Authors:
John Cavazos;Grigori Fursin;Felix Agakov;Edwin Bonilla;Michael F. P. O'Boyle;Olivier Temam
Affiliations:
University of Edinburgh, UK;ALCHEMY Group, INRIA Futurs and LRI, Paris-Sud University, France;University of Edinburgh, UK;University of Edinburgh, UK;University of Edinburgh, UK;ALCHEMY Group, INRIA Futurs and LRI, Paris-Sud University, France
Venue:
Proceedings of the International Symposium on Code Generation and Optimization
Year:
2007

Citing 28
Cited 35

Neural networks for pattern recognition

Neural networks for pattern recognition
ProfileMe: hardware support for instruction-level profiling on out-of-order processors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Optimizing for reduced code space using genetic algorithms

Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
Automatically tuned linear algebra software

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Theory of Information and Coding

Theory of Information and Coding
Adaptive Optimizing Compilers for the 21st Century

The Journal of Supercomputing
A Machine Learning Approach to Automatic Production of Compiler Heuristics

AIMSA '02 Proceedings of the 10th International Conference on Artificial Intelligence: Methodology, Systems, and Applications
Compiler optimization-space exploration

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A comparison of empirical and model-driven optimization

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Meta optimization: improving compiler heuristics with machine learning

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation

PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Fast searches for effective optimization phase sequences

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Inducing heuristics to decide whether to schedule

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Predicting Unroll Factors Using Supervised Classification

Proceedings of the international symposium on Code generation and optimization
A Model-Based Framework: An Approach for Profit-Driven Optimization

Proceedings of the international symposium on Code generation and optimization
Towards a Systematic, Pragmatic and Architecture-Aware Program Optimization Process for Complex Processors

Proceedings of the 2004 ACM/IEEE conference on Supercomputing
ACME: adaptive compilation made efficient

LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Probabilistic source-level optimisation of embedded programs

LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Statistical Models for Empirical Search-Based Performance Tuning

International Journal of High Performance Computing Applications
Online performance analysis by statistical sampling of microprocessor performance counters

Proceedings of the 19th annual international conference on Supercomputing
Think globally, search locally

Proceedings of the 19th annual international conference on Supercomputing
Automatic Selection of Compiler Options Using Non-parametric Inferential Statistics

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Using Machine Learning to Focus Iterative Optimization

Proceedings of the International Symposium on Code Generation and Optimization
Exhaustive Optimization Phase Order Space Exploration

Proceedings of the International Symposium on Code Generation and Optimization
Fast and Effective Orchestration of Compiler Optimizations for Automatic Performance Tuning

Proceedings of the International Symposium on Code Generation and Optimization
MiBench: A free, commercially representative embedded benchmark suite

WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Fast, automatic, procedure-level performance tuning

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Method-specific dynamic compilation using logistic regression

Proceedings of the 21st annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications

Phase-based adaptive recompilation in a JVM

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Cole: compiler optimization level exploration

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Exploring and predicting the architecture/optimising compiler co-design space

CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Mapping parallelism to multi-cores: a machine learning based approach

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Collective Optimization

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Scenario Based Optimization: A Framework for Statically Enabling Online Optimizations

Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
OptiScope: Performance Accountability for Optimizing Compilers

Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Portable compiler optimisation across embedded programs and microarchitectures using machine learning

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Variant-based competitive parallel execution of sequential programs

Proceedings of the 7th ACM international conference on Computing frontiers
Evaluating iterative optimization across 1000 datasets

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Workload characterization supporting the development of domain-specific compiler optimizations using decision trees for data mining

Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems
A case for machine learning to optimize multicore performance

HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Practical aggregation of semantical program properties for machine learning based optimization

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Collective optimization: A practical collaborative approach

ACM Transactions on Architecture and Code Optimization (TACO)
Support of Android lab modules for embedded system curriculum

WESE '10 Proceedings of the 2010 Workshop on Embedded Systems Education
Efficiently exploring compiler optimization sequences with pairwise pruning

Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Loaf: a framework and infrastructure for creating online adaptive solutions

Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Understanding stencil code performance on multicore architectures

Proceedings of the 8th ACM International Conference on Computing Frontiers
An evaluation of different modeling techniques for iterative compilation

CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Using machine learning to improve automatic vectorization

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Iterative optimization for the data center

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Reducing training time in a one-shot machine learning-based compiler

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Exploring and Predicting the Effects of Microarchitectural Parameters and Compiler Optimizations on Performance and Energy

ACM Transactions on Embedded Computing Systems (TECS)
Predictive modeling in a polyhedral optimization space

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Using machines to learn method-specific compilation strategies

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Selective search of inlining vectors for program optimization

Proceedings of the 9th conference on Computing Frontiers
Using graph-based program characterization for predictive modeling

Proceedings of the Tenth International Symposium on Code Generation and Optimization
Parallel iterative compilation: using MapReduce to speedup machine learning in compilers

Proceedings of third international workshop on MapReduce and its Applications Date
THeME: a system for testing by hardware monitoring events

Proceedings of the 2012 International Symposium on Software Testing and Analysis
Deconstructing iterative optimization

ACM Transactions on Architecture and Code Optimization (TACO)
Continuous learning of compiler heuristics

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Finding good optimization sequences covering program space

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
On the determination of inlining vectors for program optimization

CC'13 Proceedings of the 22nd international conference on Compiler Construction
Hybrid type legalization for a sparse SIMD instruction set

ACM Transactions on Architecture and Code Optimization (TACO)
Tools for machine-learning-based empirical autotuning and specialization

International Journal of High Performance Computing Applications

Quantified Score

Hi-index	0.00

Visualization

Abstract

Applying the right compiler optimizations to a particular program can have a significant impact on program performance. Due to the non-linear interaction of compiler optimizations, however, determining the best setting is nontrivial. There have been several proposed techniques that search the space of compiler options to find good solutions; however such approaches can be expensive. This paper proposes a different approach using performance counters as a means of determining good compiler optimization settings. This is achieved by learning a model off-line which can then be used to determine good settings for any new program. We show that such an approach outperforms the state-ofthe- art and is two orders of magnitude faster on average. Furthermore, we show that our performance counter-based approach outperforms techniques based on static code features. Using our technique we achieve a 17% improvement over the highest optimization setting of the commercial PathScale EKOPath 2.3.1 optimizing compiler on the SPEC benchmark suite on a recent AMD Athlon 64 3700+ platform.