The Racing Algorithm: Model Selection for Lazy Learners. Artificial Intelligence Review, special issue on lazy learning.
Learning to schedule straight-line code. NIPS '97: Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems 10.
A Machine Learning Approach to Automatic Production of Compiler Heuristics. AIMSA '02: Proceedings of the 10th International Conference on Artificial Intelligence: Methodology, Systems, and Applications.
Finding effective optimization phase sequences. LCTES '03: Proceedings of the 2003 ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems.
Meta optimization: improving compiler heuristics with machine learning. PLDI '03: Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation.
Finding effective compilation sequences. LCTES '04: Proceedings of the 2004 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems.
Predicting Unroll Factors Using Supervised Classification. CGO '05: Proceedings of the International Symposium on Code Generation and Optimization.
Using Machine Learning to Focus Iterative Optimization. CGO '06: Proceedings of the International Symposium on Code Generation and Optimization.
The DaCapo benchmarks: Java benchmarking development and analysis. OOPSLA '06: Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications.
Statistically rigorous Java performance evaluation. OOPSLA '07: Proceedings of the 22nd Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems and Applications.
Producing wrong data without doing anything obviously wrong! ASPLOS '09: Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems.
Software: Practice and Experience.
Proceedings of the 9th International Conference on Principles and Practice of Programming in Java.
S-Race: a multi-objective racing algorithm. GECCO '13: Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation.
Many problems in embedded compilation require one set of optimizations to be chosen over another based on run-time performance. Self-tuned libraries, iterative compilation, and machine-learning techniques all compare multiple compiled versions of a program, timing each version to determine which performs best. Because most performance measurements are noisy, each version must be run multiple times: the number of runs must be large enough to distinguish the versions despite the noise, yet any runs beyond that waste time and energy. The compiler writer must therefore either risk taking too few runs, and potentially getting incorrect results, or take too many, lengthening experiments or reducing the number of program versions that can be evaluated. Prior work uses constant-size sampling plans, in which each compiled version is executed a fixed number of times regardless of the level of noise. In this paper we develop a sequential sampling plan that adapts automatically to the experiment, so the compiler writer can have confidence in the results while being sure that no more runs were taken than needed. We show that our system correctly determines the best optimization settings with between 76% and 87% fewer runs than a brute-force, constant-size sampling approach requires. We also compare our approach to JavaSTATS (10); we needed 77% to 89% fewer runs than it did.
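To make the idea concrete, the sketch below shows one way a sequential sampling plan can decide which of two compiled versions is faster. This is an illustration, not the paper's algorithm: the stopping rule (Welch's t-test at a fixed significance level, re-checked after each pair of runs) and the `time_version_a`/`time_version_b` timing hooks are assumptions introduced for the example.

```python
"""Sketch of sequential sampling for comparing two compiled program versions.

Illustrative assumptions: the stopping rule (Welch's t-test) and the
simulated timers stand in for the paper's actual method and for real
program executions.
"""
import random
from statistics import mean
from scipy.stats import ttest_ind


def time_version_a():
    # Hypothetical stand-in for running and timing compiled version A.
    return random.gauss(mu=10.0, sigma=0.5)


def time_version_b():
    # Hypothetical stand-in for running and timing compiled version B.
    return random.gauss(mu=10.3, sigma=0.5)


def sequential_compare(run_a, run_b, alpha=0.05, min_runs=5, max_runs=500):
    """Sample both versions until they are statistically distinguishable,
    then stop. Returns (winner, total_runs_taken)."""
    a = [run_a() for _ in range(min_runs)]
    b = [run_b() for _ in range(min_runs)]
    while len(a) < max_runs:
        # Welch's t-test: compares means without assuming equal variances.
        _, p_value = ttest_ind(a, b, equal_var=False)
        if p_value < alpha:
            winner = "A" if mean(a) < mean(b) else "B"
            return winner, len(a) + len(b)
        a.append(run_a())  # Not yet distinguishable: take one more
        b.append(run_b())  # sample of each version and test again.
    return None, len(a) + len(b)  # Indistinguishable within max_runs.


if __name__ == "__main__":
    winner, runs = sequential_compare(time_version_a, time_version_b)
    print(f"winner: {winner}, total runs: {runs}")
```

In contrast to a constant-size plan, which would always take `max_runs` samples of each version, this loop stops as soon as the noise no longer obscures the difference. Note that naively re-applying a fixed-level test after every sample inflates the false-positive rate; a production-quality sequential plan would use a correction, such as a sequential probability ratio test or adjusted significance thresholds.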