Iterative compilation of applications has proved a popular and successful approach to achieving high performance. This, however, comes at the cost of many runs of the application. Machine-learning-based approaches avoid this expense but instead incur a large off-line training cost. This paper presents a new approach that dramatically reduces the training time of a machine-learning-based compiler. It does so by focusing on the programs that best characterize the optimization space. By applying unsupervised clustering in the program feature space, we substantially reduce the amount of time required to train the compiler. Furthermore, we learn a model that dispenses with iterative search completely, allowing integration within the normal program development cycle. We evaluated our clustering approach on the EEMBCv2 benchmark suite and show that it reduces the number of training runs by more than a factor of 7. This translates into an average 1.14x speedup across the benchmark suite compared to the default highest optimization level.
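To make the core idea concrete, the sketch below shows one way to pick representative training programs by unsupervised clustering in a program feature space: cluster the feature vectors, then train only on the program nearest each cluster centroid. This is a minimal illustration, not the paper's implementation; the choice of k-means, the number of clusters, the feature matrix, and the function name select_representatives are all assumptions made here for demonstration.

```python
# Illustrative sketch: choosing representative training programs by
# clustering program feature vectors. K-means and the synthetic feature
# matrix below are assumptions; the paper's actual feature extraction
# and clustering method may differ.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def select_representatives(features: np.ndarray, n_clusters: int) -> np.ndarray:
    """Cluster programs in feature space and return, for each cluster,
    the index of the program closest to the cluster centroid."""
    scaled = StandardScaler().fit_transform(features)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(scaled)
    reps = []
    for c in range(n_clusters):
        members = np.flatnonzero(km.labels_ == c)
        dists = np.linalg.norm(scaled[members] - km.cluster_centers_[c], axis=1)
        reps.append(members[np.argmin(dists)])
    return np.array(reps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical feature matrix: 42 benchmark programs x 8 static features.
    features = rng.normal(size=(42, 8))
    reps = select_representatives(features, n_clusters=6)
    # Off-line training (iterative search over optimizations) would then be
    # run only on these representatives, cutting the number of training runs.
    print("Train the ML model only on programs:", reps)
```

The design intuition is that programs in the same feature-space cluster tend to respond similarly to compiler optimizations, so searching the optimization space for one representative per cluster recovers most of the benefit of training on the full suite at a fraction of the cost.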