Iterative compilation of applications has proved a popular and successful approach to achieving high performance. This, however, comes at the cost of many runs of the application. Machine-learning-based approaches avoid this expense but instead incur a large off-line training cost. This paper presents a new approach that dramatically reduces the training time of a machine-learning-based compiler. It does so by focusing on the programs that best characterize the optimization space. By applying unsupervised clustering in the program feature space, we substantially reduce the amount of time required to train the compiler. Furthermore, we learn a model that dispenses with iterative search completely, allowing integration within the normal program development cycle. We evaluated our clustering approach on the EEMBCv2 benchmark suite and show that it reduces the number of training runs by more than a factor of 7. This translates into an average 1.14x speedup across the benchmark suite compared to the default highest optimization level.
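To make the core idea concrete, the sketch below shows one way to pick representative training programs by unsupervised clustering in a program feature space: cluster the feature vectors, then train only on the program nearest each cluster centroid. This is a minimal illustration, not the paper's implementation; the choice of k-means, the number of clusters, the feature matrix, and the function name select_representatives are all assumptions made here for demonstration.

```python
# Illustrative sketch: choosing representative training programs by
# clustering program feature vectors. K-means and the synthetic feature
# matrix below are assumptions; the paper's actual feature extraction
# and clustering method may differ.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def select_representatives(features: np.ndarray, n_clusters: int) -> np.ndarray:
    """Cluster programs in feature space and return, for each cluster,
    the index of the program closest to the cluster centroid."""
    scaled = StandardScaler().fit_transform(features)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(scaled)
    reps = []
    for c in range(n_clusters):
        members = np.flatnonzero(km.labels_ == c)
        dists = np.linalg.norm(scaled[members] - km.cluster_centers_[c], axis=1)
        reps.append(members[np.argmin(dists)])
    return np.array(reps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical feature matrix: 42 benchmark programs x 8 static features.
    features = rng.normal(size=(42, 8))
    reps = select_representatives(features, n_clusters=6)
    # Off-line training (iterative search over optimizations) would then be
    # run only on these representatives, cutting the number of training runs.
    print("Train the ML model only on programs:", reps)
```

The design intuition is that programs in the same feature-space cluster tend to respond similarly to compiler optimizations, so searching the optimization space for one representative per cluster recovers most of the benefit of training on the full suite at a fraction of the cost.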