Well-engineered compilers use a carefully selected set of optimizations, heuristic optimization policies, and a phase ordering. Designing a single optimization heuristic that works well with other optimization phases is a challenging task. Although compiler designers evaluate heuristics and phase orderings before deployment, compilers typically do not statically evaluate or refine the quality of their optimization decisions during a specific compilation.

This paper identifies a class of optimizations for which the compiler can statically evaluate the effectiveness of its heuristics and phase interactions. When necessary, it then modifies and reapplies its optimization policies. We call this approach convergent compilation, since it iterates to converge on high-quality code. This model incurs additional compilation time to avoid some of the difficulties of predicting phase interactions and perfecting heuristics.

This work was motivated by the TRIPS architecture, whose resource constraints impose conflicting phase-ordering requirements. For example, each atomic execution unit (a TRIPS block) holds a maximum of 128 instructions and has a fixed minimum execution time cost. Loop unrolling and other optimizations therefore seek to maximize the number of mostly full blocks. Because unrolling enables many downstream optimizations, it needs to occur well before code generation, but this early position makes it impossible to accurately predict the final number of instructions. After the compiler generates code, it knows the exact instruction count, and consequently whether it unrolled too much, too little, or just the right amount. If necessary, convergent unrolling then goes back, adjusts the unroll amount accordingly, and reapplies the subsequent optimization phases.
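The feedback loop described above can be sketched as follows. This is a hypothetical illustration, not the TRIPS compiler's actual implementation: the function names, the fixed per-iteration overhead, and the search strategy are all assumptions; only the 128-instruction block limit comes from the paper.

```python
# Illustrative sketch of convergent unrolling (names and costs are
# hypothetical, not taken from the TRIPS compiler).
BLOCK_SIZE = 128  # max instructions per TRIPS block (from the paper)

def codegen_instruction_count(loop_body_size, unroll_factor, overhead=4):
    # Stand-in for rerunning all downstream phases and counting the
    # instructions that land in the unrolled loop's block. A real
    # compiler would re-execute the subsequent optimization phases here.
    return loop_body_size * unroll_factor + overhead

def convergent_unroll(loop_body_size, max_factor=32):
    """Converge on the largest unroll factor whose generated block fits."""
    best = 1
    for factor in range(1, max_factor + 1):
        count = codegen_instruction_count(loop_body_size, factor)
        if count <= BLOCK_SIZE:
            best = factor   # block still fits: try unrolling more
        else:
            break           # unrolled too much: keep the last good factor
    return best
```

For instance, with a 10-instruction loop body and 4 instructions of assumed overhead, the sketch settles on an unroll factor of 12 (124 instructions), since a factor of 13 would overflow the 128-instruction block.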
We implement convergent unrolling, which automatically matches the best hand-unrolled version for a set of microbenchmarks on the TRIPS architectural simulator.

Convergent compilation can help solve other phase-ordering and heuristic-tuning compilation challenges. It is particularly well suited to resource constraints that the compiler can statically evaluate, such as register usage, instruction-level parallelism, and code size. More importantly, these resource constraints are key performance indicators in embedded, VLIW, and partitioned hardware, which indicates that convergent compilation should be broadly applicable.