Convergent Compilation Applied to Loop Unrolling

Authors:
Nicholas Nethercote;Doug Burger;Kathryn S. Mckinley
Affiliations:
The University of Texas at Austin,;The University of Texas at Austin,;The University of Texas at Austin,
Venue:
Transactions on High-Performance Embedded Architectures and Compilers I
Year:
2007

Citing 28
Cited 0

Improving register allocation for subscripted variables

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Effective compiler support for predicated execution using the hyperblock

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Towards better inlining decisions using inlining trials

LFP '94 Proceedings of the 1994 ACM conference on LISP and functional programming
CRAIG: a practical framework for combining instruction scheduling and register assignment

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Parameterized pattern matching: algorithms and applications

Journal of Computer and System Sciences
Learning to schedule straight-line code

NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10
Optimizing for reduced code space using genetic algorithms

Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
Clock rate versus IPC: the end of the road for conventional microarchitectures

Proceedings of the 27th annual international symposium on Computer architecture
Overcoming the challenges to feedback-directed optimization (Keynote Talk)

DYNAMO '00 Proceedings of the ACM SIGPLAN workshop on Dynamic and adaptive compilation and optimization
Impact of economics on compiler optimization

Proceedings of the 2001 joint ACM-ISCOPE conference on Java Grande
Optimal integrated code generation for clustered VLIW architectures

Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
A design space evaluation of grid processor architectures

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A comparison of empirical and model-driven optimization

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Meta optimization: improving compiler heuristics with machine learning

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
CARS: A New Code Generation Framework for Clustered ILP Processors

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Inducing heuristics to decide whether to schedule

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Finding effective compilation sequences

Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Scaling to the End of Silicon with EDGE Architectures

Computer
The Jalapeño virtual machine

IBM Systems Journal
Predicting Unroll Factors Using Supervised Classification

Proceedings of the international symposium on Code generation and optimization
Automatic Selection of Compiler Options Using Non-parametric Inferential Statistics

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
A Distributed Control Path Architecture for VLIW Processors

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Improving virtual machine performance using a cross-run profile repository

OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Compiling for EDGE Architectures

Proceedings of the International Symposium on Code Generation and Optimization
Using Machine Learning to Focus Iterative Optimization

Proceedings of the International Symposium on Code Generation and Optimization
Online performance auditing: using hot optimizations without getting burned

Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Trace Scheduling: A Technique for Global Microcode Compaction

IEEE Transactions on Computers
Hybrid optimizations: which optimization algorithm to use?

CC'06 Proceedings of the 15th international conference on Compiler Construction

Quantified Score

Hi-index	0.00

Visualization

Abstract

Well-engineered compilers use a carefully selected set of optimizations, heuristic optimization policies, and a phase ordering. Designing a single optimization heuristic that works well with other optimization phases is a challenging task. Although compiler designers evaluate heuristics and phase orderings before deployment, compilers typically do not statically evaluate nor refine the quality of their optimization decisions during a specific compilation.This paper identifies a class of optimizations for which the compiler can statically evaluate the effectiveness of its heuristics and phase interactions. When necessary, it then modifies and reapplies its optimization policies. We call this approach convergent compilation, since it iterates to converge on high quality code. This model incurs additional compilation time to avoid some of the difficulties of predicting phase interactions and perfecting heuristicsThis work was motivated by the TRIPS architecture which has resource constraints that have conflicting phase order requirements. For example, each atomic execution unit (a TRIPS block) has a maximum number of instructions (128) and a fixed minimum execution time cost. Loop unrolling and other optimizations thus seek to maximize the number of mostly full blocks. Because unrolling enables many downstream optimizations, it needs to occur well before code generation, but this position makes it impossible to accurately predict the final number of instructions. After the compiler generates code, it knows the exact instruction count and consequently if it unrolled too much or too little or just right. If necessary, convergent unrolling then goes back and adjusts the unroll amount accordingly and reapplies subsequent optimization phases. We implement convergent unrolling which automatically matches the best hand unrolled version for a set of microbenchmarks on the TRIPS architectural simulator.Convergent compilation can help solve other phase ordering and heuristic tuning compilation challenges. It is particularly well suited for resource constraints that the compiler can statically evaluate such as register usage, instruction level parallelism, and code size. More importantly, these resource constraints are key performance indicators in embedded, VLIW, and partitioned hardware and indicate that convergent compilation should be broadly applicable.