SYZYGY - A Framework for Scalable Cross-Module IPO
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Large applications are typically partitioned into separately compiled modules, and optimizing across module boundaries can yield large performance gains. One barrier to applying cross-module optimization (CMO) to such applications is the potentially enormous amount of time and space consumed by the optimization process. We describe a framework for scalable CMO that provides large performance gains on applications containing millions of lines of code. Two major techniques are described. First, careful management of in-memory data structures keeps memory occupancy sub-linear in the number of lines of code being optimized. Second, profile data is used to focus optimization effort on the performance-critical portions of an application. We also present practical issues that arise in deploying this framework in a production environment, including debuggability and compatibility with existing development tools such as make. The framework is deployed in Hewlett-Packard's (HP) UNIX compiler products and speeds up shipped applications from independent software vendors by as much as 71%.
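The abstract does not say how profile data is turned into a set of code regions to optimize. As a minimal sketch of the general idea only, assuming a per-function sample count and a coverage threshold (the function name `select_hot_functions`, the `coverage` parameter, and the sample profile are all hypothetical and not taken from the paper), one simple policy is to keep the hottest functions until they account for a chosen fraction of all samples:

```python
def select_hot_functions(profile, coverage=0.9):
    """Return the hottest functions whose combined sample count
    reaches `coverage` of the total (hypothetical selection policy)."""
    total = sum(profile.values())
    if total == 0:
        return []  # no profile data: nothing is marked hot
    hot, running = [], 0
    # Visit functions from hottest to coldest.
    for name, count in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        if running >= coverage * total:
            break  # coverage target already met
        hot.append(name)
        running += count
    return hot


# Illustrative sample counts, not measured data.
profile = {"parse": 50, "emit": 30, "init": 15, "usage": 5}
hot = select_hot_functions(profile, coverage=0.9)
```

Under this assumed policy, only the functions in `hot` would be handed to the expensive cross-module passes, while the remainder would be compiled with ordinary per-module optimization.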