Hot cold optimization of large Windows/NT applications

Authors:
Robert Cohn; P. Geoffrey Lowney
Affiliations:
Digital Equipment Corporation, Hudson, Massachusetts; Digital Equipment Corporation, Hudson, Massachusetts
Venue:
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Year:
1996

Abstract

A dynamic instruction trace often contains many unnecessary instructions that are required only by the unexecuted portion of the program. Hot-cold optimization (HCO) is a technique that realizes this performance opportunity. HCO uses profile information to partition each routine into frequently executed (hot) and infrequently executed (cold) parts. Unnecessary operations in the hot portion are removed, and compensation code is added on transitions from hot to cold as needed. We evaluate HCO on a collection of large Windows NT applications. HCO is most effective on the programs that are call intensive and have flat profiles, providing a 3-8% reduction in path length beyond conventional optimization.
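The partitioning step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the Block type, the fixed execution-count threshold, and the example routine are all assumptions made for illustration, and the abstract does not state how the hot/cold boundary is actually chosen. The sketch only splits a routine's basic blocks by profile count and lists the hot-to-cold edges, which are the transitions where compensation code may be needed.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical basic block: a name, a profiled count, and successor names."""
    name: str
    count: int
    succs: list = field(default_factory=list)


def partition(blocks, threshold):
    """Split blocks into hot and cold sets by profiled execution count.

    The fixed threshold is an assumption; the abstract does not give the criterion.
    """
    hot = {b.name for b in blocks if b.count >= threshold}
    cold = {b.name for b in blocks if b.count < threshold}
    return hot, cold


def hot_to_cold_edges(blocks, hot, cold):
    """List control-flow edges that leave the hot part and enter the cold part."""
    return [(b.name, s)
            for b in blocks if b.name in hot
            for s in b.succs if s in cold]


# Made-up routine: a common fast path plus a rarely taken error path.
routine = [
    Block("entry", 1000, ["fast", "error"]),
    Block("fast", 990, ["exit"]),
    Block("error", 10, ["exit"]),
    Block("exit", 1000, []),
]

hot, cold = partition(routine, threshold=100)
edges = hot_to_cold_edges(routine, hot, cold)
print(sorted(hot), sorted(cold), edges)
```

In this made-up routine, any operation in the hot blocks whose result is used only by the error block would be a candidate for removal from the hot part, with the corresponding work redone on the entry-to-error edge.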