Continuous program optimization: A case study

Authors: Thomas Kistler; Michael Franz
Affiliations: University of California, Irvine, CA (both authors)
Venue: ACM Transactions on Programming Languages and Systems (TOPLAS)
Year: 2003

Abstract

Much of the software in everyday operation is not making optimal use of the hardware on which it actually runs. Among the reasons for this discrepancy are hardware/software mismatches, modularization overheads introduced by software engineering considerations, and the inability of systems to adapt to users' behaviors.

A solution to these problems is to delay code generation until load time. This is the earliest point at which a piece of software can be fine-tuned to the actual capabilities of the hardware on which it is about to be executed, and also the earliest point at which modularization overheads can be overcome by global optimization.

A still better match between software and hardware can be achieved by replacing the already executing software at regular intervals with new versions constructed on the fly by a background code re-optimizer. This not only enables the use of live profiling data to guide optimization decisions, but also facilitates adaptation to changing usage patterns and the late addition of dynamic link libraries.

This paper presents a system that provides code generation at load time and continuous program optimization at run time. First, the architecture of the system is presented. Then, two optimization techniques are discussed that were developed specifically in the context of continuous optimization. The first of these optimizations continually adjusts the storage layouts of dynamic data structures to maximize data cache locality, while the second performs profile-driven instruction re-scheduling to increase instruction-level parallelism. These two optimizations have very different cost/benefit ratios, as a series of benchmarks shows. The paper concludes with an outlook on future research directions and an enumeration of some remaining research problems.

The empirical results presented in this paper make a case in favor of continuous optimization, but indicate that it needs to be applied judiciously. In many situations, the costs of dynamic optimizations outweigh their benefits, so that no break-even point is ever reached. In favorable circumstances, on the other hand, speed-ups of over 120% have been observed. It appears that the main beneficiaries of continuous optimization are shared libraries, which at different times can be optimized in the context of the currently dominant client application.
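The break-even argument above can be made concrete with a deliberately simple cost model. This is a hypothetical illustration, not the model used in the paper: it assumes the re-optimizer's cost C is charged serially to the running program (a real background optimizer may instead consume otherwise idle cycles), that the optimized code then runs uniformly S times faster, and it ignores profiling overhead. Setting C + T/S = T and solving for T gives the break-even point T* = C * S / (S - 1), measured in seconds of unoptimized execution. The function names `break_even_time` and `pays_off` are invented for this sketch.

```python
from typing import Optional


def break_even_time(opt_cost: float, speedup: float) -> Optional[float]:
    """Baseline (unoptimized) run time after which a one-time optimization pays off.

    Hypothetical model: the optimizer consumes `opt_cost` seconds charged serially
    to the program, after which the code runs `speedup` times faster.
    Returns None when the optimized code is not faster, i.e. no break-even point exists.
    """
    if opt_cost < 0:
        raise ValueError("opt_cost must be non-negative")
    if speedup <= 1.0:
        return None
    # Solve opt_cost + T / speedup == T for T.
    return opt_cost * speedup / (speedup - 1.0)


def pays_off(opt_cost: float, speedup: float, remaining_time: float) -> bool:
    """True if the program still runs long enough (in baseline seconds) to recoup the cost."""
    t = break_even_time(opt_cost, speedup)
    return t is not None and remaining_time > t


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    print(break_even_time(2.0, 1.25))  # 10.0
    print(pays_off(2.0, 1.25, 5.0))    # False: program ends before break-even
    print(pays_off(2.0, 1.25, 60.0))   # True
    print(break_even_time(2.0, 1.0))   # None: no speedup, never breaks even
```

Under this model, a modest speedup (S close to 1) pushes T* up sharply, so only long-running code can recoup the optimization cost, and a program that exits before T* never breaks even.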