ACM Transactions on Mathematical Software (TOMS)
The programming language Oberon
Software—Practice & Experience
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Instruction scheduling for the IBM RISC System/6000 processor
IBM Journal of Research and Development
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Efficiently computing static single assignment form and the control dependence graph
ACM Transactions on Programming Languages and Systems (TOPLAS)
Comparing static and dynamic code scheduling for multiple-instruction-issue processors
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Using profile information to assist classic code optimizations
Software—Practice & Experience
Profile-guided automatic inline expansion for C programs
Software—Practice & Experience
The design and implementation of the self compiler, an optimizing compiler for object-oriented programming languages
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Optimizing dynamically-dispatched calls with run-time type feedback
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Towards better inlining decisions using inlining trials
LFP '94 Proceedings of the 1994 ACM conference on LISP and functional programming
Optimally profiling and tracing programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Profile-assisted instruction scheduling
International Journal of Parallel Programming
Design patterns: elements of reusable object-oriented software
Design patterns: elements of reusable object-oriented software
Combining analyses, combining optimizations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Supporting dynamic data structures on distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimizing ML with run-time code generation
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Reconciling responsiveness with performance in pure object-oriented languages
ACM Transactions on Programming Languages and Systems (TOPLAS)
C: a language for high-level, efficient, and machine-independent dynamic code generation
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Accurate and practical profile-driven compilation using the profile buffer
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Profile-driven instruction level parallel scheduling with application to super blocks
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Speculative hedge: regulating compile-time speculation against profile variations
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
New faster Kernighan-Lin-type graph-partitioning algorithms
ICCAD '93 Proceedings of the 1993 IEEE/ACM international conference on Computer-aided design
DAISY: dynamic compilation for 100% architectural compatibility
Proceedings of the 24th annual international symposium on Computer architecture
Communications of the ACM
Continuous profiling: where have all the cycles gone?
Proceedings of the sixteenth ACM symposium on Operating systems principles
System support for automatic profiling and optimization
Proceedings of the sixteenth ACM symposium on Operating systems principles
ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Procedure placement using temporal ordering information
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
DIGITAL FX!32: combining emulation and binary translation
Digital Technical Journal
Edge profiling versus path profiling: the showdown
POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Fast, effective code generation in a just-in-time Java compiler
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Advanced compiler design and implementation
Advanced compiler design and implementation
Using generational garbage collection to implement cache-conscious data placement
Proceedings of the 1st international symposium on Memory management
Better global scheduling using path profiles
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Cache-conscious structure definition
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Efficient incremental run-time specialization for free
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
An evaluation of staged run-time optimizations in DyC
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs
SIAM Journal on Scientific Computing
Proceedings of the 14th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Proceedings of the 14th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Automated data-member layout of heap objects to improve memory-hierarchy performance
ACM Transactions on Programming Languages and Systems (TOPLAS)
A framework for reducing the cost of instrumented code
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Dynamic Binary Translation and Optimization
IEEE Transactions on Computers
A dynamic optimization framework for a Java just-in-time compiler
OOPSLA '01 Proceedings of the 16th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
The Java Language Specification
The Java Language Specification
Optimizing Dynamically-Typed Object-Oriented Languages With Polymorphic Inline Caches
ECOOP '91 Proceedings of the European Conference on Object-Oriented Programming
Efficient implementation of the smalltalk-80 system
POPL '84 Proceedings of the 11th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Improving Cache Behavior of Dynamically Allocated Data Structures
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
LaTTe: A Java VM Just-in-Time Compiler with Fast and Efficient Register Allocation
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Adaptive systems for the dynamic run-time optimization of programs.
Adaptive systems for the dynamic run-time optimization of programs.
Continuous program optimization
Continuous program optimization
The Accuracy of Initial Prediction in Two-Phase Dynamic Binary Translators
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Proceedings of the 32nd annual international symposium on Computer Architecture
Visualization and analysis of phased behavior in Java programs
Proceedings of the 3rd international symposium on Principles and practice of programming in Java
Supporting software composition at the programming language level
Science of Computer Programming - Special issue on new software composition concepts
Design and evaluation of dynamic optimizations for a Java just-in-time compiler
ACM Transactions on Programming Languages and Systems (TOPLAS)
Statistical Models for Empirical Search-Based Performance Tuning
International Journal of High Performance Computing Applications
Multiple Page Size Modeling and Optimization
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Continuous Path and Edge Profiling
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Power reduction techniques for microprocessor systems
ACM Computing Surveys (CSUR)
Online Phase Detection Algorithms
Proceedings of the International Symposium on Code Generation and Optimization
Region Monitoring for Local Phase Detection in Dynamic Optimization Systems
Proceedings of the International Symposium on Code Generation and Optimization
Phase-based visualization and analysis of Java programs
Science of Computer Programming - Special issue: Principles and practices of programming in Java (PPPJ 2004)
Improving locality with parallel hierarchical copying GC
Proceedings of the 5th international symposium on Memory management
Performance and environment monitoring for continuous program optimization
IBM Journal of Research and Development
Object and method exploration for embedded systems applications
Proceedings of the 20th annual conference on Integrated circuits and systems design
PEAK—a fast and effective performance tuning system via compiler optimization orchestration
ACM Transactions on Programming Languages and Systems (TOPLAS)
Phase-based adaptive recompilation in a JVM
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Modeling Relations between Inputs and Dynamic Behavior for General Programs
Languages and Compilers for Parallel Computing
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Mostly static program partitioning of binary executables
ACM Transactions on Programming Languages and Systems (TOPLAS)
Cross-Input Learning and Discriminative Prediction in Evolvable Virtual Machines
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Using program metadata to support SDT in object-oriented applications
Proceedings of the 4th workshop on the Implementation, Compilation, Optimization of Object-Oriented Languages and Programming Systems
Phase detection using trace compilation
PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
Dynamic ADTs: a "don't ask, don't tell" policy for data abstraction
Proceedings of the 2007 International Lisp Conference
MiDataSets: creating the conditions for a more realistic evaluation of Iterative optimization
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Metaman: system-wide metadata management
Proceedings of the Workshop on Binary Instrumentation and Applications
An input-centric paradigm for program dynamic optimizations
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
A step towards transparent integration of input-consciousness into dynamic program optimizations
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
JIT technology with C/C++: Feedback-directed dynamic recompilation for statically compiled languages
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Much of the software in everyday operation is not making optimal use of the hardware on which it actually runs. Among the reasons for this discrepancy are hardware/software mismatches, modularization overheads introduced by software engineering considerations, and the inability of systems to adapt to users' behaviors.A solution to these problems is to delay code generation until load time. This is the earliest point at which a piece of software can be fine-tuned to the actual capabilities of the hardware on which it is about to be executed, and also the earliest point at wich modularization overheads can be overcome by global optimization.A still better match between software and hardware can be achieved by replacing the already executing software at regular intervals by new versions constructed on-the-fly using a background code re-optimizer. This not only enables the use of live profiling data to guide optimization decisions, but also facilitates adaptation to changing usage patterns and the late addition of dynamic link libraries.This paper presents a system that provides code generation at load-time and continuous program optimization at run-time. First, the architecture of the system is presented. Then, two optimization techniques are discussed that were developed specifically in the context of continuous optimization. The first of these optimizations continually adjusts the storage layouts of dynamic data structures to maximize data cache locality, while the second performs profile-driven instruction re-scheduling to increase instruction-level parallelism. These two optimizations have very different cost/benefit ratios, presented in a series of benchmarks. The paper concludes with an outlook to future research directions and an enumeration of some remaining research problems.The empirical results presented in this paper make a case in favor of continuous optimization, but indicate that it needs to be applied judiciously. In many situations, the costs of dynamic optimizations outweigh their benefit, so that no break-even point is ever reached. In favorable circumstances, on the other hand, speed-ups of over 120% have been observed. It appears as if the main beneficiaries of continuous optimization are shared libraries, which at different times can be optimized in the context of the currently dominant client application.