Simple, fast, and practical non-blocking and blocking concurrent queue algorithms
PODC '96 Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing
Dynamo: a transparent dynamic optimization system
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Dynamic Binary Translation and Optimization
IEEE Transactions on Computers
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A region-based compilation technique for a Java just-in-time compiler
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Efficient, transparent, and comprehensive runtime code manipulation
Efficient, transparent, and comprehensive runtime code manipulation
Improving Region Selection in Dynamic Optimization Systems
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler
Proceedings of the International Symposium on Code Generation and Optimization
Virtual Machines: Versatile Platforms for Systems and Processes (The Morgan Kaufmann Series in Computer Architecture and Design)
The java hotspotTM server compiler
JVM'01 Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium - Volume 1
Trace-based just-in-time type specialization for dynamic languages
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
SPUR: a trace-based JIT compiler for CIL
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Improving the performance of trace-based systems by false loop filtering
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Generalized just-in-time trace compilation using a parallel task farm in a dynamic binary translator
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Reducing trace selection footprint for large-scale Java applications without performance loss
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
LnQ: Building High Performance Dynamic Binary Translators with Existing Compiler Backends
ICPP '11 Proceedings of the 2011 International Conference on Parallel Processing
A trace-based Java JIT compiler retrofitted from a method-based compiler
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
HQEMU: a multi-threaded and retargetable dynamic binary translator on multicores
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Adaptive multi-level compilation in a trace-based Java JIT compiler
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
StarDBT: an efficient multi-platform dynamic binary translation system
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Hi-index | 0.00 |
Most dynamic binary translators (DBT) and optimizers (DBO) target binary traces, i.e. frequently executed paths, as code regions to be translated and optimized. Code region formation is the most important first step in all DBTs and DBOs. The quality of the dynamically formed code regions determines the extent and the types of optimization opportunities that can be exposed to DBTs and DBOs, and thus, determines the ultimate quality of the final optimized code. The Next-Executing-Tail (NET) trace formation method used in HP Dynamo is an early example of such techniques. Many existing trace formation schemes are variants of NET. They work very well for most binary traces, but they also suffer a major problem: the formed traces may contain a large number of early exits that could be branched out during the execution. If this happens frequently, the program execution will spend more time in the slow binary interpreter or in the unoptimized code regions than in the optimized traces in code cache. The benefit of the trace optimization is thus lost. Traces/regions with frequently taken early-exits are called delinquent traces/regions. Our empirical study shows that at least 8 of the 12 SPEC CPU2006 integer benchmarks have delinquent traces. In this paper, we propose a light-weight region formation technique called Early-Exit Guided Region Formation (EEG) to improve the quality of the formed traces/regions. It iteratively identifies and merges delinquent regions into larger code regions. We have implemented our EEG algorithm in two LLVM-based multi-threaded DBTs targeting ARM and IA32 instruction set architecture (ISA), respectively. Using SPEC CPU2006 benchmark suite with reference inputs, our results show that compared to an NET-variant currently used in QEMU, a state-of-the-art retargetable DBT, EEG can achieve a significant performance improvement of up to 72% (27% on average), and to 49% (23% on average) for IA32 and ARM, respectively.