Dynamic Binary Translation (DBT) is the key technology behind cross-platform virtualization: it allows software compiled for one Instruction Set Architecture (ISA) to be executed on a processor supporting a different ISA. Under the hood, DBT is typically implemented using Just-In-Time (JIT) compilation of frequently executed program regions, also called traces. The main challenge is translating these regions as fast as possible into highly efficient native code. Because time spent on JIT compilation adds to the overall execution time, the JIT compiler is often decoupled and runs in a separate thread, independent of the main simulation loop, to reduce this overhead. In this paper we present two innovative contributions. The first is a generalized trace compilation approach that considers all frequently executed paths in a program for JIT compilation, as opposed to previous approaches where trace compilation is restricted to paths through loops. The second reduces JIT compilation cost by compiling several hot traces in a concurrent task farm. Altogether, we combine generalized light-weight tracing, large translation units, parallel JIT compilation, and dynamic work scheduling to ensure timely and efficient processing of hot traces. We have evaluated our industry-strength, LLVM-based parallel DBT implementing the ARCompact ISA against three benchmark suites (EEMBC, BioPerf and SPEC CPU2006) and demonstrate speedups of up to 2.08 on a standard quad-core Intel Xeon machine. Across short- and long-running benchmarks our scheme is robust and never results in a slowdown. In fact, using four processors, total execution time can be reduced by 11.5% on average over state-of-the-art decoupled, parallel (or asynchronous) JIT compilation.
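The task-farm idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: `compile_trace`, `run_task_farm`, the trace identifiers, and the "hottest trace first" priority rule are all hypothetical names and assumptions introduced here for illustration. The sketch also fills the work queue up front, whereas a real DBT would keep interpreting in the main simulation loop and enqueue traces as they become hot.

```python
import queue
import threading


def compile_trace(trace_id):
    # Hypothetical stand-in for JIT compilation: a real system would
    # lower the trace to native code (for example via LLVM).
    return lambda: f"native:{trace_id}"


def run_task_farm(hot_traces, num_workers=3):
    """Compile (trace_id, exec_count) pairs concurrently.

    Assumption: hotter traces are dequeued first, as one simple form
    of dynamic work scheduling.
    """
    work = queue.PriorityQueue()
    for trace_id, exec_count in hot_traces:
        # PriorityQueue pops the smallest item, so negate the count.
        work.put((-exec_count, trace_id))

    translation_cache = {}
    cache_lock = threading.Lock()

    def worker():
        # Each worker pulls traces until the queue is drained.
        while True:
            try:
                _, trace_id = work.get_nowait()
            except queue.Empty:
                return
            native = compile_trace(trace_id)
            with cache_lock:
                translation_cache[trace_id] = native

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return translation_cache


cache = run_task_farm([("loop_body", 500), ("init", 10), ("callee", 50)])
```

Workers pull from a shared queue rather than receiving a fixed share of traces, so a worker that finishes a small trace early immediately picks up the next pending one.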