Simple, fast, and practical non-blocking and blocking concurrent queue algorithms
PODC '96 Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing
Dynamo: a transparent dynamic optimization system
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Software profiling for hot path prediction: less is more
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
FX!32: A Profile-Directed Binary Translator
IEEE Micro
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Efficient, transparent, and comprehensive runtime code manipulation
Efficient, transparent, and comprehensive runtime code manipulation
Improving Region Selection in Dynamic Optimization Systems
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
LIFT: A Low-Overhead Practical Information Flow Tracking System for Detecting Security Attacks
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
HDTrans: an open source, low-level dynamic instrumentation system
Proceedings of the 2nd international conference on Virtual execution environments
QEMU, a fast and portable dynamic translator
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Valgrind: a framework for heavyweight dynamic binary instrumentation
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Trace fragment selection within method-based JVMs
Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Scalable support for multithreaded applications on dynamic binary instrumentation systems
Proceedings of the 2009 international symposium on Memory management
Taming hardware event samples for FDO compilation
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
COREMU: a scalable and portable parallel full-system emulator
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Improving the performance of trace-based systems by false loop filtering
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Generalized just-in-time trace compilation using a parallel task farm in a dynamic binary translator
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Scalable deterministic replay in a parallel full-system emulator
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
SPIRE: improving dynamic binary translation through SPC-indexed indirect branch redirecting
Proceedings of the 9th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Improving dynamic binary optimization through early-exit guided code region formation
Proceedings of the 9th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Trace construction using enhanced performance monitoring
Proceedings of the ACM International Conference on Computing Frontiers
DBILL: an efficient and retargetable dynamic binary instrumentation framework using llvm backend
Proceedings of the 10th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Hi-index | 0.00 |
Dynamic binary translation (DBT) is a core technology to many important applications such as system virtualization, dynamic binary instrumentation and security. However, there are several factors that often impede its performance: (1) emulation overhead before translation; (2) translation and optimization overhead, and (3) translated code quality. On the dynamic binary translator itself, the issues also include its retargetability to support guest applications from different instruction-set architectures (ISAs) to host machines also with different ISAs, an important feature for system virtualization. In this work, we take advantage of the ubiquitous multicore platforms, using multithreaded approach to implement DBT. By running the translators and the dynamic binary optimizers on different threads on different cores, it could off-load the overhead caused by DBT on the target applications; thus, afford DBT of more sophisticated optimization techniques as well as the support of its retargetability. Using QEMU (a popular retargetable DBT for system virtualization) and LLVM (Low Level Virtual Machine) as our building blocks, we demonstrated in a multi-threaded DBT prototype, called HQEMU, that it could improve QEMU performance by a factor of 2.4X and 4X on the SPEC 2006 integer and floating point benchmarks for x86 to x86-64 emulations, respectively, i.e. it is only 2.5X and 2.1X slower than native execution of the same benchmarks on x86-64, as opposed to 6X and 8.4X slowdown on QEMU. For ARM to x86-64 emulation, HQEMU could gain a factor of 2.4X speedup over QEMU for the SPEC 2006 integer benchmarks.