A co-designed virtual machine (VM) allows designers to implement a processor through a combination of hardware and software. Dynamic binary translation converts code written for a conventional (legacy) ISA into optimized code for an underlying implementation-specific ISA. Because translation is done dynamically, an important consideration in such systems is the startup time needed to perform the initial translations. Beginning with a previously proposed co-designed VM that implements the x86 ISA, we study the effects of runtime binary translation overhead. The co-designed x86 VM is based on an adaptive translation system that uses a basic block translator for initial emulation and a superblock translator for hotspot optimization. We analyze and model VM startup performance via simulation, and observe that non-hotspot emulation via basic block translation accounts for the major part of the startup overhead. To reduce this overhead, we follow the co-designed hardware/software philosophy and propose hardware assists that dramatically accelerate basic block translation. By combining hardware assists with balanced translation strategies, the co-designed translation system significantly reduces runtime overhead and demonstrates very competitive startup performance compared with conventional processors running a set of Windows application benchmarks.
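The adaptive, two-stage translation strategy described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the system studied in the paper: the class name `AdaptiveTranslator`, the helper `run_trace`, the tier labels `"bbt"` and `"sbt"`, and the hotspot threshold of 3 are all hypothetical, and real translators operate on instruction streams rather than on bare block addresses.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical value; the abstract does not give one


class AdaptiveTranslator:
    """Toy model of staged translation: cheap first, optimized when hot."""

    def __init__(self, hot_threshold=HOT_THRESHOLD):
        self.hot_threshold = hot_threshold
        self.exec_counts = Counter()  # executions per block entry address
        self.code_cache = {}          # address -> translation tier ("bbt" or "sbt")
        self.log = []                 # translation events, in the order they occur

    def execute(self, addr):
        """Run the block at addr, translating or re-optimizing it if needed."""
        if addr not in self.code_cache:
            # First touch: quick basic block translation so emulation can start.
            self.code_cache[addr] = "bbt"
            self.log.append(("bbt", addr))
        self.exec_counts[addr] += 1
        if (self.code_cache[addr] == "bbt"
                and self.exec_counts[addr] >= self.hot_threshold):
            # The block has become hot: replace it with a superblock translation.
            self.code_cache[addr] = "sbt"
            self.log.append(("sbt", addr))
        return self.code_cache[addr]


def run_trace(trace, hot_threshold=HOT_THRESHOLD):
    """Feed a sequence of block addresses through a fresh translator."""
    vm = AdaptiveTranslator(hot_threshold)
    tiers = [vm.execute(addr) for addr in trace]
    return vm, tiers
```

In this model every block pays the basic block translation cost once, on first execution, which is why a startup phase that touches many cold blocks is dominated by that first tier; only blocks that cross the threshold incur the more expensive superblock step.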