Dynamo: a transparent dynamic optimization system
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Secure Execution via Program Shepherding
Proceedings of the 11th USENIX Security Symposium
DELI: a new run-time control point
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Retargetable and reconfigurable software dynamic translation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Safe Virtual Execution Using Software Dynamic Translation
ACSAC '02 Proceedings of the 18th Annual Computer Security Applications Conference
Code Cache Management Schemes for Dynamic Optimizers
INTERACT '02 Proceedings of the Sixth Annual Workshop on Interaction between Compilers and Computer Architectures
Maintaining Consistency and Bounding Capacity of Software Code Caches
Proceedings of the international symposium on Code generation and optimization
A Dynamic Compilation Framework for Controlling Microprocessor Energy and Performance
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Evaluating fragment construction policies for SDT systems
Proceedings of the 2nd international conference on Virtual execution environments
Managing bounded code caches in dynamic binary optimization systems
ACM Transactions on Architecture and Code Optimization (TACO)
Software-based instruction caching for embedded processors
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Evaluating Indirect Branch Handling Mechanisms in Software Dynamic Translation Systems
Proceedings of the International Symposium on Code Generation and Optimization
Fragment cache management for dynamic binary translators in embedded systems with scratchpad
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Reducing exit stub memory consumption in code caches
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Addressing the challenges of DBT for the ARM architecture
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Heterogeneous code cache: using scratchpad and main memory in dynamic binary translators
Proceedings of the 46th Annual Design Automation Conference
DBT path selection for holistic memory efficiency and performance
Proceedings of the 6th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Balancing memory and performance through selective flushing of software code caches
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Memory optimization of dynamic binary translators for embedded systems
ACM Transactions on Architecture and Code Optimization (TACO)
Effective code discovery for ARM/Thumb mixed ISA binaries in a static binary translator
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Hi-index | 0.00 |
Dynamic binary translators (DBT) have recently attracted much attention for embedded systems. The effective implementation of DBT in these systems is challenging due to tight constraints on memory and performance. A DBT uses a software-managed code cache to hold blocks of translated code. To minimize overhead, the code cache is usually large so blocks are translated once and never discarded. However, an embedded system may lack the resources for a large code cache. This constraint leads to significant slowdowns due to the retranslation of blocks prematurely discarded from a small code cache. This paper addresses the problem and shows how to impose a tight size bound on the code cache without performance loss. We show that about 70% of the code cache is consumed by instructions that the DBT introduces for its own purposes. Based on this observation, we propose novel techniques that reduce the amount of space required by DBT-injected code, leaving more room for actual application code and improving the miss ratio. We experimentally demonstrate that a bounded code cache can have performance on-par with an unbounded one.