Dynamic binary translation for accumulator-oriented architectures

Authors:
Ho-Seop Kim;James E. Smith
Affiliations:
University of Wisconsin–Madison;University of Wisconsin–Madison
Venue:
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Year:
2003

Abstract

A dynamic binary translation system for a co-designed virtual machine is described and evaluated. The underlying hardware directly executes an accumulator-oriented instruction set that exposes instruction dependence chains (strands) to a distributed microarchitecture containing a simple instruction pipeline. To support conventional program binaries, a source instruction set (Alpha in our study) is dynamically translated to the target accumulator instruction set. The binary translator identifies chains of inter-instruction dependences and assigns them to dependence-carrying accumulators. Because the underlying superscalar microarchitecture is capable of dynamic instruction scheduling, the binary translation system does not perform aggressive optimizations or re-schedule code; this significantly reduces binary translation overhead.

Detailed timing simulation of the dynamically translated code running on an accumulator-based distributed microarchitecture shows that the overall system can achieve performance similar to that of an ideal out-of-order superscalar processor, ignoring the significant clock frequency advantages that the accumulator-based hardware is likely to have. As part of the study, we evaluate an instruction set modification that simplifies precise trap implementation. This approach significantly reduces the number of instructions required for register state copying, thereby improving performance. We also observe that translation chaining methods can have a substantial impact on performance, and we evaluate a number of chaining methods.
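
To make the idea of strands concrete, the following is a minimal Python sketch of grouping a straight-line block of instructions into dependence chains. It is not the translator described in the abstract: the `Instr` record, the `form_strands` function, the greedy "extend the producer's strand only if the producer is still at its tail" rule, and the Alpha-like mnemonics and register names are all assumptions made for illustration. Mapping strand ids onto a finite set of accumulators is left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    # Hypothetical instruction record: opcode, destination register, sources.
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...] = ()


def form_strands(block):
    """Greedily group instructions into dependence chains (strands).

    Returns a list holding one strand id per instruction, in program order.
    """
    last_writer = {}  # register name -> index of the instruction that last wrote it
    tail_of = {}      # strand id -> index of the instruction currently at its tail
    strand_of = []    # strand id chosen for each instruction
    next_id = 0

    for i, ins in enumerate(block):
        chosen = None
        for src in ins.srcs:
            producer = last_writer.get(src)
            if producer is None:
                continue  # value is live-in to the block, no producer here
            sid = strand_of[producer]
            # Extend a strand only if the producer is still its tail, i.e. the
            # (assumed) accumulator would still hold the producer's result.
            if tail_of[sid] == producer:
                chosen = sid
                break
        if chosen is None:
            chosen = next_id  # no extendable producer: start a new strand
            next_id += 1
        strand_of.append(chosen)
        tail_of[chosen] = i
        if ins.dest is not None:
            last_writer[ins.dest] = i
    return strand_of


example_block = [
    Instr("ldq", "r1", ("r16",)),
    Instr("addq", "r2", ("r1", "r17")),
    Instr("ldq", "r3", ("r18",)),
    Instr("mulq", "r4", ("r2", "r3")),
    Instr("subq", "r5", ("r3", "r1")),
]

print(form_strands(example_block))  # [0, 0, 1, 0, 1]
```

In this sketch the first load, the add, and the multiply form one chain, while the second load and the subtract form another. A value consumed by two instructions extends a strand only once; the second consumer starts a new strand and would have to read the value from a general-purpose register instead.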