Instruction scheduling hardware can be simplified and easily pipelined if pairs of dependent instructions are fused so they share a single instruction scheduling slot. We study an implementation of the x86 ISA that dynamically translates x86 code to an underlying ISA that supports instruction fusing. A microarchitecture that is co-designed with the fused instruction set completes the implementation. In this paper, we focus on the dynamic binary translator for such a co-designed x86 virtual machine. The dynamic binary translator first cracks x86 instructions belonging to hot superblocks into RISC-style micro-operations, and then uses heuristics to fuse together pairs of dependent micro-operations. Experimental results with SPEC2000 integer benchmarks demonstrate that: (1) the fused ISA with dynamic binary translation reduces the number of scheduling decisions by about 30% versus a conventional implementation that uses hardware cracking into RISC micro-operations; (2) an instruction scheduling slot needs to hold only two source register fields even though it may hold two instructions; (3) translations generated in the proposed ISA consume about 30% less storage than a corresponding fixed-length RISC-style ISA.
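
To make the fusing step concrete, here is a minimal Python sketch of one possible pairing heuristic. It is an illustration under stated assumptions, not the translator's actual algorithm: the `MicroOp` record, the `SIMPLE_ALU` set of operations allowed as the first (head) micro-operation, the restriction to adjacent micro-operations, and the sample micro-operation sequence are all hypothetical. The only constraint taken from the abstract is that a fused pair shares one scheduling slot holding at most two source register fields.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MicroOp:
    """Hypothetical RISC-style micro-operation produced by cracking."""
    op: str
    dst: Optional[str]
    srcs: Tuple[str, ...]


# Assumption: only simple ALU operations may act as the head of a pair.
SIMPLE_ALU = {"add", "sub", "and", "or", "xor"}


def external_sources(head: MicroOp, tail: MicroOp) -> List[str]:
    """Distinct registers the pair reads from outside the pair.

    The tail's read of the head's result is internal to the pair,
    so it does not need a source register field in the slot.
    """
    regs = list(dict.fromkeys(head.srcs))
    for reg in tail.srcs:
        if reg != head.dst and reg not in regs:
            regs.append(reg)
    return regs


def fuse_adjacent(uops: List[MicroOp], max_srcs: int = 2) -> List[Tuple[int, ...]]:
    """Greedily pair each micro-op with its immediate successor.

    A pair is formed only if the successor consumes the head's result
    and the pair needs at most `max_srcs` external source registers.
    Returns one tuple of micro-op indices per scheduling slot.
    """
    slots: List[Tuple[int, ...]] = []
    i = 0
    while i < len(uops):
        head = uops[i]
        if i + 1 < len(uops):
            tail = uops[i + 1]
            dependent = head.dst is not None and head.dst in tail.srcs
            if (head.op in SIMPLE_ALU and dependent
                    and len(external_sources(head, tail)) <= max_srcs):
                slots.append((i, i + 1))
                i += 2
                continue
        slots.append((i,))
        i += 1
    return slots


# Hypothetical micro-op sequence for a short stretch of a superblock.
uops = [
    MicroOp("add", "t0", ("ebx", "ecx")),     # address computation
    MicroOp("ld", "eax", ("t0",)),            # load through t0
    MicroOp("sub", "edx", ("edx",)),          # edx = edx - immediate
    MicroOp("cmp", "flags", ("edx", "esi")),  # compare the new edx
    MicroOp("and", "t1", ("eax", "edi")),
    MicroOp("or", "t2", ("ebp", "esi")),      # independent of the "and"
]
slots = fuse_adjacent(uops)
print(slots)  # [(0, 1), (2, 3), (4,), (5,)]
```

In this toy sequence, six micro-operations occupy four scheduling slots. The figure only illustrates the mechanism and says nothing about the measured reduction reported above. A real translator would also consider non-adjacent dependent pairs, which requires checking that reordering preserves register and memory dependences; the sketch omits that for brevity.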