rePLay: A Hardware Framework for Dynamic Optimization

Authors:
Sanjay J. Patel; Steven S. Lumetta
Affiliations:
Univ. of Illinois, Urbana; Univ. of Illinois, Urbana
Venue:
IEEE Transactions on Computers
Year:
2001

Abstract

In this paper, we propose a new processor framework that supports dynamic optimization. The rePLay Framework embeds an optimization engine atop a high-performance execution engine. The heart of the rePLay Framework is the concept of a frame. Frames are large, single-entry, single-exit optimization regions spanning many basic blocks in the program's dynamic instruction stream, yet containing only a single flow of control. This atomic property of frames increases the flexibility in applying optimizations. To support frames, rePLay includes a hardware-based recovery mechanism that rolls back the architectural state to the beginning of a frame if, for example, an early exit condition is detected. This mechanism permits the optimizer to apply aggressive, speculative optimizations to frames. In this paper, we investigate some of the underlying phenomena that support rePLay. Primarily, we evaluate rePLay's region formation strategy. A rePLay configuration with a 256-entry frame cache, using 74KB frame constructor and frame sequencer, achieves an average frame size of 88 Alpha AXP instructions with 68 percent coverage of the dynamic instruction stream, an average frame completion rate of 97.81 percent, and a frame predictor accuracy of 81.26 percent. These results soundly demonstrate that the frames upon which the optimizations are performed are large and stable. Using the most frequently initiated frames from rePLay executions as samples, we also highlight possible strategies for the rePLay optimization engine. Coupled with the high coverage of frames achieved through the dynamic frame construction, the success of these optimizations demonstrates the significance of the rePLay Framework. We believe that the concept of frames, along with the mechanisms and strategies outlined in this paper, will play an important role in future processor architecture.
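
The atomic behavior of frames described in the abstract can be illustrated with a small software sketch. This is not the rePLay hardware design: the names `run_frame`, `State`, and `Step`, and the encoding of a frame as a list of "op" and "check" steps, are assumptions made only for illustration. The sketch shows the one property the abstract states: a frame either completes as a whole, or the state is rolled back to what it was at frame entry when an early exit condition is detected.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: architectural state is a register-name -> value map.
State = Dict[str, int]
# Hypothetical step encoding: ("op", fn) mutates the state in place;
# ("check", fn) returns False when an early exit condition is detected.
Step = Tuple[str, Callable[[State], object]]


def run_frame(state: State, frame: List[Step]) -> bool:
    """Run a frame atomically; return True if it completed, False if rolled back."""
    checkpoint = dict(state)  # snapshot of the state at frame entry
    for kind, fn in frame:
        if kind == "check":
            if not fn(state):
                # Early exit detected: restore the entry state and discard the frame's work.
                state.clear()
                state.update(checkpoint)
                return False
        else:
            fn(state)
    return True  # every check held, so the frame's updates stand


def add_one(s: State) -> None:
    s["r1"] += 1


def below_ten(s: State) -> bool:
    return s["r1"] < 10


def double_into_r2(s: State) -> None:
    s["r2"] = s["r1"] * 2


frame: List[Step] = [("op", add_one), ("check", below_ten), ("op", double_into_r2)]

ok_state = {"r1": 3, "r2": 0}
committed = run_frame(ok_state, frame)    # check holds, frame completes

bad_state = {"r1": 9, "r2": 0}
rolled_back = not run_frame(bad_state, frame)  # check fails, state is restored
```

Because no partial result of a frame survives a failed check, an optimizer working on such a region could reorder or remove the "op" steps without having to preserve intermediate states, which is the flexibility the abstract attributes to atomicity.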