Improving the accuracy of static branch prediction using branch correlation
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Performance characterization of a hardware mechanism for dynamic optimization
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Vacuum packing: extracting hardware-detected program phases for post-link optimization
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Dynamic trace selection using performance monitoring hardware sampling
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Improving quasi-dynamic schedules through region slip
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Dynamic profiling and trace cache generation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Selecting long atomic traces for high coverage
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
LLVA: A Low-level Virtual Instruction Set Architecture
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Specialized Dynamic Optimizations for High-Performance Energy-Efficient Microarchitecture
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
VHC: Quickly Building an Optimizer for Complex Embedded Architectures
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Power Awareness through Selective Dynamically Optimized Traces
Proceedings of the 31st annual international symposium on Computer architecture
Proceedings of the 31st annual international symposium on Computer architecture
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Dataflow Mini-Graphs: Amplifying Superscalar Capacity and Bandwidth
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A Programmable Hardware Path Profiler
Proceedings of the international symposium on Code generation and optimization
An Architecture Framework for Transparent Instruction Set Customization in Embedded Processors
Proceedings of the 32nd annual international symposium on Computer Architecture
Exploring the design space of LUT-based transparent accelerators
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
An Event-Driven Multithreaded Dynamic Optimization Framework
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Continuous Path and Edge Profiling
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Reducing Startup Time in Co-Designed Virtual Machines
Proceedings of the 33rd annual international symposium on Computer Architecture
ACM Transactions on Computer Systems (TOCS)
Hardware atomicity for reliable software speculation
Proceedings of the 34th annual international symposium on Computer architecture
Speculative optimization using hardware-monitored guarded regions for java virtual machines
Proceedings of the 3rd international conference on Virtual execution environments
Proceedings of the conference on Design, automation and test in Europe
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
An architecture framework for an adaptive extensible processor
The Journal of Supercomputing
Transparent reconfigurable acceleration for heterogeneous embedded applications
Proceedings of the conference on Design, automation and test in Europe
IEICE - Transactions on Information and Systems
Dynamically Adapted Low Power ASIPs
ARC '09 Proceedings of the 5th International Workshop on Reconfigurable Computing: Architectures, Tools and Applications
A real system evaluation of hardware atomicity for software speculation
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
TAO: two-level atomicity for dynamic binary optimizations
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Reusing cached schedules in an out-of-order processor with in-order issue logic
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
International Journal of Reconfigurable Computing - Special issue on selected papers from the 17th reconfigurable architectures workshop (RAW2010)
SoftHV: a HW/SW co-designed processor with horizontal and vertical fusion
Proceedings of the 8th ACM International Conference on Computing Frontiers
Trace-Based runtime instruction rescheduling for architecture extension
ICESS'05 Proceedings of the Second international conference on Embedded Software and Systems
DDGacc: boosting dynamic DDG-based binary optimizations through specialized hardware support
VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
Bundled execution of recurring traces for energy-efficient general purpose processing
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
PARROT: power awareness through selective dynamically optimized traces
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
LAR-CC: Large atomic regions with conditional commits
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
The Journal of Supercomputing
BlockChop: dynamic squash elimination for hybrid processor architecture
Proceedings of the 39th Annual International Symposium on Computer Architecture
Towards a multiple-ISA embedded system
Journal of Systems Architecture: the EUROMICRO Journal
DeAliaser: alias speculation using atomic region support
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Discerning the dominant out-of-order performance advantage: is it speculation or dynamism?
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
TSO_ATOMICITY: efficient hardware primitive for TSO-preserving region optimizations
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Robust architectural support for transactional memory in the power architecture
Proceedings of the 40th Annual International Symposium on Computer Architecture
Trace based phase prediction for tightly-coupled heterogeneous cores
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 14.98 |
In this paper, we propose a new processor framework that supports dynamic optimization. The rePLay Framework embeds an optimization engine atop a high-performance execution engine. The heart of the rePLay Framework is the concept of a frame. Frames are large, single-entry, single-exit optimization regions spanning many basic blocks in the program's dynamic instruction stream, yet containing only a single flow of control. This atomic property of frames increases the flexibilty in applying optimizations. To support frames, rePLay includes a hardware-based recovery mechanism that rolls back the architectural state to the beginning of a frame if, for example, an early exit condition is detected. This mechanism permits the optimizer to make speculative, aggressive optimizations upon frames. In this paper, we investigate some of the underlying phenomenon that support rePLay. Primarily, we evaluate rePLay's region formation strategy. A rePLay configuration with a 256-entry frame cache, using 74KB frame constructor and frame sequencer, achieves an average frame size of 88 Alpha AXP instructions with 68 percent coverage of the dynamic istream, an average frame completion rate of 97.81 percent, and a frame predictor accuracy of 81.26 percent. These results soundly demonstrate that the frames upon which the optimizations are performed are large and stable. Using the most frequently initiated frames from rePLay executions as samples, we also highlight possible strategies for the rePLay optimization engine. Coupled with the high coverage of frames achieved through the dynamic frame construction, the success of these optimizations demonstrates the significance of the rePLay Framework. We believe that the concept of frames, along with the mechanisms and strategies outlined in this paper, will play an important role in future processor architecture.