Effective compiler support for predicated execution using the hyperblock
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Enhancing instruction scheduling with a block-structured ISA
International Journal of Parallel Programming
Exploiting instruction level parallelism in processors by caching scheduled groups
Proceedings of the 24th annual international symposium on Computer architecture
DAISY: dynamic compilation for 100% architectural compatibility
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Improving trace cache effectiveness with branch promotion and trace packing
Proceedings of the 25th annual international symposium on Computer architecture
Putting the fill unit to work: dynamic optimizations for trace cache microprocessors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Increasing the instruction fetch rate via block-structured instruction set architectures
International Journal of Parallel Programming - Special issue: MICRO-29, 29th annual IEEE/ACM international symposium on microarchitecture
Simultaneous subordinate microthreading (SSMT)
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
A hardware mechanism for dynamic extraction and relayout of program hot spots
Proceedings of the 27th annual international symposium on Computer architecture
Proceedings of the 27th annual international symposium on Computer architecture
Increasing the size of atomic instruction blocks using control flow assertions
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
rePLay: A Hardware Framework for Dynamic Optimization
IEEE Transactions on Computers
Instruction Pre-Processing in Trace Processors
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Master/slave speculative parallelization
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Dynamic binary translation for accumulator-oriented architectures
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Improving quasi-dynamic schedules through region slip
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Selecting long atomic traces for high coverage
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Dynamic Optimization of Micro-Operations
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Using Interaction Costs for Microarchitectural Bottleneck Analysis
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Hardware Support for Control Transfers in Code Caches
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Specialized Dynamic Optimizations for High-Performance Energy-Efficient Microarchitecture
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Power Awareness through Selective Dynamically Optimized Traces
Proceedings of the 31st annual international symposium on Computer architecture
Proceedings of the 31st annual international symposium on Computer architecture
Interaction cost and shotgun profiling
ACM Transactions on Architecture and Code Optimization (TACO)
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Reactive Techniques for Controlling Software Speculation
Proceedings of the international symposium on Code generation and optimization
Proceedings of the 32nd annual international symposium on Computer Architecture
RENO: A Rename-Based Instruction Optimizer
Proceedings of the 32nd annual international symposium on Computer Architecture
Decomposing the load-store queue by function for power reduction and scalability
IBM Journal of Research and Development
Evaluating trace cache energy efficiency
ACM Transactions on Architecture and Code Optimization (TACO)
Hardware atomicity for reliable software speculation
Proceedings of the 34th annual international symposium on Computer architecture
Proceedings of the 6th ACM conference on Computing frontiers
Architecture Design for Soft Errors
Architecture Design for Soft Errors
HiPC'08 Proceedings of the 15th international conference on High performance computing
PARROT: power awareness through selective dynamically optimized traces
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
Discerning the dominant out-of-order performance advantage: is it speculation or dynamism?
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
IVF: characterizing the vulnerability of microprocessor structures to intermittent faults
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Speculative hardware/software co-designed floating-point multiply-add fusion
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
We evaluate the rePLay microarchitecture as a means for reducing application execution time by facilitating dynamic optimization. The framework contains a programmable optimization engine coupled with a hardware-based recovery mechanism. The optimization engine enables the dynamic optimizer to run concurrently with program execution. The recovery mechanism enables the optimizer to make speculative optimizations without requiring recovery code.We demonstrate that a rePLay configuration performing a small suite of simple optimizations on Alpha code attains an average of 13% reduction in execution cycles on the SPEC2000 integer benchmarks over a rePLay configuration not performing optimizations, and a 21% reduction over an aggressive standard superscalar microarchitecture.