RENO: A Rename-Based Instruction Optimizer

Authors:
Vlad Petric; Tingting Sha; Amir Roth
Venue:
Proceedings of the 32nd annual international symposium on Computer Architecture
Year:
2005

References (23):
Dynamic instruction reuse. Proceedings of the 24th annual international symposium on Computer architecture.
Streamlining inter-operation memory communication via data dependence prediction. Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture.
MediaBench: a tool for evaluating and synthesizing multimedia and communications systems. Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture.
Memory dependence prediction using store sets. Proceedings of the 25th annual international symposium on Computer architecture.
A novel renaming scheme to exploit value temporal locality through physical register reuse and unification. Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture.
Early load address resolution via register tracking. Proceedings of the 27th annual international symposium on Computer architecture.
Register integration: a simple and efficient implementation of squash reuse. Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture.
Load and store reuse using register file contents. Proceedings of the 15th international conference on Supercomputing.
Focusing processor policies via critical-path prediction. Proceedings of the 28th annual international symposium on Computer architecture.
Implementing optimizations at decode time. Proceedings of the 29th annual international symposium on Computer architecture.
Performance characterization of a hardware mechanism for dynamic optimization. Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture.
Dynamic dead-instruction detection and elimination. Proceedings of the 10th international conference on Architectural support for programming languages and operating systems.
Interlock Collapsing ALU's. IEEE Transactions on Computers.
Cherry: checkpointed early resource recycling in out-of-order microprocessors. Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture.
Three extensions to register integration. Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture.
Loose Loops Sink Chips. Proceedings of the 8th International Symposium on High-Performance Computer Architecture.
Using Interaction Costs for Microarchitectural Bottleneck Analysis. Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture.
Exploiting Value Locality in Physical Register Files. Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture.
Macro-op Scheduling: Relaxing Scheduling Loop Constraints. Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture.
Using Dynamic Binary Translation to Fuse Dependent Instructions. Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization.
Dataflow Mini-Graphs: Amplifying Superscalar Capacity and Bandwidth. Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture.
Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load Optimization. Proceedings of the 32nd annual international symposium on Computer Architecture.
Dynamically reducing pressure on the physical register file through simple register sharing. Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software.

Cited by (8):
Self-checking instructions: reducing instruction redundancy for concurrent error detection. Proceedings of the 15th international conference on Parallel architectures and compilation techniques.
Branch predictor guided instruction decoding. Proceedings of the 15th international conference on Parallel architectures and compilation techniques.
SPARTAN: speculative avoidance of register allocations to transient values for performance and energy efficiency. Proceedings of the 15th international conference on Parallel architectures and compilation techniques.
NoSQ: Store-Load Communication without a Store Queue. Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture.
NoSQ: Store-Load Communication without a Store Queue. IEEE Micro.
Predicting and Exploiting Transient Values for Reducing Register File Pressure and Energy Consumption. IEEE Transactions on Computers.
RETCON: transactional repair without replay. Proceedings of the 37th annual international symposium on Computer architecture.
Watchdog: hardware for safe and secure manual memory management and full memory safety. Proceedings of the 39th Annual International Symposium on Computer Architecture.

Abstract

RENO is a modified MIPS R10000 register renamer that uses map-table "short-circuiting" to implement dynamic versions of several well-known static optimizations: move elimination, common-subexpression elimination, register allocation, and constant folding. Because it performs these optimizations dynamically, RENO can apply them in some situations where static compilers cannot. Several of RENO's component optimizations have previously been proposed as independent mechanisms. Unified renaming [13] implements dynamic move elimination and speculative memory bypassing [19] (the dynamic counterpart of register allocation). Register integration [21] implements common-subexpression elimination and speculative memory bypassing. RENO unifies these mechanisms and adds a dynamic version of constant folding, RENOCF, which uses an extended map table format and a limited form of dynamic operation fusion. Cycle-level simulation shows that RENO dynamically eliminates (i.e., optimizes away) 22% of the dynamic instructions in both SPECint2000 and MediaBench. RENOCF is responsible for 12% and 17% of these eliminations, respectively. Because dataflow dependences are collapsed around eliminated instructions, performance improves by 8% and 13%, respectively. Alternatively, because eliminated instructions consume no issue queue entries, physical registers, or issue, bypass, register file, or execution bandwidth, RENO can be used to absorb the performance impact of a significantly scaled-down execution core.
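To make map-table "short-circuiting" concrete, here is a minimal Python sketch, assuming a toy three-operand ISA with hypothetical opcodes `mov`, `addi`, and `add`. Each map-table entry (the hypothetical `MapEntry`) holds a physical register plus a constant displacement, loosely mirroring the extended map table format the abstract attributes to RENOCF. The hypothetical `Renamer.rename` eliminates moves by copying the source entry, folds `addi` immediates into the displacement, and performs common-subexpression elimination by looking up (opcode, source entries, immediate) in a table of earlier results. This is an illustration under stated assumptions, not the authors' design: it never frees or reuses physical registers (so CSE entries never go stale), and it omits reference counting, misspeculation recovery, speculative memory bypassing, displacement-width limits, and the fused add that a non-eliminated consumer would need when its source entry carries a nonzero displacement.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class MapEntry:
    """Hypothetical extended map-table entry: value = preg + disp."""
    preg: int      # physical register holding the base value
    disp: int = 0  # folded constant (RENOCF-style; assumed unbounded here)


@dataclass
class Renamer:
    """Toy renamer sketch; all names and opcodes are illustrative assumptions."""
    num_logical: int
    next_preg: int = 0
    map_table: Dict[int, MapEntry] = field(default_factory=dict)
    cse_table: Dict[Tuple, MapEntry] = field(default_factory=dict)
    eliminated: int = 0

    def __post_init__(self) -> None:
        # Start with each logical register mapped to its own physical register.
        for reg in range(self.num_logical):
            self.map_table[reg] = MapEntry(self._alloc())

    def _alloc(self) -> int:
        # Simplification: physical registers are never freed or reused.
        preg = self.next_preg
        self.next_preg += 1
        return preg

    def rename(self, op: str, rd: int, rs: int,
               rt: Optional[int] = None, imm: int = 0) -> bool:
        """Rename one instruction; return True if it was eliminated."""
        src = self.map_table[rs]
        if op == "mov":
            # Move elimination: rd shares rs's entry; nothing is allocated.
            self.map_table[rd] = src
            self.eliminated += 1
            return True
        if op == "addi":
            # Constant folding: add the immediate to rd's displacement.
            self.map_table[rd] = MapEntry(src.preg, src.disp + imm)
            self.eliminated += 1
            return True
        # Other ops: common-subexpression elimination keyed on the inputs.
        srcs = (src,) if rt is None else (src, self.map_table[rt])
        key = (op, srcs, imm)
        if key in self.cse_table:
            self.map_table[rd] = self.cse_table[key]
            self.eliminated += 1
            return True
        # Not eliminated: allocate a fresh register; the op would execute.
        dest = MapEntry(self._alloc())
        self.cse_table[key] = dest
        self.map_table[rd] = dest
        return False


if __name__ == "__main__":
    r = Renamer(num_logical=4)
    r.rename("mov", 1, 0)          # eliminated: r1 shares r0's register
    r.rename("addi", 2, 1, imm=8)  # eliminated: r2 maps to r0's register + 8
    r.rename("add", 3, 0, 2)       # not eliminated: r3 gets a new register
    r.rename("add", 1, 0, 2)       # eliminated by CSE: r1 reuses r3's result
    print(r.eliminated)            # prints 3
```

In this sketch every elimination is just a map-table rewrite: an eliminated instruction allocates no physical register and never reaches the execution core, which is the kind of resource saving the abstract describes.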