Use-Based Register Caching with Decoupled Indexing

Authors:
J. Adam Butts; Gurindar S. Sohi
Affiliations:
University of Wisconsin-Madison; University of Wisconsin-Madison
Venue:
Proceedings of the 31st annual international symposium on Computer architecture
Year:
2004

Abstract

Wide, deep pipelines need many physical registers to hold the results of in-flight instructions. Simultaneously, high clock frequencies prohibit using large register files and bypass networks without a significant performance penalty. Previously proposed techniques using register caching to reduce this penalty suffer from several problems, including poor insertion and replacement decisions and the need for a fully-associative cache for good performance. We present novel mechanisms for managing and indexing register caches that address these problems using knowledge of the number of consumers of each register value. The insertion policy reduces pollution by not caching a register value when all of its predicted consumers are satisfied by the bypass network. The replacement policy selects register cache entries with the fewest remaining uses (often zero), lowering the miss rate. We also introduce a new, general method of mapping physical registers to register cache sets that improves the performance of set-associative cache organizations by reducing conflicts. Our results indicate that a 64-entry, two-way set associative cache using these techniques outperforms multi-cycle monolithic register files and previously proposed hierarchical register files.
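
The policies described in the abstract can be illustrated with a small behavioral model. The following Python sketch is not the authors' implementation. The class name `UseBasedRegisterCache`, its methods, and the round-robin set assignment in `assign_set` are all hypothetical, chosen only to show how use counts might drive insertion and replacement and how a set index could be kept separate from the physical register number. The abstract does not specify how sets are assigned, so the round-robin rule is an assumption.

```python
class Entry:
    """One register cache entry (hypothetical structure for illustration)."""

    def __init__(self, preg, remaining_uses):
        self.preg = preg                      # physical register number
        self.remaining_uses = remaining_uses  # predicted consumers not yet served


class UseBasedRegisterCache:
    """Behavioral sketch of a set-associative, use-based register cache."""

    def __init__(self, num_sets=32, ways=2):
        # 32 sets x 2 ways = 64 entries, matching the size named in the abstract.
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]
        self.set_of = {}    # preg -> set index, stored apart from the preg bits
        self.next_set = 0

    def assign_set(self, preg):
        # Assumption: round-robin assignment. The abstract only says the
        # mapping reduces conflicts, not how set indices are chosen.
        self.set_of[preg] = self.next_set
        self.next_set = (self.next_set + 1) % self.num_sets
        return self.set_of[preg]

    def insert(self, preg, predicted_uses, bypassed_uses):
        """Cache a value unless the bypass network served every predicted use."""
        remaining = predicted_uses - bypassed_uses
        if remaining <= 0:
            return False  # insertion policy: skip the cache to avoid pollution
        idx = self.set_of.get(preg)
        if idx is None:
            idx = self.assign_set(preg)
        entries = self.sets[idx]
        if len(entries) >= self.ways:
            # Replacement policy: evict the entry with the fewest remaining uses.
            victim = min(entries, key=lambda e: e.remaining_uses)
            entries.remove(victim)
        entries.append(Entry(preg, remaining))
        return True

    def read(self, preg):
        """Return True on a hit and consume one predicted use; False on a miss."""
        idx = self.set_of.get(preg)
        if idx is None:
            return False
        for entry in self.sets[idx]:
            if entry.preg == preg:
                entry.remaining_uses = max(0, entry.remaining_uses - 1)
                return True
        return False  # miss: the value would come from the backing register file
```

In this model, a value whose predicted consumers were all served by the bypass network never occupies an entry, and a value whose uses have all been consumed becomes the preferred eviction victim in its set. A real design would also need a use predictor and handling for mispredicted use counts, both of which are outside this sketch.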