Exploiting Value Locality in Physical Register Files

Authors:
Saisanthosh Balakrishnan;Gurindar S. Sohi
Affiliations:
Computer Sciences Department, University of Wisconsin-Madison;Computer Sciences Department, University of Wisconsin-Madison
Venue:
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Year:
2003

Citing 20
Cited 20

A VLIW architecture for a trace scheduling compiler

ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
Hierarchical registers for scientific computers

ICS '88 Proceedings of the 2nd international conference on Supercomputing
The anatomy of the register file in a multiscalar processor

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Exceeding the dataflow limit via value prediction

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
The multicluster architecture: reducing cycle time through partitioning

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A novel renaming scheme to exploit value temporal locality through physical register reuse and unification

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
An empirical analysis of instruction repetition

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Delaying physical register allocation through virtual-physical registers

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Multiple-banked register file architectures

Proceedings of the 27th annual international symposium on Computer architecture
The CRAY-1 computer system

Communications of the ACM - Special issue on computer architecture
Frequent value compression in data caches

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Integrating superscalar processor components to implement register caching

ICS '01 Proceedings of the 15th international conference on Supercomputing
Reducing the complexity of the register file in dynamic superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
The Alpha 21264 Microprocessor

IEEE Micro
Caching processor general registers

ICCD '95 Proceedings of the 1995 International Conference on Computer Design: VLSI in Computers and Processors
Register File Design Considerations in Dynamically Scheduled Processors

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Virtual-Physical Registers

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Detecting global stride locality in value streams

Proceedings of the 30th annual international symposium on Computer architecture
A Scalable Register File Architecture for Dynamically Scheduled Processors

PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques

Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
RENO: A Rename-Based Instruction Optimizer

Proceedings of the 32nd annual international symposium on Computer Architecture
SPARTAN: speculative avoidance of register allocations to transient values for performance and energy efficiency

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Early Register Deallocation Mechanisms Using Checkpointed Register Files

IEEE Transactions on Computers
Selective writeback: exploiting transient values for energy-efficiency and performance

Proceedings of the 2006 international symposium on Low power electronics and design
Register file caching for energy efficiency

Proceedings of the 2006 international symposium on Low power electronics and design
Register port complexity reduction in wide-issue processors with selective instruction execution

Microprocessors & Microsystems
Compacting register file via 2-level renaming and bit-partitioning

Microprocessors & Microsystems
An L2-miss-driven early register deallocation for SMT processors

Proceedings of the 21st annual international conference on Supercomputing
Predicting and Exploiting Transient Values for Reducing Register File Pressure and Energy Consumption

IEEE Transactions on Computers
Asymmetrically banked value-aware register files for low-energy and high-performance

Microprocessors & Microsystems
Early detection and bypassing of trivial operations to improve energy efficiency of processors

Microprocessors & Microsystems
Reducing register pressure in SMT processors through L2-miss-driven early register release

ACM Transactions on Architecture and Code Optimization (TACO)
Selective writeback: reducing register file pressure and energy consumption

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A Flexible Code Compression Scheme Using Partitioned Look-Up Tables

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Energy-efficient renaming with register versioning

Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Exploiting narrow-width values for thermal-aware register file designs

Proceedings of the Conference on Design, Automation and Test in Europe
A unified approach to eliminate memory accesses early

CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
An optimized front-end physical register file with banking and writeback filtering

PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
Base-delta-immediate compression: practical data compression for on-chip caches

Proceedings of the 21st international conference on Parallel architectures and compilation techniques

Quantified Score

Hi-index	0.01

Visualization

Abstract

The physical register file is an important component of adynamically-scheduled processor. Increasing the amount of parallelismplaces increasing demands on the physical register file,calling for alternative file organization and management strategies.This paper considers the use of value locality to optimize theoperation of physical register files.We present empirical data showing that: (i) the value producedby an instruction is often the same as a value produced by anotherrecently executed instruction, resulting in multiple physical registerscontaining the same value, and (ii) the values 0 and 1 accountfor a considerable fraction of the values written to and read fromphysical registers. The paper then presents three schemes to exploitthe above observations.The first scheme extends a previously-proposed scheme to useonly a single physical register for each unique value. The secondscheme is a special case for the values 0 and 1. By restricting optimizationto these values, the second scheme eliminates many of thedrawbacks of the first scheme. The third scheme further improveson the second, resulting in an optimization that reduces physicalregister requirements with simple micro-architectural extensions.A performance evaluation of the three schemes is also presented.