Physical Register Inlining

Authors:
Mikko H. Lipasti; Brian R. Mestan; Erika Gunadi
Affiliations:
University of Wisconsin-Madison; IBM Corporation, Austin, TX; University of Wisconsin-Madison
Venue:
Proceedings of the 31st annual international symposium on Computer architecture
Year:
2004

Abstract

Physical register access time increases the delay between scheduling and execution in modern out-of-order processors. As the number of physical registers increases, this delay grows, forcing designers to employ register files with multicycle access. This paper advocates more efficient utilization of fewer physical registers in order to reduce the access time of the physical register file. Register values with few significant bits are stored in the rename map using physical register inlining, a scheme analogous to inlining of operand fields in data structures. Specifically, whenever a register value can be expressed with fewer bits than the register map would need to specify a physical register number, the value is stored directly in the map, avoiding the indirection and saving space in the physical register file. Not surprisingly, we find that a significant portion of all register operands can be stored in the map in this fashion, and we describe straightforward microarchitectural extensions that correctly implement physical register inlining. We find that physical register inlining performs well, particularly in processors that are register-constrained.
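The inlining test described above can be illustrated with a small model. This is a minimal, non-speculative Python sketch under stated assumptions, not the paper's microarchitecture. It assumes each map entry carries one extra bit marking its field as an inlined value rather than a physical register number. It also assumes narrow values are kept as sign-extended two's-complement fields exactly as wide as a register tag. The names `RenameMap`, `MapEntry`, `tag_bits`, and `fits_inline` are hypothetical. Rename-before-execute timing, misspeculation recovery, and commit-time freeing of overwritten registers are deliberately left out. For example, with 128 physical registers a tag needs 7 bits, so any value in [-64, 63] can live in the map without occupying a physical register.

```python
from dataclasses import dataclass
from typing import Dict, List


def tag_bits(num_phys_regs: int) -> int:
    """Bits needed to name one physical register: ceil(log2(n))."""
    return max(1, (num_phys_regs - 1).bit_length())


def fits_inline(value: int, width: int) -> bool:
    """Assumed encoding: a signed value fits if it is representable as a
    `width`-bit two's-complement number (i.e. survives sign extension)."""
    return -(1 << (width - 1)) <= value < (1 << (width - 1))


@dataclass
class MapEntry:
    inlined: bool  # hypothetical extra bit: True means `field` is the value
    field: int     # physical register number, or the narrow value itself


class RenameMap:
    """Toy rename map that inlines narrow values (simplified model)."""

    def __init__(self, num_arch: int, num_phys: int) -> None:
        self.width = tag_bits(num_phys)
        self.free: List[int] = list(range(num_phys))  # free physical registers
        self.prf: Dict[int, int] = {}                 # physical register file
        # Simplification: every architectural register starts as inlined zero.
        self.map: List[MapEntry] = [MapEntry(True, 0) for _ in range(num_arch)]

    def write(self, arch: int, value: int) -> None:
        """Record `value` as the new contents of architectural register `arch`."""
        old = self.map[arch]
        if not old.inlined:
            # Simplification: real hardware frees the old register at commit.
            self.free.append(old.field)
            del self.prf[old.field]
        if fits_inline(value, self.width):
            # Narrow value: stored directly in the map, no indirection.
            self.map[arch] = MapEntry(True, value)
        else:
            # Wide value: allocate a physical register and store its tag.
            preg = self.free.pop(0)
            self.prf[preg] = value
            self.map[arch] = MapEntry(False, preg)

    def read(self, arch: int) -> int:
        """Return the value of `arch`, following the tag only if not inlined."""
        entry = self.map[arch]
        return entry.field if entry.inlined else self.prf[entry.field]


if __name__ == "__main__":
    rm = RenameMap(num_arch=32, num_phys=128)  # 7-bit tags
    rm.write(1, 42)       # fits in 7 bits: inlined, no register allocated
    rm.write(2, 1 << 40)  # too wide: occupies a physical register
    print(rm.map[1].inlined, rm.map[2].inlined, len(rm.free))  # True False 127
```

The design choice being illustrated is that the inline field reuses the bits already reserved for a physical register number, so inlining costs only the assumed marker bit per entry while freeing a register-file slot for every narrow result.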