Hierarchical registers for scientific computers
ICS '88 Proceedings of the 2nd international conference on Supercomputing
The performance impact of incomplete bypassing in processor pipelines
Proceedings of the 28th annual international symposium on Microarchitecture
Exploiting short-lived variables in superscalar processors
Proceedings of the 28th annual international symposium on Microarchitecture
Speculation techniques for improving load related instruction scheduling
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Multiple-banked register file architectures
Proceedings of the 27th annual international symposium on Computer architecture
Two-level hierarchical register file organization for VLIW processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
The optimum pipeline depth for a microprocessor
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Increasing processor performance by implementing deeper pipelines
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
The Alpha 21264 Microprocessor
IEEE Micro
Caching processor general registers
ICCD '95 Proceedings of the 1995 International Conference on Computer Design: VLSI in Computers and Processors
Characterizing and predicting value degree of use
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Speculative early register release
Proceedings of the 3rd conference on Computing frontiers
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Early Register Deallocation Mechanisms Using Checkpointed Register Files
IEEE Transactions on Computers
Register file caching for energy efficiency
Proceedings of the 2006 international symposium on Low power electronics and design
Register port complexity reduction in wide-issue processors with selective instruction execution
Microprocessors & Microsystems
Reconciling performance and programmability in networking systems
Proceedings of the 2007 conference on Applications, technologies, architectures, and protocols for computer communications
IEEE Transactions on Computers
Asymmetrically banked value-aware register files for low-energy and high-performance
Microprocessors & Microsystems
Achieving Out-of-Order Performance with Almost In-Order Complexity
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Exploring the limits of early register release: Exploiting compiler analysis
ACM Transactions on Architecture and Code Optimization (TACO)
Energy-efficient register caching with compiler assistance
ACM Transactions on Architecture and Code Optimization (TACO)
Exploiting narrow-width values for thermal-aware register file designs
Proceedings of the Conference on Design, Automation and Test in Europe
Register Cache System Not for Latency Reduction Purpose
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
On the exploitation of narrow-width values for improving register file reliability
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Exploiting narrow values for energy efficiency in the register files of superscalar microprocessors
PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
Hi-index | 0.01 |
Wide, deep pipelines need many physical registersto hold the results of in-flight instructions. Simultaneously,high clock frequencies prohibit using largeregister files and bypass networks without a significantperformance penalty. Previously proposed techniquesusing register caching to reduce this penalty sufferfrom several problems including poor insertion andreplacement decisions and the need for a fully-associativecache for good performance. We present novelmechanisms for managing and indexing register cachesthat address these problems using knowledge of thenumber of consumers of each register value.The insertion policy reduces pollution by not cachinga register value when all of its predicted consumersare satisfied by the bypass network. The replacementpolicy selects register cache entries with the fewestremaining uses (often zero), lowering the miss rate. Wealso introduce a new, general method of mapping physicalregisters to register cache sets that improves theperformance of set-associative cache organizations byreducing conflicts. Our results indicate that a 64-entry,two-way set associative cache using these techniquesoutperforms multi-cycle monolithic register files andpreviously proposed hierarchical register files.