With faster clocks and larger instruction windows in high-end superscalar processors, the physical register files (RFs) can no longer be accessed in a single cycle. To mitigate the resulting performance penalty, the RFs employ multiple levels of bypassing. Register file caching, which keeps a small subset of the registers in a smaller, faster structure called the register file cache (RFC), has also been proposed as a remedy for this problem. We introduce a relatively simple RFC design that partitions the RFC into two separate components: a FIFO queue that holds register values used only over a short interval following their writeback, and a small set-associative cache that holds values likely to be used over a longer interval. Results written to the RFC are easily classified into these two categories, and the classification bit is also used to predict the category of the result produced by the next execution of the same instruction. We show that a 28-entry RFC, used together with a 96-entry RF with no additional bypassing, yields significant energy savings in accessing register operands (about 38% on average) compared with a baseline design that has 128 registers with a 2-cycle access time and one additional level of bypassing. The performance drop relative to the baseline is negligible (0.3%).
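The abstract describes the split RFC only at a high level. The Python sketch below models one plausible reading of it, purely for illustration: results are routed at writeback either to a FIFO (predicted short-lived) or to a small set-associative cache (predicted long-lived), based on a per-instruction classification bit. Everything beyond that is a hypothetical assumption, not the paper's design: the names (`TwoPartRFC`, `write`, `read`), the 16 + 12 split of the 28 entries, LRU replacement within each set, write-through of every result to the backing RF, no RFC fill on a read miss, and the rule that sets the bit when a FIFO-routed value is read after it has already left the FIFO. Timing and energy are not modeled.

```python
from collections import OrderedDict


class TwoPartRFC:
    """Hypothetical sketch: RFC split into a FIFO (short-lived values)
    and a small set-associative cache (long-lived values).

    Assumptions (not from the paper): every result is written through to
    the backing RF, read misses do not fill the RFC, and an instruction's
    classification bit is set once one of its FIFO-routed results is read
    after it has already been evicted from the FIFO.
    """

    def __init__(self, fifo_entries=16, num_sets=3, ways=4, rf_size=96):
        self.fifo_entries = fifo_entries
        self.fifo = OrderedDict()                 # preg -> value, oldest first
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]  # LRU order per set
        self.rf = [0] * rf_size                   # backing (slower) register file
        self.long_lived = {}                      # producer PC -> classification bit
        self.producer = {}                        # preg -> PC of its last writer
        self.hits = 0
        self.misses = 0

    def _set_for(self, preg):
        return self.sets[preg % self.num_sets]

    def write(self, pc, preg, value):
        """Writeback: route the result using the bit predicted for this PC."""
        self.rf[preg] = value                     # assumed write-through to the RF
        self.producer[preg] = pc
        # Invalidate any stale copy of this physical register in either part.
        self.fifo.pop(preg, None)
        cache_set = self._set_for(preg)
        cache_set.pop(preg, None)
        if self.long_lived.get(pc, False):
            cache_set[preg] = value
            if len(cache_set) > self.ways:
                cache_set.popitem(last=False)     # evict the least recently used way
        else:
            self.fifo[preg] = value
            if len(self.fifo) > self.fifo_entries:
                self.fifo.popitem(last=False)     # evict the oldest FIFO entry

    def read(self, preg):
        """Operand read: return (value, hit), where hit means the RFC served it."""
        if preg in self.fifo:
            self.hits += 1
            return self.fifo[preg], True
        cache_set = self._set_for(preg)
        if preg in cache_set:
            cache_set.move_to_end(preg)           # mark as most recently used
            self.hits += 1
            return cache_set[preg], True
        self.misses += 1
        pc = self.producer.get(preg)
        if pc is not None and not self.long_lived.get(pc, False):
            # The value outlived the FIFO: predict "long-lived" next time.
            self.long_lived[pc] = True
        return self.rf[preg], False
```

For example, with a 2-entry FIFO, a third write evicts the oldest value; a later read of that value misses, is served by the RF, and flips the producing instruction's bit, so its next result goes to the set-associative part. Keying the bit by the producing instruction's PC is what lets one execution's observed behavior steer the placement of the next execution's result.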