Exploiting short-lived variables in superscalar processors
Proceedings of the 28th annual international symposium on Microarchitecture
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Delaying physical register allocation through virtual-physical registers
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Multiple-banked register file architectures
Proceedings of the 27th annual international symposium on Computer architecture
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Dynamic dead-instruction detection and elimination
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
The Alpha 21264 Microprocessor
IEEE Micro
Cherry: checkpointed early resource recycling in out-of-order microprocessors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Characterizing and predicting value degree of use
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Hardware Schemes for Early Register Release
ICPP '02 Proceedings of the 2002 International Conference on Parallel Processing
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
A Scalable Register File Architecture for Dynamically Scheduled Processors
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Reducing Datapath Energy through the Isolation of Short-Lived Operands
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Exploiting Value Locality in Physical Register Files
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Use-Based Register Caching with Decoupled Indexing
Proceedings of the 31st annual international symposium on Computer architecture
Proceedings of the 31st annual international symposium on Computer architecture
Increasing Processor Performance Through Early Register Release
ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Dynamically reducing pressure on the physical register file through simple register sharing
ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
Checkpoint allocation and release
ACM Transactions on Architecture and Code Optimization (TACO)
Energy-efficient register caching with compiler assistance
ACM Transactions on Architecture and Code Optimization (TACO)
Automatic Parallelization and Optimization of Programs by Proof Rewriting
SAS '09 Proceedings of the 16th International Symposium on Static Analysis
Circuit design of a dual-versioning L1 data cache for optimistic concurrency
Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI
Circuit design of a dual-versioning L1 data cache
Integration, the VLSI Journal
Hi-index | 14.98 |
Modern superscalar microprocessors need sizable register files to support a large number of in-flight instructions for exploiting instruction level parallelism (ILP). An alternative to building large register files is to use a smaller number of registers, but manage them more effectively. More efficient management of registers can also result in higher performance if the reduction of the register file size is not the goal. Traditional register file management mechanisms deallocate a physical register only when the next instruction writing to the same destination architectural register commits. In this paper, we propose several techniques for deallocating physical registers much earlier. Our designs rely on the use of a checkpointed register file (CRF), where a local shadow copy of each bitcell is used to temporarily save the values of the early deallocated registers should they be needed to recover from branch mispredictions or to reconstruct the precise state after exceptions or interrupts. The proposed techniques try to release registers as soon as possible and are more aggressive than the previously proposed schemes for early deallocation of registers.