ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Proceedings of the 24th annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
The Alpha 21264 Microprocessor
IEEE Micro
The PowerPC 620 microprocessor: a high performance superscalar RISC microprocessor
COMPCON '95 Proceedings of the 40th IEEE Computer Society International Conference
Register File Design Considerations in Dynamically Scheduled Processors
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
A Scalable Register File Architecture for Dynamically Scheduled Processors
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
HIPC '97 Proceedings of the Fourth International Conference on High-Performance Computing
Load and store reuse using register file contents
ICS '01 Proceedings of the 15th international conference on Supercomputing
Integrating superscalar processor components to implement register caching
ICS '01 Proceedings of the 15th international conference on Supercomputing
Dynamically allocating processor resources between nearby and distant ILP
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Selective Register Renaming: A Compiler-Driven Approach to Dynamic Register Renaming
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
Cherry: checkpointed early resource recycling in out-of-order microprocessors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Mini-Threads: Increasing TLP on Small-Scale SMT Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Exploiting Value Locality in Physical Register Files
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
A first glance at Kilo-instruction based multiprocessors
Proceedings of the 1st conference on Computing frontiers
Reducing register pressure through LAER algorithm
ACSC '04 Proceedings of the 27th Australasian conference on Computer science - Volume 26
Proceedings of the 31st annual international symposium on Computer architecture
Late Allocation and Early Release of Physical Registers
IEEE Transactions on Computers
A case for resource-conscious out-of-order processors: towards kilo-instruction in-flight processors
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Toward kilo-instruction processors
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Speculative early register release
Proceedings of the 3rd conference on Computing frontiers
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Early Register Deallocation Mechanisms Using Checkpointed Register Files
IEEE Transactions on Computers
Late-binding: enabling unordered load-store queues
Proceedings of the 34th annual international symposium on Computer architecture
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
IEEE Transactions on Computers
Hardware support for early register release
International Journal of High Performance Computing and Networking
Asymmetrically banked value-aware register files for low-energy and high-performance
Microprocessors & Microsystems
Reducing register file size through instruction pre-execution enhanced by value prediction
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Mobile Information Systems - Mobile and Wireless Networks
Exploiting narrow-width values for thermal-aware register file designs
Proceedings of the Conference on Design, Automation and Test in Europe
Compiler assisted dynamic management of registers for network processors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Register Cache System Not for Latency Reduction Purpose
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
CRAM: coded registers for amplified multiporting
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Implicit transactional memory in kilo-instruction multiprocessors
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Hi-index | 0.01 |
Register file access time represents one of the critical delays of current microprocessors, and it is expected to become more critical as future processors increase the instruction window size and the issue width. This paper present a novel physical register management scheme that allows for a late allocation (at the end of execution) of registers. We show that it can provide significant savings in number of registers and thus, it can significantly shorter the register file access time. The approach is based on virtual-physical registers, which we presented in a previous work, extended with a new register allocation policy. This policy consists of an on-demand allocation in order to maximize the register usage, combined with a stealing mechanism that prevents older instruction from being delayed by younger ones. This shortens the average number of cycles that each physical register is allocated, and allows for an early execution of instructions since they can obtain a physical register for its destination earlier than with the conventional scheme. Early execution is especially beneficial for branches and memory operations, since the former can be resolved earlier and the latter can prefetch their data in advance.