Delaying physical register allocation through virtual-physical registers

Authors:
Teresa Monreal; Antonio González; Mateo Valero; José González; Victor Viñals
Affiliations:
Departamento de Informática e Ing. de Sistemas, Centro Politécnico Superior - Univ. de Zaragoza; Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya; -; -; Departamento de Informática e Ing. de Sistemas, Centro Politécnico Superior - Univ. de Zaragoza
Venue:
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Year:
1999

References:
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor. ISCA '96: Proceedings of the 23rd annual international symposium on Computer architecture.
Dynamic instruction reuse. Proceedings of the 24th annual international symposium on Computer architecture.
Complexity-effective superscalar processors. Proceedings of the 24th annual international symposium on Computer architecture.
A novel renaming scheme to exploit value temporal locality through physical register reuse and unification. MICRO 31: Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture.
The Alpha 21264 Microprocessor. IEEE Micro.
The PowerPC 620 microprocessor: a high performance superscalar RISC microprocessor. COMPCON '95: Proceedings of the 40th IEEE Computer Society International Conference.
Register File Design Considerations in Dynamically Scheduled Processors. HPCA '96: Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture.
Virtual-Physical Registers. HPCA '98: Proceedings of the 4th International Symposium on High-Performance Computer Architecture.
A Scalable Register File Architecture for Dynamically Scheduled Processors. PACT '96: Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques.
Virtual Registers. HIPC '97: Proceedings of the Fourth International Conference on High-Performance Computing.

Abstract

Register file access time is one of the critical delays in current microprocessors, and it is expected to become more critical as future processors increase the instruction window size and the issue width. This paper presents a novel physical register management scheme that allows registers to be allocated late, at the end of execution. We show that it can provide significant savings in the number of registers and thus significantly shorten the register file access time. The approach is based on virtual-physical registers, which we presented in a previous work, extended with a new register allocation policy. This policy combines on-demand allocation, which maximizes register usage, with a stealing mechanism that prevents older instructions from being delayed by younger ones. This shortens the average number of cycles for which each physical register is allocated, and allows instructions to execute earlier, since they can obtain a physical register for their destination earlier than with the conventional scheme, which allocates the register at decode. Early execution is especially beneficial for branches and memory operations, since the former can be resolved earlier and the latter can prefetch their data in advance.
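The abstract describes the allocation policy only at a high level, so the following Python sketch is a toy model of the idea, not the authors' implementation. It shows that a destination receives only a virtual-physical (VP) tag at rename, that storage is bound on demand when execution finishes, and that an older instruction finding no free register can take one held by a younger instruction. Everything concrete here is an assumption: the names `Inst`, `LateAllocator`, `rename`, `lookup`, `writeback` and `release` are hypothetical. The model also assumes three behaviors: the victim of a steal is the youngest register holder that is younger than the requester, the victim is simply flagged for re-execution, and a requester with no free register and no younger victim retries later.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Inst:
    seq: int                      # program-order age: smaller = older
    vp_tag: int                   # virtual-physical tag given at rename
    preg: Optional[int] = None    # physical register, bound only at write-back
    must_reexecute: bool = False  # set if a steal took this instruction's register


class LateAllocator:
    """Toy model of late, on-demand register allocation with stealing.

    Hypothetical sketch: names and the victim-selection rule are assumptions,
    not the exact policy described in the paper.
    """

    def __init__(self, num_pregs: int) -> None:
        self.free: List[int] = list(range(num_pregs))
        self.holders: Dict[int, Inst] = {}  # preg -> instruction holding it
        self.vp_map: Dict[int, int] = {}    # vp_tag -> preg, once bound
        self.next_vp = 0

    def rename(self, seq: int) -> Inst:
        # Decode/rename: hand out a virtual-physical tag only; no storage yet.
        inst = Inst(seq=seq, vp_tag=self.next_vp)
        self.next_vp += 1
        return inst

    def lookup(self, vp_tag: int) -> Optional[int]:
        # Consumers name their sources by VP tag; None means not bound.
        return self.vp_map.get(vp_tag)

    def writeback(self, inst: Inst) -> bool:
        """Try to bind a physical register when `inst` finishes executing."""
        if self.free:
            preg = self.free.pop()
        else:
            # No free register: steal from the youngest holder younger than `inst`.
            younger = [i for i in self.holders.values() if i.seq > inst.seq]
            if not younger:
                return False  # nothing to steal; the caller retries later
            victim = max(younger, key=lambda i: i.seq)
            preg = victim.preg
            del self.vp_map[victim.vp_tag]
            victim.preg = None
            victim.must_reexecute = True
        inst.preg = preg
        inst.must_reexecute = False
        self.holders[preg] = inst
        self.vp_map[inst.vp_tag] = preg
        return True

    def release(self, inst: Inst) -> None:
        # Return the register, e.g. when the next writer of the same logical
        # register commits (commit logic itself is not modeled here).
        if inst.preg is not None:
            del self.holders[inst.preg]
            del self.vp_map[inst.vp_tag]
            self.free.append(inst.preg)
            inst.preg = None


if __name__ == "__main__":
    alloc = LateAllocator(num_pregs=2)
    a, b, c = (alloc.rename(seq) for seq in (0, 1, 2))  # a is the oldest
    alloc.writeback(c)  # youngest finishes first and takes a free register
    alloc.writeback(b)  # takes the last free register
    alloc.writeback(a)  # none free: steals c's register, so c must re-execute
    print(a.preg, c.preg, c.must_reexecute)  # -> 1 None True
```

In this model, choosing the youngest holder as the victim means the oldest in-flight instructions, which gate commit, never wait on registers held by younger ones. The cost is that the victim has to re-execute to regain a register.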