Delaying physical register allocation through virtual-physical registers
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
The number of physical registers is one of the critical issues in current superscalar out-of-order processors. Conventional architectures allocate a new storage location (e.g., a physical register) in the decode stage for each operation that has a destination register. When an instruction is committed, it frees the physical register allocated to the previous instruction that had the same destination logical register. Thus, an additional register (i.e., one beyond the number of logical registers) is in use for each instruction with a destination register from the time it is decoded until it is committed. In this paper we propose a novel register organization that allocates physical registers when instructions complete execution. In this way, register pressure is significantly reduced, since the additional register is occupied only from the time execution completes until the instruction is committed. For some long-latency instructions (e.g., a load with a cache miss) and for parts of the code with a small amount of parallelism, the savings could be very high. We have evaluated the new scheme for a superscalar processor and obtained a significant speedup.
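
The occupancy argument in the abstract can be illustrated with a toy model. The sketch below is not the paper's evaluation methodology: the `Instr` record, the two helper functions, and the four-instruction trace with its cycle numbers are all hypothetical, chosen only to show how the interval during which the additional register is held shrinks when allocation moves from decode to completion.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    decode: int    # cycle in which the instruction is decoded
    complete: int  # cycle in which execution completes
    commit: int    # cycle in which the instruction commits


def _start(ins, late_allocation):
    # Conventional scheme: register held from decode.
    # Late-allocation scheme: register held from completion.
    return ins.complete if late_allocation else ins.decode


def extra_register_cycles(trace, late_allocation):
    """Total cycles during which an additional register is held."""
    return sum(ins.commit - _start(ins, late_allocation) for ins in trace)


def peak_extra_registers(trace, late_allocation):
    """Maximum number of additional registers held at the same time."""
    events = []
    for ins in trace:
        events.append((_start(ins, late_allocation), +1))
        events.append((ins.commit, -1))
    # Tuples sort by cycle, then by delta, so a release (-1) in a given
    # cycle is processed before an allocation (+1) in that same cycle.
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


# Hypothetical trace: a load that misses in the cache, a dependent add,
# an independent multiply, and a subtract that waits on the add.
# Commit is in program order.
TRACE = [
    Instr("load_miss", decode=0, complete=20, commit=21),
    Instr("add",       decode=0, complete=21, commit=22),
    Instr("mul",       decode=1, complete=4,  commit=23),
    Instr("sub",       decode=1, complete=22, commit=24),
]

if __name__ == "__main__":
    for late in (False, True):
        label = "allocate at completion" if late else "allocate at decode"
        print(label,
              extra_register_cycles(TRACE, late),
              peak_extra_registers(TRACE, late))
```

In this made-up trace, the load and the instructions that depend on it spend most of their lifetime waiting, so nearly all of their decode-time allocation is idle occupancy. The independent multiply, which completes early but must wait to commit in order, benefits least.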