Modern superscalar processors implement precise interrupts by using the Reorder Buffer (ROB). In some microarchitectures, such as the Intel P6, the ROB also serves as a repository for uncommitted results. In these designs, the ROB is a complex multi-ported structure that dissipates a significant percentage of the overall chip power. Recently, a mechanism was introduced for reducing ROB complexity and power dissipation by completely eliminating the read ports used for reading out source operands. The resulting performance degradation is countered by caching the most recently produced results in a small set of associatively-addressed latches ("retention latches"). We propose an enhancement to this technique that leverages the notion of short-lived operands: values whose destination registers have already been renamed by the time the instruction producing the value reaches the writeback stage. As many as 87% of all generated values are short-lived for the SPEC 2000 benchmarks. By not caching short-lived values in the retention latches, we achieve significant improvements in retention latch utilization, overall performance, complexity and power. As few as two retention latches allow all source operand read ports on the ROB to be completely eliminated with very little impact on performance.
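The short-lived test and the resulting caching policy can be illustrated with a small sketch. This is not the authors' implementation: the `Instr` record, the cycle fields, the `RetentionLatches` class, the use of the instruction's trace index as a tag, and the FIFO replacement policy are all assumptions made for illustration. The only part taken from the abstract is the rule itself: a value is short-lived if its destination register has been renamed again by the time the producer writes back, and such values are not placed in the retention latches.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    """Hypothetical trace record; field names are illustrative only."""
    dest: Optional[int]    # architectural destination register, or None
    rename_cycle: int      # cycle in which the instruction is renamed
    writeback_cycle: int   # cycle in which its result is written back


def is_short_lived(trace: List[Instr], i: int) -> bool:
    """True if trace[i]'s destination register is renamed again by the
    time trace[i] reaches writeback. The trace is in program order, so
    the first later writer of the same register is the one that matters."""
    producer = trace[i]
    if producer.dest is None:
        return False
    for later in trace[i + 1:]:
        if later.dest == producer.dest:
            return later.rename_cycle <= producer.writeback_cycle
    return False


class RetentionLatches:
    """Small tag-addressed store with FIFO replacement (an assumed policy)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.entries: "OrderedDict[int, int]" = OrderedDict()

    def insert(self, tag: int, value: int) -> None:
        if tag in self.entries:
            del self.entries[tag]
        elif len(self.entries) >= self.size:
            self.entries.popitem(last=False)  # evict the oldest entry
        self.entries[tag] = value

    def lookup(self, tag: int) -> Optional[int]:
        return self.entries.get(tag)


def write_back(trace: List[Instr], i: int, value: int,
               latches: RetentionLatches) -> None:
    """Cache the result in the retention latches only if it is not short-lived."""
    if not is_short_lived(trace, i):
        latches.insert(i, value)
```

In a three-instruction trace where the first two instructions both write the same register and the second is renamed before the first writes back, `write_back` skips the first result and caches only the other two, so a two-entry latch set is not spent on a value whose register has already been superseded.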