Reducing reorder buffer complexity through selective operand caching

Authors:
Gurhan Kucuk;Dmitry Ponomarev;Oguz Ergin;Kanad Ghose
Affiliations:
State University of New York, Binghamton, NY;State University of New York, Binghamton, NY;State University of New York, Binghamton, NY;State University of New York, Binghamton, NY
Venue:
Proceedings of the 2003 international symposium on Low power electronics and design
Year:
2003

Citing 12
Cited 1

Register traffic analysis for streamlining inter-operation communication in fine-grain parallel processors

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Exploiting short-lived variables in superscalar processors

Proceedings of the 28th annual international symposium on Microarchitecture
Multiple-banked register file architectures

Proceedings of the 27th annual international symposium on Computer architecture
Energy-effective issue logic

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Low-complexity reorder buffer architecture

ICS '02 Proceedings of the 16th international conference on Supercomputing
Reducing power requirements of instruction scheduling through dynamic allocation of multiple datapath resources

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the register file in dynamic superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
The Alpha 21264 Microprocessor

IEEE Micro
Cherry: checkpointed early resource recycling in out-of-order microprocessors

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Reducing register ports for higher speed and lower energy

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Reducing register ports using delayed write-back queues and operand pre-fetch

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Loose Loops Sink Chips

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture

Isolating Short-Lived Operands for Energy Reduction

IEEE Transactions on Computers

Quantified Score

Hi-index	0.00

Visualization

Abstract

Modern superscalar processors implement precise interrupts by using the Reorder Buffer (ROB). In some microarchitectures , such as the Intel P6, the ROB also serves as a repository for the uncommitted results. In these designs, the ROB is a complex multi-ported structure that dissipates a significant percentage of the overall chip power. Recently, a mechanism was introduced for reducing the ROB complexity and its power dissipation through the complete elimination of read ports for reading out source operands. The resulting performance degradation is countered by caching the most recently produced results in a small set of associatively-addressed latches ("retention latches"). We propose an enhancement to the above technique by leveraging the notion of short-lived operands (values targeting the registers that are renamed by the time the instruction producing the value reaches the writeback stage). As much as 87% of all generated values are short lived for the SPEC 2000 benchmarks. Significant improvements in the utilization of retention latches, the overall performance, complexity and power are achieved by not caching short-lived values in the retention latches. As few as two retention latches allow all source operand read ports on the ROB to be completely eliminated with very little impact on performance.