Low-complexity reorder buffer architecture

Authors:
Gurhan Kucuk;Dmitry Ponomarev;Kanad Ghose
Affiliations:
University of New York, Binghamton, NY;University of New York, Binghamton, NY;University of New York, Binghamton, NY
Venue:
ICS '02 Proceedings of the 16th international conference on Supercomputing
Year:
2002

Citing 12
Cited 8

Exploiting short-lived variables in superscalar processors

Proceedings of the 28th annual international symposium on Microarchitecture
Implementation of precise interrupts in pipelined processors

ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Multiple-banked register file architectures

Proceedings of the 27th annual international symposium on Computer architecture
Energy-effective issue logic

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Energy: efficient instruction dispatch buffer design for superscalar processors

ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Reducing power requirements of instruction scheduling through dynamic allocation of multiple datapath resources

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the register file in dynamic superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
The Alpha 21264 Microprocessor

IEEE Micro
Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors

IEEE Micro
Non-Consistent Dual Register Files to Reduce Register Pressure

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
AccuPower: An Accurate Power Estimation Tool for Superscalar Microprocessors

Proceedings of the conference on Design, automation and test in Europe
Loose Loops Sink Chips

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture

Energy-Efficient Design of the Reorder Buffer

PATMOS '02 Proceedings of the 12th International Workshop on Integrated Circuit Design. Power and Timing Modeling, Optimization and Simulation
Reducing reorder buffer complexity through selective operand caching

Proceedings of the 2003 international symposium on Low power electronics and design
Complexity-Effective Reorder Buffer Designs for Superscalar Processors

IEEE Transactions on Computers
Isolating Short-Lived Operands for Energy Reduction

IEEE Transactions on Computers
An L2-miss-driven early register deallocation for SMT processors

Proceedings of the 21st annual international conference on Supercomputing
Improving performance and reducing energy-delay with adaptive resource resizing for out-of-order embedded processors

Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Reducing register pressure in SMT processors through L2-miss-driven early register release

ACM Transactions on Architecture and Code Optimization (TACO)
Energy-efficient register caching with compiler assistance

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.01

Visualization

Abstract

In some of today's superscalar processors (e.g.the Pentium III), the result repositories are implemented as the Reorder Buffer (ROB) slots. In such designs, the ROB is a complex multi-ported structure that occupies a significant portion of the die area and dissipates a non-trivial fraction of the total chip power, as much as 27% according to some estimates. In addition, an access to such ROB typically takes more than one cycle, impacting the IPC adversely.We propose a low-complexity and low-power ROB design that exploits the fact that the bulk of the source operand values is obtained through data forwarding to the issue queue or through direct reads of the committed register values. Our ROB design uses an organization that completely eliminates the read ports needed to read out operand values for instruction issue. Any consequential performance degradation is countered by using a small number of associatively-addressed retention latches to hold the most recent set of values written into the ROB. The contents of the retention latches are used to satisfy the operand reads for issue that would otherwise have to be read from the ROB slots. Significant savings of the ROB real estate as well as power savings in the range of 20% to 30% for the ROB are achieved using the proposed technique. At the same time, the fact that results are accessible in a single cycle from the retention latches actually leads to an overall improvement in the IPC of up to 3% on the average for SPEC 2000 benchmarks.