A Hierarchical Dependence Check and Folded Rename Mapping Based Scalable Dispatch Stage

Authors:
Affiliations:
Venue: ICCD '01 Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors
Year: 2001

Cited by:
Reducing Rename Logic Complexity for High-Speed and Low-Power Front-End Architectures, IEEE Transactions on Computers

Abstract

In a superscalar pipeline, the dispatch stage performs register renaming, which involves map table lookup logic and dependence check logic. Neither subtask scales well with the dispatch width of the processor: the number of comparators required by the dependence check logic grows quadratically with the dispatch width, and the word line capacitance of the rename map table grows linearly with it. This paper proposes and evaluates schemes to alleviate both problems. Performing the dependence check hierarchically in two stages reduces the number of comparators required in the dependence check logic from quadratic to linear in the dispatch width. This scheme also scales with the dispatch width, allowing DW² instructions to be dispatched in the same processor cycle time that current microprocessors use to dispatch DW instructions. SimpleScalar-based simulations indicate a performance penalty of less than 10% on the SPEC95 CPU benchmarks due to the extra cycle introduced. The second scheme began with the objective of exploiting speculation on rename and dependence information. The only beneficial subspace of this speculation appears to be the reuse of rename information for instructions whose source operands are produced either in their own basic block or in the immediately preceding basic block. By storing the rename information of such instructions in a rename cache, these instructions can be dispatched directly to the reservation stations if the program takes the same path again. The rename cache improves performance by approximately 7% on the SPEC95 integer benchmarks.
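
To make the quadratic comparator growth concrete, the following is a minimal Python sketch of a conventional (flat) rename step for one dispatch group. It is not the paper's hierarchical scheme, whose details the abstract does not give. The function name `rename_group`, the tuple layout `(dest, src1, src2)`, and the assumption of exactly two source operands per instruction are all hypothetical choices made for illustration.

```python
def rename_group(instrs, map_table, free_regs):
    """Rename one dispatch group in program order (illustrative sketch only).

    instrs:    list of (dest, src1, src2) architectural register numbers
               (assumption: every instruction has two sources and one dest).
    map_table: dict mapping architectural register -> physical register.
    free_regs: list of free physical registers, consumed from the front.

    Returns (renamed, comparisons), where comparisons counts the
    source-versus-earlier-destination checks a flat scheme performs.
    """
    renamed = []
    comparisons = 0
    earlier_dests = []  # (arch_dest, phys_dest) of older instructions in this group

    for dest, src1, src2 in instrs:
        phys_srcs = []
        for src in (src1, src2):
            phys = map_table[src]  # map table lookup
            # Dependence check: compare this source against every older
            # destination in the group; the youngest match wins because
            # later entries overwrite earlier ones.
            for arch_d, phys_d in earlier_dests:
                comparisons += 1
                if arch_d == src:
                    phys = phys_d
            phys_srcs.append(phys)

        phys_dest = free_regs.pop(0)  # allocate a new physical register
        earlier_dests.append((dest, phys_dest))
        renamed.append((phys_dest, phys_srcs[0], phys_srcs[1]))

    # Commit the group's new mappings in program order.
    for arch_d, phys_d in earlier_dests:
        map_table[arch_d] = phys_d

    return renamed, comparisons
```

Under these assumptions, a group of DW instructions performs 2 × (0 + 1 + ... + (DW − 1)) = DW × (DW − 1) comparisons, so a group of 4 needs 12 and a group of 8 needs 56. This is the quadratic growth that the hierarchical dependence check is intended to avoid.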