Reducing Rename Logic Complexity for High-Speed and Low-Power Front-End Architectures

Authors:
Rama Sangireddy
Affiliations:
IEEE
Venue:
IEEE Transactions on Computers
Year:
2006

Abstract

In modern high-performance processors, the complexity of the register rename logic grows with the pipeline width, leading to longer renaming delay and higher power consumption. The rename logic in the processor front-end is one of the largest contributors to peak on-chip temperature, so reducing its power consumption demands attention. Further, in clustered microarchitectures, the rename map table at the front-end is shared by the clusters, so its critical path delay should not become the bottleneck that determines the processor clock cycle time. Analysis of the SPEC2000 integer benchmark programs reveals that, when the programs run on a 4-wide processor, at most one two-source instruction (an instruction with two source registers) is renamed per cycle for 94 percent of the total execution time. Similarly, on an 8-wide processor, at most one two-source instruction is renamed per cycle for 92 percent of the total execution time. The rename map table port bandwidth is therefore heavily underutilized for most of the execution time. Based on this analysis, we propose a novel technique that significantly reduces the number of ports in the rename map table. The technique is easy to implement and reduces the access time, power, and area of the rename logic without adding power, area, or delay overhead to any other logic on the chip. It renames instructions in fetch order, with no significant impact on processor performance. 
In an 8-wide processor, compared with a conventional integer-pipeline rename map table that has 16 ports for source operand lookup, a nine-port rename map table built with this technique reduces access time, power, and area by 14 percent, 42 percent, and 49 percent, respectively, at a cost of only 4.7 percent in instructions committed per cycle (IPC). In a 4-wide processor, the technique reduces access time, power, and area by 7 percent, 38 percent, and 59 percent, respectively, with an IPC loss of only 4.4 percent.
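
The port-reduction idea can be illustrated with a small model. The abstract does not spell out how the reduced ports are allocated, so the sketch below assumes a simple policy: all source lookup ports form one shared pool, instructions are renamed strictly in fetch order, and renaming stops for the cycle at the first instruction whose source operands no longer fit. The function names and the refill behavior of the fetch window are hypothetical and are not taken from the paper.

```python
from typing import List


def rename_in_order(source_counts: List[int], port_budget: int) -> int:
    """Return how many leading instructions of a group can be renamed in one cycle.

    source_counts[i] is the number of source registers (0, 1, or 2) of the
    i-th instruction in fetch order. Assumption: every source operand needs
    one lookup port, and ports are drawn from a single shared pool.
    """
    used = 0
    renamed = 0
    for n_src in source_counts:
        if used + n_src > port_budget:
            break  # in-order renaming: younger instructions wait for the next cycle
        used += n_src
        renamed += 1
    return renamed


def cycles_to_rename(source_counts: List[int], width: int, port_budget: int) -> int:
    """Count the cycles needed to rename a stream of instructions.

    Assumption: each cycle the rename stage sees the next `width` instructions
    that have not been renamed yet (an idealized, always-full fetch window).
    """
    cycles = 0
    i = 0
    while i < len(source_counts):
        n = rename_in_order(source_counts[i:i + width], port_budget)
        if n == 0:
            raise ValueError("port budget too small for a single instruction")
        i += n
        cycles += 1
    return cycles


# A group with one two-source instruction fits in 9 ports on an 8-wide machine.
common_case = [1, 1, 2, 1, 1, 1, 1, 1]
# A group with two two-source instructions needs 10 ports and spills one instruction.
rare_case = [2, 2, 1, 1, 1, 1, 1, 1]

print(cycles_to_rename(common_case, width=8, port_budget=9))
print(cycles_to_rename(rare_case, width=8, port_budget=9))
print(cycles_to_rename(rare_case, width=8, port_budget=16))
```

Under this assumed policy, a group with at most one two-source instruction never stalls with nine ports on an 8-wide machine, which matches the case the abstract reports for 92 percent of execution time. The sketch says nothing about the measured IPC, access time, power, or area figures.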