An efficient wakeup design for energy reduction in high-performance superscalar processors

Authors:
Kuo-Su Hsiao;Chung-Ho Chen
Affiliations:
National Cheng Kung University, Taiwan;National Cheng Kung University, Taiwan
Venue:
Proceedings of the 2nd conference on Computing frontiers
Year:
2005

Citing 14
Cited 2

An investigation of the performance of various dynamic scheduling techniques

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Power considerations in the design of the Alpha 21264 microprocessor

DAC '98 Proceedings of the 35th annual Design Automation Conference
Wattch: a framework for architectural-level power analysis and optimizations

Proceedings of the 27th annual international symposium on Computer architecture
Circuits for wide-window superscalar processors

Proceedings of the 27th annual international symposium on Computer architecture
Energy-effective issue logic

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Efficient dynamic scheduling through tag elimination

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A scalable instruction queue design using dependence chains

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A high-speed dynamic instruction scheduling scheme for superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Energy-efficient hybrid wakeup logic

Proceedings of the 2002 international symposium on Low power electronics and design
Half-price architecture

Proceedings of the 30th annual international symposium on Computer architecture
Cyclone: a broadcast-free dynamic instruction scheduler with selective replay

Proceedings of the 30th annual international symposium on Computer architecture
Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture

By-passing the out-of-order execution pipeline to increase energy-efficiency

Proceedings of the 4th international conference on Computing frontiers
Scalable Dynamic Instruction Scheduler through Wake-Up Spatial Locality

IEEE Transactions on Computers

Quantified Score

Hi-index	0.00

Visualization

Abstract

In modern superscalar processors, the complex instruction scheduler could form the critical path of the pipeline stages and limit the clock cycle time. In addition, complex scheduling logic results in the formation of a hot spot on the processor chip. Consequently, the latency and power consumption of the dynamic scheduler are two of the most crucial design issues when developing a high-performance microprocessor. We propose an instruction wakeup scheme that remedies the speed and power issues faced with conventional designs. This is achieved by a new design that separates RAM cells from the match circuits. This separated design is such that the advantages of the CAM and bit-map RAM schemes are retained, while their respective disadvantages are eliminated. Specifically, the proposed design retains the moderate area advantage of the CAM scheme and the low power and low latency advantages of the bit-map RAM schemeThe experimental results show that the proposed design saves power consumption by 80% compared to the traditional CAM-based design and 18% to the bit-map RAM design, respectively. In speed, the proposed design reduces an average of 77% in the wakeup latency compared to the conventional CAM-based design and an average of 33% reduction of the latency of the bit-map RAM design. For an 8-issue superscalar processor, the proposed design reduces the power consumption of the conventional wakeup logic by 80%, while simultaneously increasing the Instruction Count per nano-second (IPns) by a factor of approximately 2.5 times with a moderate area cost