Wake-up logic optimizations through selective match and wakeup range limitation

Authors:
Kuo-Su Hsiao;Chung-Ho Chen
Affiliations:
Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C;Department of Electrical Engineering and Institute of Computer and Communication Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C
Venue:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Year:
2006

Citing 23
Cited 0

An investigation of the performance of various dynamic scheduling techniques

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Computer architecture (2nd ed.): a quantitative approach

Computer architecture (2nd ed.): a quantitative approach
Load latency tolerance in dynamically scheduled processors

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Wattch: a framework for architectural-level power analysis and optimizations

Proceedings of the 27th annual international symposium on Computer architecture
Circuits for wide-window superscalar processors

Proceedings of the 27th annual international symposium on Computer architecture
Reducing the complexity of the issue logic

ICS '01 Proceedings of the 15th international conference on Supercomputing
Energy-effective issue logic

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Efficient dynamic scheduling through tag elimination

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A scalable instruction queue design using dependence chains

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Select-free instruction scheduling logic

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A high-speed dynamic instruction scheduling scheme for superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Energy-efficient hybrid wakeup logic

Proceedings of the 2002 international symposium on Low power electronics and design
The PowerPC 604 RISC microprocessor

IEEE Micro
The MIPS R10000 Superscalar Microprocessor

IEEE Micro
The HP PA-8000 RISC CPU

IEEE Micro
The Alpha 21264 Microprocessor

IEEE Micro
Half-price architecture

Proceedings of the 30th annual international symposium on Computer architecture
Cyclone: a broadcast-free dynamic instruction scheduler with selective replay

Proceedings of the 30th annual international symposium on Computer architecture
Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Energy-efficient issue queue design

IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper presents two effective wakeup designs that improve the speed, power, area, and scalability without instructions per cycle (IPC) loss for dynamic instruction schedulers. First, a wakeup design is proposed to aim at reducing the power consumption and wakeup latency. This design removes the READ of the destination tags from the wakeup path by matching the source tags directly with the grant lines. Moreover, this design eliminates the redundant matches during the wakeup operations by matching the source tags with only the selected grant lines. Next, the second design explores a metric called wakeup locality to further reduce the area cost of the wakeup logic. By limiting the wakeup ranges for the instructions in the issue window, this design not only reduces the area requirement but also improves the scalability. The experimental results show that this range-limited-wakeup design saves 76%-94% of the power consumption and reduces 29%-77% in the wakeup latency compared to the conventional CAM-based scheme with only 5%-44% of the area cost in a traditional RAM-based scheme. The results also show that this design scales well with the increase of both the issue width and the window size.