A scalable low power issue queue for large instruction window processors

Authors:
Rajesh Vivekanandham; Bharadwaj Amrutur; R. Govindarajan
Affiliations:
Indian Institute of Science, Bangalore, India; Indian Institute of Science, Bangalore, India; Indian Institute of Science, Bangalore, India
Venue:
Proceedings of the 20th annual international conference on Supercomputing
Year:
2006

Abstract

Large instruction windows and issue queues are key to exploiting greater instruction level parallelism in out-of-order superscalar processors. However, the cycle time and energy consumption of conventional large monolithic issue queues are high. Previous efforts to reduce cycle time segment the issue queue and pipeline wakeup. Unfortunately, this results in significant IPC loss. Other proposals which address energy efficiency issues by avoiding only the unnecessary tag-comparisons do not reduce broadcasts. These schemes also increase the issue latency.

To address both these issues comprehensively, we propose the Scalable Low Power Issue Queue (SLIQ). SLIQ augments a pipelined issue queue with direct indexing to mitigate the problem of delayed wakeups while reducing the cycle time. Also, the SLIQ design naturally leads to significant energy savings by reducing both the number of tag broadcasts and comparisons required.

A 2-segment SLIQ incurs an average IPC loss of 0.2% over the entire SPEC CPU2000 suite, while achieving a 25.2% reduction in issue latency when compared to a monolithic 128-entry issue queue for an 8-wide superscalar processor. An 8-segment SLIQ improves scalability by reducing the issue latency by 38.3% while incurring an IPC loss of only 2.3%. Further, the 8-segment SLIQ significantly reduces the energy consumption and energy-delay product by 48.3% and 67.4% respectively on average.
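
As a rough consistency check on the figures quoted for the 8-segment SLIQ, the sketch below works out what delay ratio the reported energy and energy-delay product (EDP) reductions would imply together. It assumes EDP is simply energy multiplied by delay and that the averaged percentages can be combined directly. Neither assumption is confirmed by the abstract, and an average of products generally differs from a product of averages. The function name `implied_delay_ratio` is hypothetical and used only for illustration.

```python
def implied_delay_ratio(energy_reduction: float, edp_reduction: float) -> float:
    """Return the delay ratio implied by EDP = energy * delay.

    Both arguments are fractional reductions (0.483 means 48.3% lower).
    Assumption: the averaged reductions can be combined multiplicatively.
    """
    energy_ratio = 1.0 - energy_reduction  # remaining fraction of energy
    edp_ratio = 1.0 - edp_reduction        # remaining fraction of EDP
    return edp_ratio / energy_ratio        # remaining fraction of delay


# Figures quoted in the abstract for the 8-segment SLIQ.
ratio = implied_delay_ratio(energy_reduction=0.483, edp_reduction=0.674)
print(f"implied delay ratio: {ratio:.3f}")
print(f"implied delay reduction: {(1.0 - ratio) * 100:.1f}%")
```

Under these assumptions the implied delay reduction comes out at roughly 37%, which is in the same range as the 38.3% issue latency reduction reported for the same configuration. Whether the delay term in the reported EDP is exactly the issue latency is not stated in the abstract.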