By-passing the out-of-order execution pipeline to increase energy-efficiency

Authors:
Hans Vandierendonck;Philippe Manet;Thibault Delavallee;Igor Loiselle;Jean-Didier Legat
Affiliations:
Ghent University, Ghent, Belgium;Universite catholique de Louvain, Louvain-la-Neuve, Belgium;Universite catholique de Louvain, Louvain-la-Neuve, Belgium;Universite catholique de Louvain, Louvain-la-Neuve, Belgium;Universite catholique de Louvain, Louvain-la-Neuve, Belgium
Venue:
Proceedings of the 4th international conference on Computing frontiers
Year:
2007

Abstract

Out-of-order execution significantly increases the performance of superscalar processors. The out-of-order execution mechanism is, however, energy-inefficient, which inhibits scaling superscalar processors to high issue widths and large instruction windows. In this paper, we build on the observation that between 19% and 36% of the instructions are immediately ready for execution, even before entering the issue queue. Yet, these instructions proceed to the energy-consuming steps of instruction wake-up and select, and they needlessly occupy space in the issue queue. To save energy, we propose that these instructions by-pass the out-of-order execution core. Instead, we execute them on an energy-efficient single-issue in-order by-pass pipeline. The by-pass pipeline executes a significant fraction of all instructions, allowing performance-energy trade-offs with respect to the issue width of the out-of-order pipeline and to the issue queue size. By making these trade-offs, we show energy reductions of 53% for the issue queue, 33% for the register file and 31% in the write-back and wake-up logic. Performance remains almost unaffected.
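
The steering decision described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Instr` record, the `steer` function, the register names and the rule that a destination register becomes pending as soon as its producer is dispatched are all assumptions made for illustration. The sketch only shows the idea that an instruction whose source operands are all available at dispatch can be sent to the in-order by-pass pipeline instead of the issue queue.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, any number of sources."""
    dest: str
    srcs: Tuple[str, ...]


def steer(instrs: Iterable[Instr], ready_regs: Iterable[str]) -> Tuple[List[Instr], List[Instr]]:
    """Split instructions into (by-pass pipeline, out-of-order issue queue).

    Assumption: a register is "ready" only if its value is already available
    when the instruction is dispatched. Completion is not modelled, so a
    destination register stays pending for the rest of the sequence.
    """
    ready = set(ready_regs)
    bypass: List[Instr] = []
    issue_queue: List[Instr] = []
    for ins in instrs:
        if all(src in ready for src in ins.srcs):
            bypass.append(ins)        # ready at dispatch: no wake-up or select needed
        else:
            issue_queue.append(ins)   # must wait for an operand in the issue queue
        ready.discard(ins.dest)       # the new value of dest is now pending
    return bypass, issue_queue


# Illustrative four-instruction sequence (made up, not taken from the paper).
program = [
    Instr("r3", ("r1", "r2")),
    Instr("r4", ("r3",)),
    Instr("r5", ("r1",)),
    Instr("r6", ("r4", "r5")),
]
bypass, issue_queue = steer(program, ready_regs={"r1", "r2"})
bypass_fraction = len(bypass) / len(program)
```

In this toy sequence the first and third instructions read only registers that are ready at dispatch, so they are steered to the by-pass pipeline, while the other two wait in the issue queue. The resulting fraction is a property of the made-up example and says nothing about the 19% to 36% range reported in the abstract.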