MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Proceedings of the 24th annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Power considerations in the design of the Alpha 21264 microprocessor
DAC '98 Proceedings of the 35th annual Design Automation Conference
Pipeline gating: speculation control for energy reduction
Proceedings of the 25th annual international symposium on Computer architecture
Load latency tolerance in dynamically scheduled processors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Dynamic removal of redundant computations
ICS '99 Proceedings of the 13th international conference on Supercomputing
Implementation of precise interrupts in pipelined processors
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Proceedings of the 14th international conference on Supercomputing
On pipelining dynamic instruction scheduling logic
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Efficient dynamic scheduling through tag elimination
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A scalable instruction queue design using dependence chains
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Select-free instruction scheduling logic
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Saving energy with just in time instruction delivery
Proceedings of the 2002 international symposium on Low power electronics and design
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Superscalar Execution with Direct Data Forwarding
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
The Alpha 21264 Microprocessor Architecture
ICCD '98 Proceedings of the International Conference on Computer Design
Improving Processor Performance by Simplifying and Bypassing Trivial Computations
ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
Proceedings of the 30th annual international symposium on Computer architecture
Energy efficient co-adaptive instruction fetch and issue
Proceedings of the 30th annual international symposium on Computer architecture
Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Macro-op Scheduling: Relaxing Scheduling Loop Constraints
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation
Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation
An efficient wakeup design for energy reduction in high-performance superscalar processors
Proceedings of the 2nd conference on Computing frontiers
A Criticality Analysis of Clustering in Superscalar Processors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
Out-of-order execution significantly increases the performanceof superscalar processors. The out-of-order execution mechanismis, however, energy-inefficient, which inhibits scaling superscalar processorsto high issue widths and large instruction windows. In this paper, we build on the observation that between 19% and 36% of the instructions are immediately ready for execution, even before entering the issue queue. Yet, these instructions proceed to the energy-consuming steps ofinstruction wake-up and select and they needlessly occupy space in theissue queue. To save energy, we propose for these instructions to by-pass the out-of-order execution core. Instead, we execute them on an energy-efficient single-issue in-order by-pass pipeline.The by-pass pipeline executes a significant fraction of all instructions,allowing performance-energy trade-offs with respect to the issue width of the out-of-order pipeline and to the issue queue size.By making these trade-offs, we show energy reductions of 53% for the issue queue, 33% for the register file and 31% in the write-back and wake-up logic. Performance remains almost unaffected.