Energy efficient co-adaptive instruction fetch and issue

Authors:
Alper Buyuktosunoglu;Tejas Karkhanis;David H. Albonesi;Pradip Bose
Affiliations:
University of Rochester;University of Wisconsin-Madison;University of Rochester;IBM T. J. Watson Research Center
Venue:
Proceedings of the 30th annual international symposium on Computer architecture
Year:
2003

Abstract

Front-end instruction delivery accounts for a significant fraction of the energy consumed in a dynamic superscalar processor. The issue queue in these processors serves two crucial roles: it bridges the front and back ends of the processor, and it serves as the window of instructions for the out-of-order engine. Two mismatches, one between the front-end producer rate and the back-end consumer rate, and one between the instruction window supplied by the front end and the window required to exploit the application's level of parallelism, result in additional front-end energy and increased issue queue utilization. While the former increases overall processor energy consumption, the latter aggravates the issue queue hot spot problem.

We propose a complementary combination of fetch gating and issue queue adaptation to address both of these issues. We introduce an issue-centric fetch gating scheme based on issue queue utilization and application parallelism characteristics. Our scheme attempts to provide an instruction window size that matches the current parallelism characteristics of the application while maintaining enough queue entries to avoid back-end starvation. Compared to a conventional fetch gating scheme based on flow-rate matching, we demonstrate 20% better overall energy-delay with a 44% additional reduction in issue queue energy. We identify Icache energy savings as the largest contributor to the overall savings and quantify the sources of savings in this structure.

We then couple this issue-driven fetch gating approach with an issue queue adaptation scheme based on queue utilization. While the fetch gating scheme provides a window of issue queue instructions appropriate to the level of program parallelism, the issue queue adaptation approach shuts down the remaining underutilized issue queue entries. Used in tandem, these complementary techniques yield 20% greater issue queue energy savings than the sum of the savings from each technique applied in isolation. The result of this combined approach is a 6% overall energy-delay savings coupled with a 54% reduction in issue queue energy.
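
The interplay of the two mechanisms can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window target, the starvation floor, the chunk size, and the 90% grow threshold are all hypothetical values chosen for illustration, since the abstract does not specify the actual heuristics.

```python
# Illustrative sketch only: all names and thresholds are assumptions,
# not taken from the paper.

def should_gate_fetch(occupancy: int, window_target: int, starvation_floor: int) -> bool:
    """Issue-centric fetch gating decision for one cycle.

    occupancy        -- valid instructions currently in the issue queue
    window_target    -- window size assumed to match current parallelism
    starvation_floor -- minimum entries kept to avoid back-end starvation
    """
    if occupancy < starvation_floor:
        return False  # never gate while the back end risks starving
    return occupancy >= window_target  # enough work queued: stop fetching


def resize_queue(active: int, avg_occupancy: float, capacity: int, chunk: int = 8) -> int:
    """Utilization-based issue queue adaptation for one sampling interval.

    Powers down one chunk of entries when the average occupancy would fit
    in the smaller queue, and powers one chunk back up when the active
    portion is nearly full (the 0.9 factor is an assumed threshold).
    """
    if active > chunk and avg_occupancy <= active - chunk:
        return active - chunk
    if active < capacity and avg_occupancy >= 0.9 * active:
        return min(capacity, active + chunk)
    return active


if __name__ == "__main__":
    # Fetch is gated once the queue holds the targeted window.
    print(should_gate_fetch(occupancy=20, window_target=16, starvation_floor=4))
    # A lightly used 32-entry queue sheds one 8-entry chunk.
    print(resize_queue(active=32, avg_occupancy=10.0, capacity=32))
```

In this sketch the gating decision keeps the queue from filling beyond the useful window, which is what leaves entries idle for the resizing step to shut down.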