Front-end instruction delivery accounts for a significant fraction of the energy consumed in a dynamic superscalar processor. The issue queue in these processors serves two crucial roles: it bridges the front and back ends of the processor, and it serves as the window of instructions for the out-of-order engine. Two mismatches waste energy: one between the front-end producer rate and the back-end consumer rate, and one between the instruction window supplied by the front end and the window required to exploit the application's level of parallelism. Both result in additional front-end energy and increased issue queue utilization. While the former increases overall processor energy consumption, the latter aggravates the issue queue hot spot problem.
We propose a complementary combination of fetch gating and issue queue adaptation to address both of these issues. We introduce an issue-centric fetch gating scheme based on issue queue utilization and application parallelism characteristics. Our scheme attempts to provide an instruction window size that matches the current parallelism characteristics of the application while maintaining enough queue entries to avoid back-end starvation. Compared to a conventional fetch gating scheme based on flow-rate matching, we demonstrate 20% better overall energy-delay with a 44% additional reduction in issue queue energy. We identify instruction cache (Icache) energy savings as the largest contributor to the overall savings and quantify the sources of savings in this structure. We then couple this issue-driven fetch gating approach with an issue queue adaptation scheme based on queue utilization. While the fetch gating scheme provides a window of issue queue instructions appropriate to the level of program parallelism, the issue queue adaptation approach shuts down the remaining underutilized issue queue entries. Used in tandem, these complementary techniques yield 20% greater issue queue energy savings than the sum of the savings from each technique applied in isolation. The result of this combined approach is a 6% overall energy-delay savings coupled with a 54% reduction in issue queue energy.
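The abstract does not spell out the exact gating or adaptation policy, so the following is only a minimal sketch of how the two ideas might fit together. It assumes a hypothetical queue of 64 entries that powers down in banks of 8, with a floor of 8 entries to avoid back-end starvation. All names, sizes, and thresholds here are illustrative assumptions, not values taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class GatingConfig:
    # All values below are hypothetical, chosen only for illustration.
    queue_size: int = 64   # total issue queue entries
    bank_size: int = 8     # granularity at which entries can be shut down
    min_entries: int = 8   # floor kept available to avoid back-end starvation


def should_gate_fetch(occupancy: int, target_window: int, cfg: GatingConfig) -> bool:
    """Issue-centric gating: stop fetching once the queue already holds a
    window as large as the application's parallelism is assumed to need."""
    # Clamp the target to the physical queue size, but never below the floor.
    limit = max(cfg.min_entries, min(target_window, cfg.queue_size))
    return occupancy >= limit


def active_entries(peak_occupancy: int, cfg: GatingConfig) -> int:
    """Utilization-based adaptation: keep just enough banks powered to cover
    the observed peak occupancy, and shut down the rest."""
    needed = max(peak_occupancy, cfg.min_entries)
    banks = -(-needed // cfg.bank_size)  # ceiling division
    return min(banks * cfg.bank_size, cfg.queue_size)


cfg = GatingConfig()
print(should_gate_fetch(occupancy=20, target_window=16, cfg=cfg))
print(active_entries(peak_occupancy=17, cfg=cfg))
```

In this toy model the two functions are complementary in the sense the abstract describes: gating caps how many instructions the front end places in the queue, which lowers the peak occupancy that the adaptation step then uses to decide how many banks can be shut down.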