Cache design trade-offs for power and performance optimization: a case study
ISLPED '95 Proceedings of the 1995 international symposium on Low power design
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
The filter cache: an energy efficient memory structure
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Power considerations in the design of the Alpha 21264 microprocessor
DAC '98 Proceedings of the 35th annual Design Automation Conference
Pipeline gating: speculation control for energy reduction
Proceedings of the 25th annual international symposium on Computer architecture
Using dynamic cache management techniques to reduce energy in a high-performance processor
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Selective cache ways: on-demand cache resource allocation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
ACM Transactions on Computer Systems (TOCS)
Power and energy reduction via pipeline balancing
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Reducing set-associative cache energy via way-prediction and selective direct-mapping
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Dynamically Exploiting Narrow Width Operands to Improve Processor Power and Performance
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Power optimization using dynamic power management
SBCCI'99 Proceedings of the XIIth conference on Integrated circuits and systems design
Proceedings of the 2003 international symposium on Low power electronics and design
Proceedings of the 2003 international symposium on Low power electronics and design
VSV: L2-Miss-Driven Variable Supply-Voltage Scaling for Low Power
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 2004 Asia and South Pacific Design Automation Conference
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Low power network processor design using clock gating
Proceedings of the 42nd annual Design Automation Conference
Cascaded carry-select adder (C2SA): a new structure for low-power CSA design
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Power and thermal effects of SRAM vs. Latch-Mux design styles and clock gating choices
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Combined circuit and architectural level variable supply-voltage scaling for low power
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Efficient early stage resonance estimation techniques for C4 package
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Automatic ADL-based operand isolation for embedded processors
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Conserving network processor power consumption by exploiting traffic variability
ACM Transactions on Architecture and Code Optimization (TACO)
Variable-latency adder (VL-adder): new arithmetic circuit design practice to overcome NBTI
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Finding the worst voltage violation in multi-domain clock gated power network
Proceedings of the conference on Design, automation and test in Europe
Predicting the worst-case voltage violation in a 3D power network
Proceedings of the 11th international workshop on System level interconnect prediction
Efficient power network analysis considering multidomain clock gating
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Logic synthesis for low power using clock gating and rewiring
Proceedings of the 20th symposium on Great lakes symposium on VLSI
Implementation of an UWB impulse-radio acquisition and despreading algorithm on a low power ASIP
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Enhanced power management scheme for low-power UWB communications
ISCIT'09 Proceedings of the 9th international conference on Communications and information technologies
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
On applying erroneous clock gating conditions to further cut down power
Proceedings of the 16th Asia and South Pacific Design Automation Conference
Exploiting dynamic micro-architecture usage in gate sizing
Microprocessors & Microsystems
Energy management for embedded multithreaded processors with integrated EDF scheduling
ARCS'05 Proceedings of the 18th international conference on Architecture of Computing Systems conference on Systems Aspects in Organic and Pervasive Computing
GPUWattch: enabling energy optimizations in GPGPUs
Proceedings of the 40th Annual International Symposium on Computer Architecture
Hi-index | 0.00 |
With the scaling of technology and the need for higher performance and more functionality, power dissipation is becoming a major bottleneck for microprocessor designs. Pipeline balancing (PLB), a previous technique, is essentially a methodology to clock-gate unused components whenever a program's instruction-level parallelism is predicted to be low. However, no non-predictive methodologies are available in the literature for efficient clock gating. This paper introduces deterministic clock gating (DCG) based on the key observation that for many of the stages in a modern pipeline, a circuit block's usage in a specific cycle in the near future is deterministically known a few cycles ahead of time. Our experiments show an average of 19.9% reduction in processor power with virtually no performance loss for an 8-issue, out-of-order superscalar processor by applying DCG to execution units, pipeline latches, D-Cache wordline decoders, and result bus drivers. In contrast, PLB achieves 9.9% average power savings at 2.9% performance loss.