Power gating strategies on GPUs

Authors:
Po-Han Wang;Chia-Lin Yang;Yen-Ming Chen;Yu-Jung Cheng
Affiliations:
National Taiwan University, Taipei, Taiwan (R.O.C.);National Taiwan University, Taipei, Taiwan (R.O.C.);National Taiwan University, Taipei, Taiwan (R.O.C.);National Taiwan University, Taipei, Taiwan (R.O.C.)
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2011

Abstract

As process technology continues to shrink, reducing leakage power becomes critical to achieving energy efficiency. Previous studies on low-power GPUs (Graphics Processing Units) focused on techniques for dynamic power reduction, such as DVFS (Dynamic Voltage and Frequency Scaling) and clock gating. In this paper, we explore the potential of adopting architecture-level power gating techniques for leakage reduction on GPUs. We propose three strategies for applying power gating to different modules in GPUs. The Predictive Shader Shutdown technique exploits workload variation across frames to eliminate leakage in shader clusters. The Deferred Geometry Pipeline seeks to minimize leakage in fixed-function geometry units by exploiting the imbalance between geometry and fragment computation across batches. Finally, a simple time-out power gating method is applied to nonshader execution units to exploit idle time at a finer granularity. Our results indicate that Predictive Shader Shutdown eliminates up to 60% of the leakage in shader clusters, Deferred Geometry Pipeline removes up to 57% of the leakage in the fixed-function geometry units, and the simple time-out power gating mechanism eliminates 83.3% of the leakage in nonshader execution units on average. All three schemes incur negligible performance degradation of less than 1%.
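The time-out power gating idea mentioned in the abstract can be illustrated with a minimal sketch. It is a generic model of the technique, not the paper's implementation: the function name `timeout_power_gating`, the per-cycle boolean trace, and the `timeout` threshold are all assumptions made for illustration, and the model ignores wake-up latency, break-even time, and the energy cost of switching the sleep transistor.

```python
def timeout_power_gating(busy_trace, timeout):
    """Return (gated_cycles, wakeups) for a per-cycle busy/idle trace.

    busy_trace: iterable of booleans, True when the unit has work that cycle.
    timeout:    hypothetical threshold; the unit is gated once it has been
                idle for this many consecutive cycles.
    """
    idle_run = 0       # consecutive idle cycles observed while powered on
    gated = False      # whether the unit is currently powered off
    gated_cycles = 0   # cycles spent powered off (leakage avoided)
    wakeups = 0        # number of times the unit had to be powered back on

    for busy in busy_trace:
        if busy:
            if gated:
                wakeups += 1   # work arrived while gated: power back on
                gated = False
            idle_run = 0
        elif gated:
            gated_cycles += 1
        else:
            idle_run += 1
            if idle_run >= timeout:
                gated = True   # gating takes effect from the next cycle

    return gated_cycles, wakeups


if __name__ == "__main__":
    trace = [True, False, False, False, False, False, True, False, False, True]
    print(timeout_power_gating(trace, timeout=3))
```

With this trace and a threshold of 3, the first idle period of five cycles is long enough to trigger gating (two gated cycles, one wake-up), while the second idle period of two cycles is not. A smaller threshold gates more often but pays for more wake-ups, which is the trade-off any real time-out policy has to tune.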