With the widespread adoption of GPGPUs in varied application domains, new opportunities open up to improve GPGPU energy efficiency. Due to inherent application-level inefficiencies, GPGPU execution units experience significant idle time. In this work we propose to power gate idle execution units to eliminate leakage power, which is becoming a significant concern with technology scaling. We show that GPGPU execution units are idle for short windows of time, and that conventional microprocessor power gating techniques cannot exploit these idle windows efficiently because of power gating overhead. Current warp schedulers greedily intersperse integer and floating point instructions, which limits power gating opportunities for any given execution unit type. To improve power gating opportunities in GPGPU execution units, we propose a Gating Aware Two-level warp scheduler (GATES) that issues clusters of instructions of the same type before switching to another instruction type. We also propose a new power gating scheme, called Blackout, that forces a power gated execution unit to sleep for at least the break-even time necessary to overcome the power gating overhead before returning to the active state. The combination of GATES and Blackout, which we call Warped Gates, can save 31.6% and 46.5% of integer and floating point unit static energy, respectively. The proposed solutions incur less than 1% performance and area overhead.
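To make the two ideas concrete, the following toy model sketches them in Python. It is an illustration under stated assumptions, not the implementation evaluated in this work: the names `BlackoutUnit`, `gates_pick`, `break_even`, and `idle_detect` are hypothetical, wake-up latency is ignored, and the scheduler is reduced to a single rule (stay on the current instruction type while any instruction of that type is ready).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BlackoutUnit:
    """Toy model of one execution unit under a Blackout-style policy.

    All parameters are hypothetical; wake-up latency is not modeled.
    """
    break_even: int        # assumed break-even time, in cycles
    idle_detect: int       # assumed idle cycles before the unit is gated
    idle_cycles: int = 0
    gated_cycles: int = 0
    gated: bool = False

    def can_issue(self) -> bool:
        # A gated unit may wake only after sleeping for the break-even time.
        return not self.gated or self.gated_cycles >= self.break_even

    def tick(self, wants_issue: bool) -> bool:
        """Advance one cycle; return True if an instruction issued."""
        if wants_issue and self.can_issue():
            self.gated = False
            self.idle_cycles = 0
            self.gated_cycles = 0
            return True
        if self.gated:
            self.gated_cycles += 1          # still asleep this cycle
        else:
            self.idle_cycles += 1
            if self.idle_cycles >= self.idle_detect:
                self.gated = True           # power gate the idle unit
                self.gated_cycles = 0
        return False


def gates_pick(ready_types: List[str], current_type: str) -> Optional[str]:
    """Simplified gating-aware choice of the next instruction type.

    Keep issuing the current type while one is ready; switch only when
    none is. Returns None if nothing is ready.
    """
    if current_type in ready_types:
        return current_type
    return ready_types[0] if ready_types else None


if __name__ == "__main__":
    unit = BlackoutUnit(break_even=3, idle_detect=2)
    requests = [False, False, True, True, True, True, True]
    print([unit.tick(r) for r in requests])
    print(gates_pick(["FP", "INT"], "INT"))
```

In this sketch the unit is gated after two idle cycles, then refuses requests for three further cycles even though work is waiting, which is the forced-sleep behavior that distinguishes Blackout from a policy that wakes on demand. The `gates_pick` rule shows why clustering helps: by staying on one instruction type, the other unit type sees longer uninterrupted idle windows.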