State-of-the-art general-purpose graphics processing units (GPGPUs) implemented in nanoscale CMOS technologies offer very high computational throughput for highly parallel applications by using hundreds of integrated on-chip resources. These resources are stressed during application execution, which subjects them to degradation mechanisms such as negative bias temperature instability (NBTI) that adversely affect their reliability. To support highly parallel execution, GPGPUs contain large register files (RFs) that are among the most highly stressed GPGPU components; however, we observe that RFs are heavily underutilized (only 46% utilization on average) for typical general-purpose kernels. We present ARGO, an Aging-awaRe GPGPU RF allOcator that opportunistically exploits this underutilization by distributing stress throughout the RF. ARGO levels stress across RF banks by deliberately power-gating the most stressed banks. We demonstrate our technique on the AMD Evergreen GPGPU architecture and show that ARGO reduces NBTI-induced threshold voltage degradation by up to 43% (27% on average), which improves the RF static noise margin by up to 46% (30% on average). Furthermore, we estimate a simultaneous 54% reduction in leakage power from providing sleep states for unused banks.
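The abstract does not spell out ARGO's allocation policy, so the following Python is only a minimal, hypothetical sketch of the general idea of stress leveling through power gating. It assumes an RF of 8 equal banks allocated at bank granularity per kernel launch, that a powered bank is under NBTI stress even when it holds no live registers, that a power-gated bank is unstressed, and a simple power-law NBTI model ΔVth = A·(α·t)^n with n = 1/6 (a commonly used long-term exponent). All names (`NUM_BANKS`, `A`, `N`, `naive_allocate`, `aging_aware_allocate`, `simulate`, `delta_vth`) and the workload are illustrative and do not come from the paper.

```python
from typing import Callable, List, Tuple

NUM_BANKS = 8   # hypothetical number of RF banks
A = 1.0         # hypothetical technology constant (arbitrary units)
N = 1 / 6       # power-law time exponent commonly used for long-term NBTI

Allocator = Callable[[int, List[int]], List[int]]


def naive_allocate(needed: int, stress: List[int]) -> List[int]:
    """Baseline: always fill the lowest-numbered banks first (ignores stress)."""
    return list(range(needed))


def aging_aware_allocate(needed: int, stress: List[int]) -> List[int]:
    """Give the kernel the `needed` banks with the least accumulated stress."""
    order = sorted(range(len(stress)), key=lambda b: (stress[b], b))
    return sorted(order[:needed])


def simulate(allocator: Allocator, demands: List[int],
             gate_unused: bool) -> Tuple[List[int], int]:
    """Return (stressed intervals per bank, total powered bank-intervals)."""
    stress = [0] * NUM_BANKS
    powered = 0
    for needed in demands:
        active = set(allocator(needed, stress))
        for b in range(NUM_BANKS):
            # Assumption: a powered bank is stressed even if idle;
            # a power-gated bank is not stressed during this interval.
            if b in active or not gate_unused:
                stress[b] += 1
                powered += 1
    return stress, powered


def delta_vth(stress_time: int) -> float:
    """Toy NBTI shift dVth = A * (alpha * t) ** N; alpha * t is the stressed time."""
    return A * stress_time ** N


if __name__ == "__main__":
    demands = [4, 3, 5, 4, 2, 6, 4, 4]  # banks needed per kernel (made-up workload)
    for name, alloc, gate in [
        ("no gating", naive_allocate, False),
        ("naive + gating", naive_allocate, True),
        ("aging-aware + gating", aging_aware_allocate, True),
    ]:
        stress, powered = simulate(alloc, demands, gate)
        worst = max(delta_vth(s) for s in stress)
        print(f"{name:22s} stress={stress} worst dVth={worst:.3f} powered={powered}")
```

With this made-up workload, both gated variants keep banks powered for 32 of 64 bank-intervals, but the naive allocator still stresses bank 0 in every interval (worst toy ΔVth 1.414), whereas the stress-aware choice gives every bank exactly 4 stressed intervals (worst toy ΔVth 1.260). These numbers only illustrate the mechanism; they are not the paper's results.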