Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Deterministic Clock Gating for Microprocessor Power Reduction
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Stretching the Limits of Clock-Gating Efficiency in Server-Class Processors
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
A Dynamic Compilation Framework for Controlling Microprocessor Energy and Performance
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Power Consumption of GPUs from a Software Perspective
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Rodinia: A benchmark suite for heterogeneous computing
IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
Moving the needle, computer architecture research in academe and industry
Proceedings of the 37th annual international symposium on Computer architecture
An integrated GPU power and performance model
Proceedings of the 37th annual international symposium on Computer architecture
Statistical power modeling of GPU kernels using performance counters
GREENCOMP '10 Proceedings of the International Conference on Green Computing
Understanding the Energy Consumption of Dynamic Random Access Memories
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Thread block compaction for efficient SIMT control flow
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Performance and Power Analysis of ATI GPU: A Statistical Approach
NAS '11 Proceedings of the 2011 IEEE Sixth International Conference on Networking, Architecture, and Storage
CudaDMA: optimizing GPU memory bandwidth via warp specialization
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling
PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Improving GPU performance via large warps and two-level warp scheduling
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Multi2Sim: a simulation framework for CPU-GPU computing
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
SAAHPC '12 Proceedings of the 2012 Symposium on Application Accelerators in High Performance Computing
Convolution engine: balancing efficiency & flexibility in specialized computing
Proceedings of the 40th Annual International Symposium on Computer Architecture
A measurement study of GPU DVFS on energy conservation
Proceedings of the Workshop on Power-Aware Computing and Systems
Exploiting GPU peak-power and performance tradeoffs through reduced effective pipeline latency
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
A locality-aware memory hierarchy for energy-efficient GPU architectures
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Divergence-aware warp scheduling
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Warped gates: gating aware scheduling and power gating for GPGPUs
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Energy efficient GPU transactional memory via space-time optimizations
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of International Workshop on Adaptive Self-tuning Computing Systems
Measuring GPU Power with the K20 Built-in Sensor
Proceedings of Workshop on General Purpose Processing Using GPUs
Power Modeling for Heterogeneous Processors
Proceedings of Workshop on General Purpose Processing Using GPUs
Hi-index | 0.00 |
General-purpose GPUs (GPGPUs) are becoming prevalent in mainstream computing, and performance per watt has emerged as a more crucial evaluation metric than peak performance. As such, GPU architects require robust tools that will enable them to quickly explore new ways to optimize GPGPUs for energy efficiency. We propose a new GPGPU power model that is configurable, capable of cycle-level calculations, and carefully validated against real hardware measurements. To achieve configurability, we use a bottom-up methodology and abstract parameters from the microarchitectural components as the model's inputs. We developed a rigorous suite of 80 microbenchmarks that we use to bound any modeling uncertainties and inaccuracies. The power model is comprehensively validated against measurements of two commercially available GPUs, and the measured error is within 9.9% and 13.4% for the two target GPUs (GTX 480 and Quadro FX5600). The model also accurately tracks the power consumption trend over time. We integrated the power model with the cycle-level simulator GPGPU-Sim and demonstrate the energy savings by utilizing dynamic voltage and frequency scaling (DVFS) and clock gating. Traditional DVFS reduces GPU energy consumption by 14.4% by leveraging within-kernel runtime variations. More finer-grained SM cluster-level DVFS improves the energy savings from 6.6% to 13.6% for those benchmarks that show clustered execution behavior. We also show that clock gating inactive lanes during divergence reduces dynamic power by 11.2%.