An integrated GPU power and performance model

Authors:
Sunpyo Hong;Hyesoon Kim
Affiliations:
Georgia Institute of Technology, Atlanta, GA, USA;Georgia Institute of Technology, Atlanta, GA, USA
Venue:
Proceedings of the 37th annual international symposium on Computer architecture
Year:
2010


Abstract

GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Optimizing performance for multi-core processors has been a challenge for programmers, and optimizing for power consumption is even more difficult. Unfortunately, as a result of the high number of processors, the power consumption of many-core processors such as GPUs has increased significantly. Hence, in this paper, we propose an integrated power and performance (IPP) prediction model for a GPU architecture to predict the optimal number of active processors for a given application. The basic intuition is that when an application reaches the peak memory bandwidth, using more cores does not result in performance improvement. We develop an empirical power model for the GPU. Unlike most previous models, which require measured execution times, hardware performance counters, or architectural simulations, IPP predicts execution times to calculate dynamic power events. We then use the outcome of IPP to control the number of running cores. We also model the increases in power consumption that result from increases in temperature. With the predicted optimal number of active cores, we show that we can save up to 22.09% of runtime GPU energy consumption, and 10.99% on average, for the five memory bandwidth-limited benchmarks.
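The bandwidth-saturation intuition in the abstract can be illustrated with a toy calculation. The sketch below is not the IPP model from the paper: it assumes a kernel that is purely memory-bound, a fixed bandwidth demand per core, and a linear power model (an idle term plus a constant cost per active core). All function names, parameters, and numbers are hypothetical and chosen only for illustration.

```python
# Toy illustration only: NOT the paper's IPP model.
# Assumptions (all hypothetical): the kernel is purely memory-bound,
# each active core demands a fixed bandwidth, and power grows linearly
# with the number of active cores.

def predicted_time(n_cores, bytes_total, bw_per_core, bw_peak):
    """Execution time in seconds when bandwidth is the only bottleneck."""
    achieved_bw = min(n_cores * bw_per_core, bw_peak)  # saturates at peak
    return bytes_total / achieved_bw

def predicted_power(n_cores, idle_watts, watts_per_core):
    """Assumed linear power model: idle power plus a per-core cost."""
    return idle_watts + n_cores * watts_per_core

def best_core_count(max_cores, bytes_total, bw_per_core, bw_peak,
                    idle_watts, watts_per_core):
    """Return (core count, energy in joules) that minimizes predicted energy."""
    best_n, best_energy = 1, float("inf")
    for n in range(1, max_cores + 1):
        energy = (predicted_power(n, idle_watts, watts_per_core)
                  * predicted_time(n, bytes_total, bw_per_core, bw_peak))
        if energy < best_energy:
            best_n, best_energy = n, energy
    return best_n, best_energy

if __name__ == "__main__":
    # Hypothetical device: 30 cores, 5 GB/s demand per core, 100 GB/s peak.
    n, joules = best_core_count(
        max_cores=30, bytes_total=100e9, bw_per_core=5e9, bw_peak=100e9,
        idle_watts=50, watts_per_core=5)
    print(f"cores={n}, energy={joules:.1f} J")
```

Under these assumed numbers, bandwidth saturates at 20 cores, so activating more cores adds power without shortening the run, and the energy minimum sits at the saturation point. A real model would also need compute-bound phases and the temperature-dependent power term the abstract mentions.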