Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
A static power model for architects
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Design Challenges of Technology Scaling
IEEE Micro
Full chip leakage estimation considering power supply and temperature variations
Proceedings of the 2003 international symposium on Low power electronics and design
Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Temperature-aware microarchitecture: Modeling and implementation
ACM Transactions on Architecture and Code Optimization (TACO)
A flexible simulation framework for graphics architectures
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
Power prediction for intel XScale® processors using performance monitoring unit events
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Power-performance considerations of parallel computing on chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Merge: a programming model for heterogeneous multi-core systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Roofline: an insightful visual performance model for multicore architectures
Communications of the ACM - A Direct Path to Dependable Software
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
Proceedings of the 36th annual international symposium on Computer architecture
On the energy efficiency of graphics processing units for scientific computing
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
A characterization and analysis of PTX kernels
IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
Journal of Parallel and Distributed Computing
Kernel Fusion: An Effective Method for Better Power Efficiency on Multithreaded GPU
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
A framework for dynamically instrumenting GPU compute applications within GPU Ocelot
Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
Energy-efficient mechanisms for managing thread context in throughput processors
Proceedings of the 38th annual international symposium on Computer architecture
Bounding the effect of partition camping in GPU kernels
Proceedings of the 8th ACM International Conference on Computing Frontiers
Power gating strategies on GPUs
ACM Transactions on Architecture and Code Optimization (TACO)
CuMAPz: a tool to analyze memory access patterns in CUDA
Proceedings of the 48th Design Automation Conference
A compile-time managed multi-level register file hierarchy
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Efficient on-line module-level wake-up scheduling for high performance multi-module designs
Proceedings of the 2012 ACM international symposium on International Symposium on Physical Design
A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors
ACM Transactions on Computer Systems (TOCS)
BSArc: blacksmith streaming architecture for HPC accelerators
Proceedings of the 9th conference on Computing Frontiers
Proceedings of the 9th conference on Computing Frontiers
Parameterized micro-benchmarking: an auto-tuning approach for complex applications
Proceedings of the 9th conference on Computing Frontiers
Boosting mobile GPU performance with a decoupled access/execute fragment processor
Proceedings of the 39th Annual International Symposium on Computer Architecture
Lane decoupling for improving the timing-error resiliency of wide-SIMD architectures
Proceedings of the 39th Annual International Symposium on Computer Architecture
Power Modeling and Characterization of Computing Devices: A Survey
Foundations and Trends in Electronic Design Automation
Workload and power budget partitioning for single-chip heterogeneous processors
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Power-efficient computing for compute-intensive GPGPU applications
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Power and performance analysis of GPU-accelerated systems
HotPower'12 Proceedings of the 2012 USENIX conference on Power-Aware Computing and Systems
Computer Science - Research and Development
Energy consumption modeling for hybrid computing
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Power efficiency evaluation of block ciphers on GPU-integrated multicore processor
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
A survey and taxonomy of on-chip monitoring of multicore systems-on-chip
ACM Transactions on Design Automation of Electronic Systems (TODAES)
CAP: co-scheduling based on asymptotic profiling in CPU+GPU hybrid systems
Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores
Power and Performance Management of GPUs Based Cluster
International Journal of Cloud Applications and Computing
Inter-warp instruction temporal locality in deep-multithreaded GPUs
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Warped-DMR: Light-weight Error Detection for GPGPU
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Cooperative boosting: needy versus greedy power management
Proceedings of the 40th Annual International Symposium on Computer Architecture
GPUWattch: enabling energy optimizations in GPGPUs
Proceedings of the 40th Annual International Symposium on Computer Architecture
Temperature aware thread block scheduling in GPGPUs
Proceedings of the 50th Annual Design Automation Conference
Coordinated energy management in heterogeneous processors
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Memory performance estimation of CUDA programs
ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors
APOGEE: adaptive prefetching on GPUs for energy efficiency
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Exploring hybrid memory for GPU energy efficiency through software-hardware co-design
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Starchart: hardware and software optimization using recursive partitioning regression trees
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Evaluating integrated graphics processors for data center workloads
Proceedings of the Workshop on Power-Aware Computing and Systems
A measurement study of GPU DVFS on energy conservation
Proceedings of the Workshop on Power-Aware Computing and Systems
High-Resolution power profiling of GPU functions using low-resolution measurement
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Energy-aware code motion for GPU shader processors
ACM Transactions on Embedded Computing Systems (TECS)
Exploiting GPU peak-power and performance tradeoffs through reduced effective pipeline latency
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Optimization power consumption model of reliability-aware GPU clusters
The Journal of Supercomputing
Analytical modeling of energy efficiency in heterogeneous processors
Computers and Electrical Engineering
Measuring GPU Power with the K20 Built-in Sensor
Proceedings of Workshop on General Purpose Processing Using GPUs
Power Modeling for Heterogeneous Processors
Proceedings of Workshop on General Purpose Processing Using GPUs
CPU+GPU scheduling with asymptotic profiling
Parallel Computing
Hi-index | 0.00 |
GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Performance optimization for multi-core processors has been a challenge for programmers. Furthermore, optimizing for power consumption is even more difficult. Unfortunately, as a result of the high number of processors, the power consumption of many-core processors such as GPUs has increased significantly. Hence, in this paper, we propose an integrated power and performance (IPP) prediction model for a GPU architecture to predict the optimal number of active processors for a given application. The basic intuition is that when an application reaches the peak memory bandwidth, using more cores does not result in performance improvement. We develop an empirical power model for the GPU. Unlike most previous models, which require measured execution times, hardware performance counters, or architectural simulations, IPP predicts execution times to calculate dynamic power events. We then use the outcome of IPP to control the number of running cores. We also model the increases in power consumption that resulted from the increases in temperature. With the predicted optimal number of active cores, we show that we can save up to 22.09%of runtime GPU energy consumption and on average 10.99% of that for the five memory bandwidth-limited benchmarks.