Algorithm level power efficiency optimization for CPU-GPU processing element in data intensive SIMD/SPMD computing

Authors:
Da Qi Ren
Affiliations:
-
Venue:
Journal of Parallel and Distributed Computing
Year:
2011

Abstract

Power efficiency must be investigated at every level of a High Performance Computing (HPC) system because of the increasing computation demands of scientific and engineering applications. Focusing on the critical software-level design constraints of programs that run on a parallel system composed of huge numbers of power-hungry components, we optimize HPC program design to achieve the best possible power performance on the target hardware platform. The power performance of a CUDA Processing Element (PE) is determined by both hardware and software factors. The hardware factors include the power features of each component (CPU, GPU, main memory, and PCI buses) and their interconnection architecture; the software factors include the algorithm design and the character of the executable instructions performed on the PE. In this paper, we introduce approaches to model and evaluate the power consumption of large-scale SIMD computation by CUDA PEs on multi-core and GPU platforms. The model yields design characteristic values at the early programming stage, giving programmers the environment information they need to choose the most power-efficient alternative. Based on the model, CPU dynamic frequency scaling (DFS) can be applied to the CUDA PE architecture, adjusting the CPU frequency to enhance the power efficiency of the entire PE without compromising its computing performance. The power model and the power efficiency improvements of the new designs have been validated by measuring the new programs on a real GPU multiprocessing system.
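The idea of lowering the CPU frequency without lengthening the run can be illustrated with a minimal sketch. It is not the paper's model: the linear CPU power formula, the assumption that CPU work fully overlaps the GPU kernel, the function names, and all numeric values are hypothetical and chosen only for illustration.

```python
# Toy illustration of CPU frequency selection for a CPU-GPU processing element.
# All models and numbers here are assumptions, not taken from the paper.

def cpu_power_watts(freq_ghz, static_w=20.0, dynamic_w_per_ghz=10.0):
    """Assumed CPU power model: static part plus a part linear in frequency."""
    return static_w + dynamic_w_per_ghz * freq_ghz

def runtime_s(freq_ghz, cpu_cycles, gpu_time_s):
    """Assume CPU work overlaps the GPU kernel, so the slower side sets the runtime."""
    cpu_time_s = cpu_cycles / (freq_ghz * 1e9)
    return max(cpu_time_s, gpu_time_s)

def pe_energy_joules(freq_ghz, cpu_cycles, gpu_time_s, other_power_w):
    """Energy of the whole PE: (CPU power + all other component power) * runtime."""
    total_power_w = cpu_power_watts(freq_ghz) + other_power_w
    return total_power_w * runtime_s(freq_ghz, cpu_cycles, gpu_time_s)

def pick_cpu_frequency(freqs_ghz, cpu_cycles, gpu_time_s):
    """Return the lowest frequency whose runtime equals the best achievable runtime."""
    best_runtime = min(runtime_s(f, cpu_cycles, gpu_time_s) for f in freqs_ghz)
    candidates = [f for f in freqs_ghz
                  if runtime_s(f, cpu_cycles, gpu_time_s) == best_runtime]
    return min(candidates)

if __name__ == "__main__":
    freqs = [1.0, 2.0, 3.0]   # hypothetical available CPU frequencies in GHz
    cycles = 2e9              # hypothetical CPU-side work in cycles
    gpu_time = 1.5            # hypothetical GPU kernel time in seconds
    other_w = 150.0           # hypothetical GPU + memory + PCI power in watts

    chosen = pick_cpu_frequency(freqs, cycles, gpu_time)
    print("chosen frequency (GHz):", chosen)
    print("energy at chosen frequency (J):",
          pe_energy_joules(chosen, cycles, gpu_time, other_w))
    print("energy at maximum frequency (J):",
          pe_energy_joules(max(freqs), cycles, gpu_time, other_w))
```

In this toy setting the GPU kernel bounds the runtime, so any CPU frequency that finishes the CPU-side work within the kernel time leaves performance unchanged, and the lowest such frequency gives the lowest PE energy. A real system would need measured power values for each component in place of the assumed constants.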