Energy cost evaluation of parallel algorithms for multiprocessor systems

Authors:
Zhuowei Wang; Xianbin Xu; Naixue Xiong; Laurence T. Yang; Wuqing Zhao
Affiliations:
School of Computer, Wuhan University, Wuhan, China 430000; School of Computer, Wuhan University, Wuhan, China 430000; Department of Computer Science, Georgia State University, Atlanta, USA; Department of Computer Science, St. Francis Xavier University, Antigonish, Canada; School of Computer, Wuhan University, Wuhan, China 430000
Venue:
Cluster Computing
Year:
2013

Abstract

With the continuous development of hardware and software, Graphics Processing Units (GPUs) have come into use for general-purpose computation. They have emerged as computational accelerators that, working alongside CPUs, dramatically reduce application execution time. To achieve high computing performance, a GPU typically includes hundreds of computing units. This high density of computing resources on a chip leads to high power consumption, which has therefore become one of the most important problems in GPU development. This paper analyzes the energy consumption of parallel algorithms executed on GPUs and provides a method for evaluating the energy scalability of parallel algorithms. The parallel prefix sum is then analyzed to illustrate the method for energy conservation, and energy scalability is experimentally evaluated using Sparse Matrix-Vector Multiply (SpMV). The results show that the optimal number of blocks, memory choice, and task scheduling are key to balancing the performance and energy consumption of GPUs.
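
For readers unfamiliar with the parallel prefix sum named in the abstract, the following is a minimal sketch of a work-efficient (Blelloch-style) exclusive scan. It is an illustration only and is not taken from the paper: the function name `exclusive_scan` is hypothetical, the variant analyzed by the authors may differ, and the data-parallel steps are simulated sequentially in Python rather than run on a GPU.

```python
def exclusive_scan(values):
    """Work-efficient (Blelloch-style) exclusive prefix sum.

    Illustrative sketch, not the paper's implementation. Each inner
    `for` loop touches disjoint elements, so on a GPU its iterations
    could run in parallel; here they run one after another.
    """
    n = len(values)
    if n == 0:
        return []

    # Pad to the next power of two so the reduction tree is balanced.
    size = 1
    while size < n:
        size *= 2
    data = list(values) + [0] * (size - n)

    # Up-sweep (reduce): build partial sums in place, level by level.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefixes back down the tree.
    data[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2

    # Drop the padding.
    return data[:n]


if __name__ == "__main__":
    print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
```

The two sweeps each take a logarithmic number of levels in the input length, and the number of independent updates per level is the kind of quantity that determines how much parallel hardware can be kept busy at each step.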