Optimization power consumption model of reliability-aware GPU clusters

  • Authors:
  • Haifeng Wang;Qingkui Chen

  • Affiliations:
  • School of Management, University of Shanghai for Science and Technology, Shanghai, China and Information School, LinYi University, LinYi, China;School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, China

  • Venue:
  • The Journal of Supercomputing
  • Year:
  • 2014

Abstract

Power control on reliability-aware GPU clusters with dynamically variable voltage and speed is investigated as a combinatorial optimization problem in two forms: minimizing task execution time under an energy consumption constraint, and minimizing energy consumption under a system reliability constraint. Both problems apply to general multiprocessor computing and real-time multiprocessing systems, where energy consumption and system reliability are both important. These problems, which emphasize the trade-off among performance, power, and reliability, have not been well studied before. In this research, a novel power control model is built on Model Predictive Control theory. The Maximum Entropy Method is used to determine the partial ordering relation of the control variable and to assess the quality of solutions. Our controller caps redundant energy consumption by dynamically switching the energy states of the nodes in the GPU cluster. We compare our controller with a control scheme that does not consider system reliability. The experimental results demonstrate that the proposed controller is more reliable and valuable.
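
As a reading aid, the two problems named in the abstract can be sketched in generic form. The notation here is an assumption for illustration and is not taken from the paper: s denotes the vector of voltage/speed levels assigned to the nodes, S the finite set of admissible assignments, T(s), E(s), and R(s) the resulting task execution time, energy consumption, and system reliability, and E_max and R_min hypothetical bounds.

```latex
% Problem 1 (assumed notation): minimize execution time under an energy budget
\min_{s \in S} \; T(s) \quad \text{subject to} \quad E(s) \le E_{\max}

% Problem 2 (assumed notation): minimize energy under a reliability requirement
\min_{s \in S} \; E(s) \quad \text{subject to} \quad R(s) \ge R_{\min}
```

Because S is a finite set of discrete voltage/speed settings, both formulations are combinatorial, which matches the abstract's framing; the concrete forms of T, E, and R are left unspecified here.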