Joint optimization of idle and cooling power in data centers while maintaining response time

  • Authors:
  • Faraz Ahmad;T. N. Vijaykumar

  • Affiliations:
  • Purdue University, West Lafayette, IN, USA;Purdue University, West Lafayette, IN, USA

  • Venue:
  • Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Server power and cooling power amount to a significant fraction of modern data centers' recurring costs. While data centers provision enough servers to guarantee response times under the maximum loading, data centers operate under much less loading most of the times (e.g., 30-70% of the maximum loading). Previous server-power proposals exploit this under-utilization to reduce the server idle power by keeping active only as many servers as necessary and putting the rest into low-power standby modes. However, these proposals incur higher cooling power due to hot spots created by concentrating the data center loading on fewer active servers, or degrade response times due to standby-to-active transition delays, or both. Other proposals optimize the cooling power but incur considerable idle power. To address the first issue of power, we propose PowerTrade, which trades-off idle power and cooling power for each other, thereby reducing the total power. To address the second issue of response time, we propose SurgeGuard to overprovision the number of active servers beyond that needed by the current loading so as to absorb future increases in the loading. SurgeGuard is a two-tier scheme which uses well-known over-provisioning at coarse time granularities (e.g., one hour) to absorb the common, smooth increases in the loading, and a novel fine-grain replenishment of the over-provisioned reserves at fine time granularities (e.g., five minutes) to handle the uncommon, abrupt loading surges. Using real-world traces, we show that combining PowerTrade and SurgeGuard reduces total power by 30% compared to previous low-power schemes while maintaining response times within 1.7%.