Statistical profiling-based techniques for effective power provisioning in data centers

  • Authors:
  • Sriram Govindan;Jeonghwan Choi;Bhuvan Urgaonkar;Anand Sivasubramaniam;Andrea Baldini

  • Affiliations:
  • The Pennsylvania State University, State College, PA, USA;The Pennsylvania State University, State College, PA, USA;The Pennsylvania State University, State College, PA, USA;The Pennsylvania State University, Tata Consultancy Services, State College, PA, USA;Cisco Systems, Inc., San Francisco, USA

  • Venue:
  • Proceedings of the 4th ACM European conference on Computer systems
  • Year:
  • 2009

Abstract

Current capacity planning practices based on heavy over-provisioning of power infrastructure hurt (i) the operational costs of data centers as well as (ii) the computational work they can support. We explore a combination of statistical multiplexing techniques to improve the utilization of the power hierarchy within a data center. At the highest level of the power hierarchy, we employ controlled under-provisioning and overbooking of the power needs of hosted workloads. At the lower levels, we introduce the novel notion of soft fuses to flexibly distribute provisioned power among hosted workloads based on their needs. Our techniques are built upon a measurement-driven profiling and prediction framework that characterizes key statistical properties of the power needs of hosted workloads and their aggregates. We characterize the gains in terms of the amount of computational work (CPU cycles) per provisioned unit of power, a metric we call Computation per Provisioned Watt (CPW). Our technique is able to double the CPW offered by a Power Distribution Unit (PDU) running the e-commerce benchmark TPC-W compared to conventional provisioning practices. Overbooking the PDU by 10% based on tails of power profiles yields a further improvement of 20%. Reactive techniques implemented on our Xen VMM-based servers dynamically modulate CPU DVFS states to keep power draw below the limits imposed by soft fuses. Finally, information captured in our profiles also provides ways of controlling application performance degradation despite overbooking. The 95th percentile of TPC-W session response time grew only from 1.59 sec to 1.78 sec, a degradation of 12%.
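
The idea of provisioning from the tail of a power profile, rather than from its peak, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`percentile`, `servers_supported`), the nearest-rank percentile method, the sample power profile, and the 2000 W PDU capacity are all assumptions made up for illustration, and the numbers it produces are not the paper's results. Assuming identical servers that each deliver the same number of CPU cycles, CPW for a fixed PDU is proportional to the number of servers hosted, so the server count serves as a stand-in for CPW.

```python
import math

def percentile(samples, p):
    """Nearest-rank p-th percentile (0 < p <= 100) of power samples in watts."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def servers_supported(pdu_capacity_w, per_server_need_w, overbook=0.0):
    """Identical servers a PDU can host when each server is budgeted
    per_server_need_w watts and the PDU is overbooked by `overbook`
    (0.10 means the sum of budgets may exceed capacity by 10%)."""
    return int(pdu_capacity_w * (1 + overbook) // per_server_need_w)

# Hypothetical profile: 100 power samples (watts) from one server.
# Most samples sit at 180-200 W; a few short spikes reach 300 W.
profile = [180] * 50 + [200] * 40 + [300] * 10

PDU_CAPACITY_W = 2000  # hypothetical PDU rating

peak_need = percentile(profile, 100)  # conventional: provision for the peak
tail_need = percentile(profile, 90)   # profile-based: provision for the 90th percentile

n_peak = servers_supported(PDU_CAPACITY_W, peak_need)
n_tail = servers_supported(PDU_CAPACITY_W, tail_need)
n_overbooked = servers_supported(PDU_CAPACITY_W, tail_need, overbook=0.10)

print("peak-based provisioning:       ", n_peak, "servers")
print("90th-percentile provisioning:  ", n_tail, "servers")
print("with 10% overbooking of the PDU:", n_overbooked, "servers")
```

Budgeting each server at a percentile below its peak means its draw will occasionally exceed the budget. That is the situation a reactive mechanism, such as the DVFS throttling behind soft fuses described in the abstract, has to handle.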