Model-driven coordinated management of data centers

Authors:
Tridib Mukherjee;Ayan Banerjee;Georgios Varsamopoulos;Sandeep K. S. Gupta
Affiliations:
School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA;School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA;School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA;School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA
Venue:
Computer Networks: The International Journal of Computer and Telecommunications Networking
Year:
2010

Abstract

Management of computing infrastructure in data centers is an important and challenging problem that must: (i) ensure availability of services in conformance with the Service Level Agreements (SLAs); and (ii) reduce the Power Usage Effectiveness (PUE), i.e. the ratio of the total power, up to half of which is attributed to data center cooling, to the computing power needed to service the workloads. Cooling energy consumption can be reduced by allowing higher-than-usual thermostat set temperatures, provided the ambient temperature in the data center room stays within the manufacturer-specified server redline temperatures required for reliable operation. This paper proposes: (i) a Coordinated Job, Power, and Cooling Management (JPCM) policy, which performs: (a) job management so as to allow for an increase in the thermostat setting of the cooling unit while meeting the SLA requirements, (b) power management to reduce the produced thermal load, and (c) cooling management to dynamically adjust the thermostat setting; and (ii) a Model-driven coordinated Management Architecture (MMA), which uses a state-based model to dynamically decide the correct management policy for handling events, such as new workload arrival or failure of a cooling unit, that can trigger an increase in the ambient temperature. Each event is associated with a time window, referred to as the window-of-opportunity, after which the temperature at the inlet of one or more servers can exceed the redline temperature if proper management policies are not enforced. This window-of-opportunity decreases monotonically as the incoming workload increases. The selection of a management policy depends on its potential energy benefit and on whether its actuation delay fits within the window-of-opportunity. Simulations based on actual job traces from the ASU HPC data center show that JPCM can achieve up to 18% energy savings over power management or job management applied separately.
However, when cooling is managed through dynamic thermostat setting, a long delay in reaching a stable ambient temperature can cause the server redline temperatures to be violated. A management decision chart is developed as part of MMA to autonomically employ the management policy that yields the maximum energy savings without violating the window-of-opportunity, and hence the redline temperatures. Further, a prototype of JPCM is developed by configuring the widely used Moab cluster manager to dynamically change the server priorities for job assignment.
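
The selection rule described in the abstract (employ the policy with the maximum energy savings whose actuation delay still fits within the window-of-opportunity) can be illustrated with a minimal Python sketch. This is not the paper's decision chart: the `Policy` type, the `select_policy` function, the policy names, and every delay and savings figure below are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Policy:
    """Hypothetical description of one management policy."""
    name: str
    actuation_delay_s: float  # assumed time for the policy to take effect
    energy_savings: float     # assumed fractional energy savings


def select_policy(policies: Sequence[Policy],
                  window_s: float) -> Optional[Policy]:
    """Return the feasible policy with the largest energy savings.

    A policy is treated as feasible when its actuation delay does not
    exceed the window-of-opportunity. Returns None if nothing fits.
    """
    feasible = [p for p in policies if p.actuation_delay_s <= window_s]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p.energy_savings)


# Illustrative candidates only; these numbers are not from the paper.
candidates = [
    Policy("job management", actuation_delay_s=5.0, energy_savings=0.04),
    Policy("power management", actuation_delay_s=60.0, energy_savings=0.09),
    Policy("coordinated (JPCM-style)", actuation_delay_s=300.0,
           energy_savings=0.15),
]

if __name__ == "__main__":
    for window in (600.0, 100.0, 1.0):
        chosen = select_policy(candidates, window)
        print(window, chosen.name if chosen else "no feasible policy")
```

Under these assumed numbers, a shorter window (for example, one caused by a heavier incoming workload) rules out the slower policies, so the sketch falls back to a faster policy with smaller savings.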