Management of computing infrastructure in data centers is an important and challenging problem that must: (i) ensure availability of services in conformance with the Service Level Agreements (SLAs); and (ii) reduce the Power Usage Effectiveness (PUE), i.e., the ratio of the total power, up to half of which is attributed to data center cooling, to the computing power used to service the workloads. The cooling energy consumption can be reduced by allowing higher-than-usual thermostat set temperatures while keeping the ambient temperature in the data center room within the manufacturer-specified server redline temperatures required for reliable operation. This paper proposes: (i) a Coordinated Job, Power, and Cooling Management (JPCM) policy, which performs: (a) job management, so as to allow for an increase in the thermostat setting of the cooling unit while meeting the SLA requirements, (b) power management, to reduce the produced thermal load, and (c) cooling management, to dynamically adjust the thermostat setting; and (ii) a Model-driven coordinated Management Architecture (MMA), which uses a state-based model to dynamically decide the correct management policy for handling events, such as a new workload arrival or the failure of a cooling unit, that can trigger an increase in the ambient temperature. Each event is associated with a time window, referred to as the window-of-opportunity, after which the temperature at the inlet of one or more servers can exceed the redline temperature if proper management policies are not enforced. This window-of-opportunity decreases monotonically as the incoming workload increases. The selection of a management policy depends on its potential energy benefit and on whether its actuation delay fits within the window-of-opportunity. Simulations based on actual job traces from the ASU HPC data center show that the JPCM can achieve up to 18% energy savings over separate power or job management policies.
However, the high delay in reaching a stable ambient temperature (in the case of cooling management through dynamic thermostat setting) can lead to violation of the server redline temperatures. A management decision chart is developed as part of MMA to autonomically employ the management policy that yields the maximum energy savings without violating the window-of-opportunity, and hence the redline temperatures. Further, a prototype of the JPCM is developed by configuring the widely used Moab cluster manager to dynamically change the server priorities for job assignment.
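The selection rule described above (pick the policy with the largest energy benefit among those whose actuation delay fits within the window-of-opportunity) can be illustrated with a minimal Python sketch. It is not the paper's decision chart: the `Policy` type, the `select_policy` and `pue` functions, and all numeric values in `CANDIDATES` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Policy:
    """A candidate management policy (hypothetical structure)."""
    name: str
    energy_saving: float      # estimated fractional energy saving (assumed input)
    actuation_delay_s: float  # time for the policy to take effect, in seconds


def pue(total_power_w: float, computing_power_w: float) -> float:
    """PUE as defined in the abstract: total power over computing power."""
    if computing_power_w <= 0:
        raise ValueError("computing power must be positive")
    return total_power_w / computing_power_w


def select_policy(policies: Sequence[Policy], window_s: float) -> Optional[Policy]:
    """Return the feasible policy with the largest estimated saving.

    A policy is feasible when its actuation delay does not exceed the
    window-of-opportunity. Returns None when no policy is feasible.
    """
    feasible = [p for p in policies if p.actuation_delay_s <= window_s]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p.energy_saving)


# Illustrative values only; they are not taken from the paper.
CANDIDATES = [
    Policy("job", energy_saving=0.05, actuation_delay_s=30.0),
    Policy("power", energy_saving=0.03, actuation_delay_s=10.0),
    Policy("cooling", energy_saving=0.12, actuation_delay_s=240.0),
]

if __name__ == "__main__":
    for window in (300.0, 60.0, 5.0):
        chosen = select_policy(CANDIDATES, window)
        print(window, chosen.name if chosen else "no feasible policy")
```

With these assumed numbers, a long window admits the slow cooling adjustment, whereas a shorter window (for example, under a heavier incoming workload) restricts the choice to the faster job or power management actions.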