On the design of decentralized control architectures for workload consolidation in large-scale server clusters

Authors:
Rui Wang; Nagarajan Kandasamy
Affiliations:
Drexel University, Philadelphia, PA, USA; Drexel University, Philadelphia, PA, USA
Venue:
Proceedings of the 9th international conference on Autonomic computing
Year:
2012

Abstract

This paper develops a fully decentralized control architecture for the workload consolidation problem in large-scale server clusters: the cluster's processing capacity is dynamically tuned to satisfy the service level agreements (SLAs) associated with the incoming workload while consolidating that workload onto the fewest servers. In a decentralized setting, the problem is decomposed into simpler subproblems, each of which is mapped to a server and solved by a controller assigned to that server. Although the control loops on different servers run independently of each other, they are implicitly coupled through the shared high-level performance goal, and interactions between controllers may result in undesired system behavior such as SLA violations and frequent switching of cores on and off. Using the proposed architecture as the reference, we analyze how the organization of individual controllers within the control structure affects overall performance for large clusters of up to a thousand servers. Our studies indicate that the control structure, when organized as a causal system in which a precedence relation exists among the individual controllers, achieves a high degree of SLA satisfaction (98%) while significantly reducing the corresponding switching cost.
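
The idea of a precedence relation among per-server controllers can be illustrated with a toy sketch. It is not the paper's controller design: the class `ServerController`, the functions `consolidate` and `switching_cost`, the fixed per-server capacity, and the rule "turn on only if predecessors left load unserved" are all assumptions made for illustration, and the sketch switches whole servers rather than individual cores.

```python
from dataclasses import dataclass


@dataclass
class ServerController:
    """Hypothetical local controller for one server (illustrative only)."""
    capacity: float   # load the server can absorb while meeting its SLA (assumed)
    on: bool = False  # current power state

    def step(self, residual: float) -> float:
        """Decide locally from the load left over by predecessors.

        Returns the load passed on to the next controller in the ordering.
        """
        self.on = residual > 0.0
        served = min(residual, self.capacity) if self.on else 0.0
        return residual - served


def consolidate(controllers, workload):
    """Run the controllers once in precedence order.

    Returns (number of active servers, load that no server could absorb).
    """
    residual = workload
    for controller in controllers:
        residual = controller.step(residual)
    active = sum(c.on for c in controllers)
    return active, residual


def switching_cost(before, after):
    """Count the servers whose power state changed between two snapshots."""
    return sum(b != a for b, a in zip(before, after))


# Five identical servers; the workload drops from 250 to 120 units.
cluster = [ServerController(capacity=100.0) for _ in range(5)]
active_1, unmet_1 = consolidate(cluster, 250.0)
states_1 = [c.on for c in cluster]
active_2, unmet_2 = consolidate(cluster, 120.0)
states_2 = [c.on for c in cluster]
print(active_1, unmet_1, active_2, unmet_2, switching_cost(states_1, states_2))
```

Because each controller acts only on what its predecessors leave unserved, the load fills servers in a fixed order, and a change in workload toggles only the servers at the tail of that order. This is one simple way such an ordering could limit switching; the paper's actual control laws and cost model may differ.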