With the advent of cloud computing and the need to satisfy growing customer resource demands, cloud providers now operate an increasing number of large data centers. To ease the creation of private clouds, several open-source Infrastructure-as-a-Service (IaaS) cloud management frameworks (e.g. OpenNebula, Nimbus, Eucalyptus, OpenStack) have been proposed. However, all of these systems are either highly centralized or offer limited fault-tolerance support. Consequently, they share common drawbacks: limited scalability due to a single master node, and a single point of failure (SPOF). In this paper, we present the design, implementation and evaluation of Snooze, a novel scalable and autonomic (i.e. self-organizing and self-healing) virtual machine (VM) management framework. For scalability, the system relies on a self-organizing hierarchical architecture and performs distributed VM management. Moreover, fault tolerance is provided at all levels of the hierarchy, allowing the system to self-heal in case of failures. Our evaluation, conducted on 144 physical machines of the Grid'5000 experimental testbed, shows that the fault-tolerance features of the framework do not impact application performance. In addition, distributed VM management incurs negligible cost, and the system remains highly scalable as the amount of resources grows.
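The abstract only names the ingredients of the design: a self-organizing hierarchy, distributed VM management, and fault tolerance at every level. The Python sketch below is a toy model of how such a hierarchy can heal itself. It is not Snooze's implementation. The role names (group leader, group managers, local controllers), the smallest-name election rule and the least-loaded placement rule are all assumptions made for illustration. When a group manager fails, its local controllers rejoin the remaining managers. When the leader fails, a manager is promoted and its own controllers are reassigned.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GroupManager:
    name: str
    alive: bool = True
    local_controllers: set[str] = field(default_factory=set)


class Hierarchy:
    """Toy self-healing hierarchy: one group leader (GL) chosen among the
    group managers (GMs); every other live GM manages local controllers (LCs).
    Role names and rules are illustrative assumptions, not Snooze's protocol."""

    def __init__(self, gm_names: list[str]) -> None:
        self.gms = {n: GroupManager(n) for n in gm_names}
        self.leader: str | None = None
        self._elect_leader()

    def _elect_leader(self) -> list[str]:
        # Assumed election rule: the live GM with the smallest name becomes GL.
        live = sorted(n for n, gm in self.gms.items() if gm.alive)
        self.leader = live[0] if live else None
        if self.leader is None:
            return []
        # The promoted GM hands back its LCs so they can rejoin another GM.
        promoted = self.gms[self.leader]
        orphans = sorted(promoted.local_controllers)
        promoted.local_controllers.clear()
        return orphans

    def join(self, lc: str) -> str:
        # Assumed placement rule: assign the LC to the least-loaded live GM
        # (excluding the GL), breaking ties by name.
        workers = [gm for n, gm in self.gms.items() if gm.alive and n != self.leader]
        if not workers:
            raise RuntimeError("no group manager available")
        gm = min(workers, key=lambda g: (len(g.local_controllers), g.name))
        gm.local_controllers.add(lc)
        return gm.name

    def fail(self, name: str) -> None:
        # Simulate a detected crash of one GM (or of the GL) and self-heal.
        gm = self.gms[name]
        gm.alive = False
        orphans = sorted(gm.local_controllers)
        gm.local_controllers.clear()
        if name == self.leader:
            orphans += self._elect_leader()
        for lc in orphans:
            self.join(lc)
```

For example, with managers gm1 to gm4 and six controllers, gm1 becomes the leader and the controllers are spread over gm2, gm3 and gm4. Failing gm3 moves its two controllers to gm2 and gm4. Failing the leader gm1 then promotes gm2, whose controllers rejoin gm4. In a real deployment, failure detection would typically rely on heartbeats, and leader election on a coordination service such as ZooKeeper, rather than on a fixed naming rule. The deterministic rules here only keep the example reproducible.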