Failure-aware resource provisioning for hybrid Cloud infrastructure

Authors:
Bahman Javadi; Jemal Abawajy; Rajkumar Buyya
Affiliations:
School of Computing, Engineering and Mathematics, University of Western Sydney, Australia
School of Information Technology, Deakin University, Geelong, Australia
Cloud Computing and Distributed Systems (CLOUDS) Laboratory, Department of Computing and Information Systems, University of Melbourne, Australia
Venue:
Journal of Parallel and Distributed Computing
Year:
2012

Abstract

Hybrid Cloud computing has been receiving increasing attention recently. To realize the full potential of the hybrid Cloud platform, an architectural framework for efficiently coupling public and private Clouds is necessary. Because resource failures are inevitable given the increasing functionality and complexity of hybrid Cloud computing, a failure-aware resource provisioning algorithm that can meet end users' quality of service (QoS) requirements is paramount. In this paper, we propose a scalable hybrid Cloud infrastructure as well as resource provisioning policies to assure users' QoS targets. The proposed policies take into account the workload model and failure correlations to redirect users' requests to the appropriate Cloud providers. Using real failure traces and a workload model, we evaluate the proposed resource provisioning policies in terms of performance, cost, and performance-cost efficiency. Simulation results reveal that under realistic working conditions, while using user estimates for the requests in the provisioning policies, we are able to improve users' QoS by about 32% in terms of deadline violation rate and 57% in terms of slowdown, at a limited cost on a public Cloud.
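The abstract reports QoS gains as relative reductions in deadline violation rate and slowdown. The Python sketch below illustrates how such metrics are commonly computed from per-request timestamps; it is not the authors' code. The `Request` record, the 10-second bound in `bounded_slowdown`, and all function names are hypothetical choices made here, and the paper's exact metric definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical per-request record; all times in seconds.
    submit: float    # time the request was submitted
    start: float     # time execution began
    finish: float    # time execution completed
    deadline: float  # absolute deadline for completion


def deadline_violation_rate(reqs):
    """Fraction of requests that finish after their deadline."""
    if not reqs:
        return 0.0
    late = sum(1 for r in reqs if r.finish > r.deadline)
    return late / len(reqs)


def bounded_slowdown(r, bound=10.0):
    """Response time divided by runtime. The runtime is floored at `bound`
    (an assumed threshold) so very short requests do not dominate, and the
    result is floored at 1."""
    runtime = r.finish - r.start
    response = r.finish - r.submit
    return max(response / max(runtime, bound), 1.0)


def mean_slowdown(reqs, bound=10.0):
    """Average bounded slowdown over all requests."""
    if not reqs:
        return 0.0
    return sum(bounded_slowdown(r, bound) for r in reqs) / len(reqs)


def relative_improvement(baseline, improved):
    """Percentage reduction of a lower-is-better metric versus a baseline."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    reqs = [Request(0, 5, 105, 120), Request(0, 50, 150, 120)]
    print(deadline_violation_rate(reqs))  # one of two requests is late
    print(mean_slowdown(reqs))
    print(relative_improvement(0.40, 0.30))
```

A claim such as "improve QoS by about 32% in terms of deadline violation rate" corresponds to `relative_improvement` applied to the violation rates of a baseline policy and a failure-aware policy measured on the same workload.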