A Framework for Data-Intensive Computing with Cloud Bursting

Authors:
Tekin Bicer; David Chiu; Gagan Agrawal
Venue:
CLUSTER '11 Proceedings of the 2011 IEEE International Conference on Cluster Computing
Year:
2011

Abstract

For many organizations, one attractive use of cloud resources is cloud bursting, also referred to as the hybrid cloud. In this scenario, an organization acquires and manages in-house resources to meet its base needs, but uses additional resources from a cloud provider to maintain an acceptable response time during workload peaks. Cloud bursting has so far been discussed in the context of using additional computing resources from a cloud provider. However, as next-generation applications are expected to see orders-of-magnitude increases in data set sizes, cloud resources can also be used to store additional data after local resources are exhausted. In this paper, we consider the challenge of data analysis in a scenario where data is stored across a local cluster and cloud resources. We describe a software framework to enable data-intensive computing with cloud bursting, i.e., using a combination of compute resources from a local cluster and a cloud environment to perform Map-Reduce type processing on a data set that is geographically distributed. Our evaluation with three different applications shows that data-intensive computing with cloud bursting is feasible and scalable. In particular, compared to a situation where the data set is stored at one location and processed using resources at that location, the average slowdown of our system (using distributed compute resources, but the same aggregate number of them) is only 15.55%. Thus, the overheads due to global reduction, remote data retrieval, and potential load imbalance are quite manageable. Our system scales with an average speedup of 81% when the number of compute resources is doubled.
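
The processing pattern the abstract describes, where each site reduces the data it holds and a global reduction then combines the per-site results, can be illustrated with a small sketch. This is not the authors' framework: the function names (`local_reduce`, `global_reduce`), the word-count workload, and the in-memory "sites" are all assumptions chosen only to show the shape of the computation, and real deployments would add remote data retrieval and load balancing on top.

```python
from collections import Counter
from functools import reduce

# Hypothetical illustration only: two in-memory lists stand in for the
# data held at a local cluster and at a cloud provider.

def local_reduce(chunks):
    """Process every chunk held at one site and fold the results
    into a single per-site reduction object (here, word counts)."""
    acc = Counter()
    for chunk in chunks:
        acc.update(chunk.split())
    return acc

def global_reduce(partials):
    """Merge the per-site reduction objects into the final result."""
    return reduce(lambda a, b: a + b, partials, Counter())

# Assumed toy data set, split across the two sites.
local_cluster_chunks = ["a b a", "c a"]
cloud_chunks = ["b c", "a c c"]

# Each site reduces its own data; only the small reduction objects
# need to cross the wide-area link for the global reduction.
partials = [local_reduce(local_cluster_chunks), local_reduce(cloud_chunks)]
result = global_reduce(partials)

# Baseline for comparison: all data stored and processed at one location.
single_site = local_reduce(local_cluster_chunks + cloud_chunks)
```

In this sketch the distributed result equals the single-site result because word counting is associative and commutative; that property is what allows the per-site reductions to be merged in any order.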