Purpose-built clusters permeate many of today's organizations, providing both large-scale data storage and computing. Within local clusters, competition for resources makes it difficult for applications with deadlines to finish on time. However, with the emergence of the cloud's pay-as-you-go model, users are increasingly storing portions of their data remotely and allocating compute nodes on demand to meet deadlines. This scenario gives rise to a hybrid cloud, where data stored across local and cloud resources may be processed in both environments. While a hybrid execution environment can help meet time constraints, users must now account for the costs of data storage, data transfer, and node allocation time on the cloud. In this paper, we describe a modeling-driven resource allocation framework that supports both time- and cost-sensitive execution of data-intensive applications in a hybrid cloud setting. We evaluate our framework using two data-intensive applications under a number of time and cost constraints. Our experimental results show that our system meets execution deadlines within a 3.6% margin of error. Similarly, it meets cost constraints within a 1.2% margin of error while minimizing the application's execution time.
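The abstract names three cloud cost components (data storage, data transfer, and node allocation time) and two goals: meet a deadline, or stay within a budget while minimizing execution time. The following is a minimal sketch of how a model-driven choice of cloud node count could work. It is not the paper's model. Every name and parameter here (`HybridSetup`, per-node processing rates, node hours billed rounded up, egress charged only for cloud-resident data processed locally, work split in proportion to throughput) is a hypothetical assumption chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class HybridSetup:
    # Hypothetical model parameters; none of these values come from the paper.
    local_gb: float        # input data held on the local cluster
    cloud_gb: float        # input data held in cloud storage
    local_nodes: int       # local nodes available to the job
    local_rate: float      # GB/hour one local node can process
    cloud_rate: float      # GB/hour one cloud node can process
    node_price: float      # $ per cloud node per started hour (assumed billing)
    storage_price: float   # $ per GB-hour of cloud storage
    transfer_price: float  # $ per GB moved out of the cloud


def est_time(s: HybridSetup, cloud_nodes: int) -> float:
    """Hours to finish, assuming work is shared in proportion to throughput
    and that data movement does not slow processing (a simplification)."""
    throughput = s.local_nodes * s.local_rate + cloud_nodes * s.cloud_rate
    return (s.local_gb + s.cloud_gb) / throughput


def est_cost(s: HybridSetup, cloud_nodes: int) -> float:
    """Dollar cost: cloud node time + cloud storage + data transfer."""
    hours = est_time(s, cloud_nodes)
    compute = math.ceil(hours) * cloud_nodes * s.node_price
    storage = s.cloud_gb * s.storage_price * hours
    # GB processed by cloud nodes; any cloud-resident data they do not cover
    # is assumed to be shipped to the local cluster and charged as egress.
    cloud_share = cloud_nodes * s.cloud_rate * hours
    egress_gb = max(0.0, s.cloud_gb - cloud_share)
    return compute + storage + egress_gb * s.transfer_price


def nodes_for_deadline(s: HybridSetup, deadline_h: float,
                       max_nodes: int) -> Optional[int]:
    """Fewest cloud nodes whose estimated time meets the deadline."""
    for n in range(max_nodes + 1):
        if est_time(s, n) <= deadline_h:
            return n
    return None  # deadline unreachable even with max_nodes


def nodes_for_budget(s: HybridSetup, budget: float,
                     max_nodes: int) -> Optional[int]:
    """Most cloud nodes within budget; since estimated time strictly
    decreases as nodes are added, this is also the fastest feasible choice."""
    feasible = [n for n in range(max_nodes + 1) if est_cost(s, n) <= budget]
    return max(feasible) if feasible else None


# Illustrative, made-up numbers.
setup = HybridSetup(local_gb=120, cloud_gb=120, local_nodes=8, local_rate=2.0,
                    cloud_rate=2.0, node_price=0.5, storage_price=0.0001,
                    transfer_price=0.1)
print(nodes_for_deadline(setup, deadline_h=6.0, max_nodes=32))  # 12
print(nodes_for_budget(setup, budget=20.0, max_nodes=32))       # 2
```

The budget search scans every candidate rather than stopping at the first infeasible count, because with rounded-up billing and shrinking egress the estimated cost need not be monotone in the number of nodes.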