See spot run: using spot instances for mapreduce workflows

Authors:
Navraj Chohan;Claris Castillo;Mike Spreitzer;Malgorzata Steinder;Asser Tantawi;Chandra Krintz
Affiliations:
Computer Science Department, University of California, Santa Barbara, CA;IBM Watson Research, Hawthorne, New York;IBM Watson Research, Hawthorne, New York;IBM Watson Research, Hawthorne, New York;IBM Watson Research, Hawthorne, New York;Computer Science Department, University of California, Santa Barbara, CA
Venue:
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
Year:
2010

Abstract

MapReduce is a scalable and fault-tolerant framework, patented by Google, for computing embarrassingly parallel reductions. Hadoop is an open-source implementation of Google MapReduce that is made available as a web service to cloud users by the Amazon Web Services (AWS) cloud computing infrastructure. Amazon Spot Instances (SIs) provide an inexpensive yet transient and market-based option for purchasing virtualized instances for execution in AWS. In addition to being terminated manually by the user, an SI can be terminated automatically as a function of the market price and the user's maximum bid price. We find that we can significantly improve the runtime of MapReduce jobs in our benchmarks by using SIs as accelerators. However, we also find that SI termination due to budget constraints during the job can have adverse effects on the runtime and may cause the user to overpay for their job. We describe new techniques that help reduce such effects.
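
The termination rule mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the function name `simulate_spot_instance`, the hourly billing rule, and the price trace are all assumptions made for illustration. The sketch assumes an instance runs for a full hour whenever the market price at the start of that hour is at or below the bid, pays that hour's market price, and is terminated at the first hour whose price exceeds the bid.

```python
from typing import Optional, Sequence, Tuple


def simulate_spot_instance(
    hourly_prices: Sequence[float], max_bid: float
) -> Tuple[Optional[int], float]:
    """Return (termination_hour, cost) for one hypothetical spot instance.

    Simplifying assumptions (not taken from the paper): prices change only at
    hour boundaries, each completed hour is charged at that hour's market
    price, and the hour in which the price exceeds the bid is not charged.
    """
    cost = 0.0
    for hour, price in enumerate(hourly_prices):
        if price > max_bid:
            # Terminated by the market, not by the user.
            return hour, cost
        # The user pays the market price, not the bid.
        cost += price
    # The bid was never exceeded, so the instance survived the whole trace.
    return None, cost


# Made-up hourly market prices in USD, for illustration only.
prices = [0.04, 0.05, 0.04, 0.09, 0.04]

terminated_at, cost = simulate_spot_instance(prices, max_bid=0.06)
survived_at, full_cost = simulate_spot_instance(prices, max_bid=0.10)
print(terminated_at, round(cost, 2))
print(survived_at, round(full_cost, 2))
```

With the lower bid the instance is lost at the fourth hour of the trace, which is the kind of mid-job termination the abstract warns about. With the higher bid it survives the whole trace but pays for the price spike.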