See Spot Run: Using Spot Instances for MapReduce Workflows

  • Authors:
  • Navraj Chohan;Claris Castillo;Mike Spreitzer;Malgorzata Steinder;Asser Tantawi;Chandra Krintz

  • Affiliations:
  • Computer Science Department, University of California, Santa Barbara, CA;IBM Watson Research, Hawthorne, New York;IBM Watson Research, Hawthorne, New York;IBM Watson Research, Hawthorne, New York;IBM Watson Research, Hawthorne, New York;Computer Science Department, University of California, Santa Barbara, CA

  • Venue:
  • HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
  • Year:
  • 2010

Abstract

MapReduce is a scalable, fault-tolerant framework, patented by Google, for computing embarrassingly parallel reductions. Hadoop is an open-source implementation of Google's MapReduce that the Amazon Web Services (AWS) cloud computing infrastructure makes available to cloud users as a web service. Amazon Spot Instances (SIs) provide an inexpensive but transient, market-based option for purchasing virtualized instances for execution in AWS. In addition to being terminated manually by the user, an SI can be terminated automatically as a function of the market price and the user's maximum bid price. We find that using SIs as accelerators can significantly improve the runtime of MapReduce jobs in our benchmarks. However, we also find that SI termination due to budget constraints during a job can adversely affect its runtime and may cause the user to overpay for the job. We describe new techniques that help reduce these effects.
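The market-driven termination rule described above can be illustrated with a small simulation. This is a minimal sketch under a simplified model, not the authors' implementation. It assumes an hourly price trace, that the provider reclaims the instance at the first hour whose market price exceeds the user's maximum bid, and that each completed hour is billed at the market price rather than at the bid. The function name `run_spot_instance` and the price values are hypothetical.

```python
from typing import Sequence, Tuple


def run_spot_instance(prices: Sequence[float], max_bid: float) -> Tuple[int, float]:
    """Simulate one spot instance over an hourly market-price trace.

    Simplified model (an assumption, not the paper's exact billing rules):
    the instance runs every hour whose market price is at or below max_bid,
    is billed at that hour's market price, and is terminated at the first
    hour whose market price exceeds max_bid.

    Returns (hours_run, total_cost).
    """
    hours_run = 0
    total_cost = 0.0
    for price in prices:
        if price > max_bid:
            # Market price rose above the bid: the instance is reclaimed
            # automatically, regardless of whether the job has finished.
            break
        hours_run += 1
        total_cost += price  # assumption: pay the market price, not the bid
    return hours_run, total_cost


# Hypothetical hourly prices in USD; these are not data from the paper.
trace = [0.030, 0.031, 0.029, 0.045, 0.030]
print(run_spot_instance(trace, max_bid=0.035))
```

With a bid of 0.035 the instance is lost at the fourth hour even though the price falls again afterwards; raising the bid to 0.05 would keep it alive for the whole trace. This is the trade-off the abstract points to: a low bid keeps the hourly cost down but risks losing accelerator nodes partway through a MapReduce job.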