MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Hive: a warehousing solution over a map-reduce framework
Proceedings of the VLDB Endowment
On availability of intermediate data in cloud computations
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
Improving MapReduce performance in heterogeneous environments
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Conductor: orchestrating the clouds
Proceedings of the 4th International Workshop on Large Scale Distributed Systems and Middleware
Dynamic resource allocation for spot markets in clouds
Hot-ICE'11 Proceedings of the 11th USENIX conference on Hot topics in management of internet, cloud, and enterprise networks and services
No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics
Proceedings of the 2nd ACM Symposium on Cloud Computing
Elastic phoenix: malleable mapreduce for shared-memory systems
NPC'11 Proceedings of the 8th IFIP international conference on Network and parallel computing
Auto-scaling to minimize cost and meet application deadlines in cloud workflows
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
SpotMPI: a framework for auction-based HPC computing using amazon spot instances
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part II
The Aneka platform and QoS-driven resource provisioning for elastic applications on hybrid Clouds
Future Generation Computer Systems
Heterogeneity-aware resource allocation and scheduling in the cloud
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
Cutting MapReduce cost with spot market
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
Orchestrating the deployment of computations in the cloud with conductor
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Maestro: Replica-Aware Map Scheduling for MapReduce
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Characterizing spot price dynamics in public cloud environments
Future Generation Computer Systems
Combinatorial auction-based allocation of virtual machine instances in clouds
Journal of Parallel and Distributed Computing
Banking on decoupling: budget-driven sustainability for HPC applications on auction-based clouds
ACM SIGOPS Operating Systems Review
Deconstructing Amazon EC2 Spot Instance Pricing
ACM Transactions on Economics and Computation
Speeding-up codon analysis on the cloud with local MapReduce aggregation
Information Sciences: an International Journal
Hi-index | 0.00 |
MapReduce is a scalable and fault tolerant framework, patented by Google, for computing embarrassingly parallel reductions. Hadoop is an open-source implementation of Google MapReduce that is made available as a web service to cloud users by the AmazonWeb Services (AWS) cloud computing infrastructure. Amazon Spot Instances (SIs) provide an inexpensive yet transient and market-based option to purchasing virtualized instances for execution in AWS. As opposed to manually controlling when an instance is terminated, SI termination can also occur automatically as a function of the market price and maximum user bid price. We find that we can significantly improve the runtime of MapReduce jobs in our benchmarks by using SIs as accelerators. However, we also find that SI termination due to budget constraints during the job can have adverse affects on the runtime and may cause the user to overpay for their job. We describe new techniques that help reduce such effects.