A data-centric heuristic for Hadoop provisioning in the cloud

Authors:
Allahbaksh M. Asadullah;Nikita Jain;Kanika Kapoor;Hajar Falih
Affiliations:
Infosys Labs, Infosys Ltd., Bangalore, India;Infosys Labs, Infosys Ltd., Bangalore, India;Infosys Labs, Infosys Ltd., Bangalore, India;Akhawayn University, Ifrane, Morocco
Venue:
Proceedings of the 6th ACM India Computing Convention
Year:
2013

Abstract

Research agencies and organizations work on algorithms and techniques to reduce operational and capital expenditure. They move to the cloud to convert capital expenditure (Capex) into operational expenditure (Opex), and they use the cloud to process large amounts of commercial and social data. This paper proposes a heuristic approach to reduce the operational cost of virtual machines (VMs) running Hadoop. The heuristic is simple and effective: it scales the number of Hadoop nodes based on the type and size of the submitted job. We validate the heuristic with the Hadoop word-count example on different data samples. Our implementation is independent of the cloud provider, so the heuristic is applicable to both private and public clouds.
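
The abstract does not give the heuristic's formula, so the following is only a minimal sketch of what scaling the node count by job type and input size could look like. It is not the authors' method. Every name and number in it is an assumption made for illustration: the function nodes_for_job, the JOB_TYPE_WEIGHT table, the 64 MB block size, the blocks-per-node capacity, and the node limits.

```python
import math

# Hypothetical weights: relative amount of work per input block for each
# job type. These values are illustrative, not taken from the paper.
JOB_TYPE_WEIGHT = {"word-count": 1.0, "sort": 1.5}


def nodes_for_job(input_bytes, job_type, block_bytes=64 * 1024**2,
                  blocks_per_node=8, min_nodes=1, max_nodes=20):
    """Estimate how many Hadoop worker nodes to provision for one job.

    All parameters are assumptions: block_bytes is an assumed block size,
    blocks_per_node is an assumed per-node capacity, and min_nodes and
    max_nodes bound the cluster size.
    """
    if input_bytes < 0:
        raise ValueError("input_bytes must be non-negative")
    # Unknown job types fall back to a neutral weight of 1.0.
    weight = JOB_TYPE_WEIGHT.get(job_type, 1.0)
    # Number of input blocks the job has to read.
    blocks = math.ceil(input_bytes / block_bytes)
    # Scale by job type, then divide by what one node is assumed to handle.
    wanted = math.ceil(blocks * weight / blocks_per_node)
    # Clamp to the allowed cluster size.
    return max(min_nodes, min(max_nodes, wanted))


if __name__ == "__main__":
    for size_gb in (1, 10, 100):
        n = nodes_for_job(size_gb * 1024**3, "word-count")
        print(f"{size_gb} GB word-count job: {n} node(s)")
```

Under these assumptions, smaller jobs run on fewer VMs, which is where the operational saving would come from. A real deployment would replace the weights and capacities with measured values.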