A data-centric heuristic for Hadoop provisioning in the cloud

  • Authors:
  • Allahbaksh M. Asadullah;Nikita Jain;Kanika Kapoor;Hajar Falih

  • Affiliations:
  • Infosys Labs, Infosys Ltd., Bangalore, India;Infosys Labs, Infosys Ltd., Bangalore, India;Infosys Labs, Infosys Ltd., Bangalore, India;Akhawayn University, Ifrane, Morocco

  • Venue:
  • Proceedings of the 6th ACM India Computing Convention
  • Year:
  • 2013

Abstract

Research agencies and organizations work on algorithms and techniques to reduce operational and capital expenditure. They move to the cloud to convert capital expenditure (Capex) into operational expenditure (Opex), and they use the cloud to process large amounts of commercial and social data. This paper proposes a heuristic approach to reduce the operational cost of virtual machines (VMs) running Hadoop. The heuristic is simple and effective: it scales the number of Hadoop nodes based on the type and size of the job submitted. We validate the heuristic with the Hadoop word-count example on different data samples. Our implementation is independent of the cloud provider; hence, the heuristic is applicable to both private and public clouds.
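
The abstract does not give the heuristic's actual rule, so the following is only a minimal sketch of what a data-centric sizing rule of this kind could look like, not the authors' method. It assumes the node count is derived from the number of input blocks and a per-job-type weight. The function name `estimate_nodes`, the weight table, the block size, the slots per node, the number of task waves, and the node limits are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical per-job-type weights (not from the paper): a CPU-heavy job
# is given proportionally more nodes than an I/O-bound one.
JOB_TYPE_WEIGHT = {"io_bound": 1.0, "cpu_bound": 2.0}


def estimate_nodes(job_type, input_bytes, block_bytes=128 * 1024**2,
                   slots_per_node=2, waves=4, min_nodes=1, max_nodes=20):
    """Return an illustrative node count; every default here is an assumption."""
    # One map task per input block, with at least one task for tiny inputs.
    map_tasks = max(1, math.ceil(input_bytes / block_bytes))
    # Unknown job types fall back to a neutral weight of 1.0.
    weight = JOB_TYPE_WEIGHT.get(job_type, 1.0)
    # Spread the tasks over `waves` rounds of `slots_per_node` slots per node.
    nodes = math.ceil(weight * map_tasks / (slots_per_node * waves))
    # Clamp to the provisioning limits of the (assumed) cluster.
    return max(min_nodes, min(max_nodes, nodes))


if __name__ == "__main__":
    gib = 1024**3
    for job_type, size in [("io_bound", 1 * gib), ("cpu_bound", 10 * gib)]:
        print(job_type, size, estimate_nodes(job_type, size))
```

Because a rule like this depends only on the job description and the input size, it can sit in front of any provisioning API, which is consistent with the provider independence the abstract claims. The weights and limits would need to be calibrated against measured job runtimes before use.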