Big data platforms as a service: challenges and approach

  • Authors:
  • James Horey; Edmon Begoli; Raghul Gunasekaran; Seung-Hwan Lim; James Nutaro

  • Affiliations:
  • Computational Sciences & Engineering, Oak Ridge National Laboratory (all authors)

  • Venue:
  • HotCloud'12: Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing
  • Year:
  • 2012

Abstract

Infrastructure-as-a-Service (IaaS) has revolutionized the manner in which users commission computing infrastructure. Coupled with Big Data platforms (Hadoop, Cassandra), IaaS has democratized the ability to store and process massive datasets. For users who need to customize or create new Big Data stacks, however, readily available solutions do not yet exist. Users must first acquire the necessary cloud computing infrastructure and then manually install the prerequisite software. For complex distributed services, this can be a daunting challenge. To address this issue, we argue that a distributed service should be viewed as a single application consisting of virtual machines. Users should no longer be concerned about individual machines or their internal organization. To illustrate this concept, we introduce Cloud-Get, a distributed package manager that enables the simple installation of distributed services in a cloud computing environment. Cloud-Get enables users to instantiate and modify distributed services, including Big Data services, using simple commands. Cloud-Get also simplifies the creation of new distributed services via standardized package definitions.