Big data platforms as a service: challenges and approach

Authors:
James Horey; Edmon Begoli; Raghul Gunasekaran; Seung-Hwan Lim; James Nutaro
Affiliations:
Computational Sciences & Engineering, Oak Ridge National Laboratory; Computational Sciences & Engineering, Oak Ridge National Laboratory; Computational Sciences & Engineering, Oak Ridge National Laboratory; Computational Sciences & Engineering, Oak Ridge National Laboratory; Computational Sciences & Engineering, Oak Ridge National Laboratory
Venue:
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Computing
Year:
2012

Abstract

Infrastructure-as-a-Service (IaaS) has revolutionized the manner in which users commission computing infrastructure. Coupled with Big Data platforms (Hadoop, Cassandra), IaaS has democratized the ability to store and process massive datasets. For users who need to customize or create new Big Data stacks, however, readily available solutions do not yet exist. Users must first acquire the necessary cloud computing infrastructure and then manually install the prerequisite software. For complex distributed services, this can be a daunting challenge. To address this issue, we argue that a distributed service should be viewed as a single application consisting of virtual machines. Users should no longer be concerned about individual machines or their internal organization. To illustrate this concept, we introduce Cloud-Get, a distributed package manager that enables the simple installation of distributed services in a cloud computing environment. Cloud-Get enables users to instantiate and modify distributed services, including Big Data services, using simple commands. Cloud-Get also simplifies creating new distributed services via standardized package definitions.
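
The idea of treating a distributed service as one installable unit made of virtual machines can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give Cloud-Get's package format or command set, so every name here (ServicePackage, Role, plan_install, and the "coordinator" and "datastore" packages) is hypothetical and is not Cloud-Get's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """One kind of virtual machine inside a service (hypothetical model)."""
    name: str
    count: int  # number of VMs to start for this role


@dataclass(frozen=True)
class ServicePackage:
    """A hypothetical package definition: a named service, its VM roles,
    and the other services that must be installed first."""
    name: str
    roles: tuple
    depends_on: tuple = ()


def plan_install(package_name, registry):
    """Expand one package name into an ordered list of VM placements.

    Dependencies are visited first, so the returned list is a valid
    start-up order. Each entry is (service, role, index).
    """
    plan = []
    seen = set()

    def visit(name):
        if name in seen:  # already planned (also guards against cycles)
            return
        seen.add(name)
        pkg = registry[name]
        for dep in pkg.depends_on:
            visit(dep)
        for role in pkg.roles:
            for index in range(role.count):
                plan.append((pkg.name, role.name, index))

    visit(package_name)
    return plan


# Hypothetical registry: a data store that needs a coordination service.
REGISTRY = {
    "coordinator": ServicePackage("coordinator", (Role("server", 3),)),
    "datastore": ServicePackage(
        "datastore",
        (Role("master", 1), Role("worker", 4)),
        depends_on=("coordinator",),
    ),
}

if __name__ == "__main__":
    for service, role, index in plan_install("datastore", REGISTRY):
        print(f"start VM {service}/{role}-{index}")
```

In this sketch the user names only the top-level service; the individual machines and their start-up order are derived from the package definitions, which is the separation of concerns the abstract argues for.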