Providing scalable database services on the cloud

Authors:
Chun Chen;Gang Chen;Dawei Jiang;Beng Chin Ooi;Hoang Tam Vo;Sai Wu;Quanqing Xu
Affiliations:
Zhejiang University, China;Zhejiang University, China;National University of Singapore;National University of Singapore;National University of Singapore;National University of Singapore;National University of Singapore
Venue:
WISE'10 Proceedings of the 11th international conference on Web information systems engineering
Year:
2010

Abstract

The Cloud is fast gaining popularity as a platform for deploying Software as a Service (SaaS) applications. In principle, the Cloud provides unlimited compute resources, enabling deployed services to scale seamlessly. Moreover, the pay-as-you-go model of the Cloud reduces the maintenance overhead of applications. Given these advantages, it is attractive to migrate existing software to this new platform. However, challenges remain, as most software applications need to be redesigned to embrace the Cloud. In this paper, we present an overview of our ongoing work on developing epiC, an elastic and efficient power-aware data-intensive Cloud system. We discuss the design issues and the implementation of epiC's storage system and processing engine. The storage system and the processing engine are loosely coupled and have been designed to handle two types of workload simultaneously, namely data-intensive analytical jobs and online transactions (commonly referred to as OLAP and OLTP, respectively). epiC processes large-scale analytical jobs with a phase-based strategy that provides fine-grained fault tolerance, while query processing relies on indexing and filter-and-refine strategies.