The Cloud is fast gaining popularity as a platform for deploying Software as a Service (SaaS) applications. In principle, the Cloud provides unlimited compute resources, enabling deployed services to scale seamlessly. Moreover, the pay-as-you-go model of the Cloud reduces the maintenance overhead of applications. Given these advantages, it is attractive to migrate existing software to this new platform. However, challenges remain, as most software applications need to be redesigned to embrace the Cloud. In this paper, we present an overview of our ongoing work on developing epiC, an elastic and efficient power-aware data-intensive Cloud system. We discuss the design issues and the implementation of epiC's storage system and processing engine. The storage system and the processing engine are loosely coupled and have been designed to handle two types of workload simultaneously: data-intensive analytical jobs and online transactions (commonly referred to as OLAP and OLTP, respectively). The processing of large-scale analytical jobs in epiC adopts a phase-based strategy that provides fine-grained fault tolerance, while the processing of queries adopts indexing and filter-and-refine strategies.