Efficient B-tree based indexing for cloud data processing

Authors:
Sai Wu; Dawei Jiang; Beng Chin Ooi; Kun-Lung Wu
Affiliations:
National University of Singapore, Singapore; National University of Singapore, Singapore; National University of Singapore, Singapore; IBM T. J. Watson Research Center
Venue:
Proceedings of the VLDB Endowment
Year:
2010

Abstract

A Cloud may be seen as a type of flexible computing infrastructure consisting of many compute nodes, where resizable computing capacities can be provided to different customers. To fully harness the power of the Cloud, efficient data management is needed to handle huge volumes of data and support a large number of concurrent end users. To achieve that, a scalable and high-throughput indexing scheme is generally required. Such an indexing scheme must not only incur a low maintenance cost but also support parallel search to improve scalability. In this paper, we present a novel, scalable B+-tree based indexing scheme for efficient data processing in the Cloud. Our approach can be summarized as follows. First, we build a local B+-tree index for each compute node which only indexes data residing on the node. Second, we organize the compute nodes as a structured overlay and publish a portion of the local B+-tree nodes to the overlay for efficient query processing. Finally, we propose an adaptive algorithm to select the published B+-tree nodes according to query patterns. We conduct extensive experiments on Amazon's EC2, and the results demonstrate that our indexing scheme is dynamic, efficient and scalable.
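
The first two steps of the approach summarized above (a local index per compute node, plus a published summary used to route queries) can be illustrated with a minimal sketch. It is not the paper's implementation: a sorted list stands in for each local B+-tree, a flat list stands in for the structured overlay, and each node publishes only its overall key range rather than selected B+-tree nodes. The adaptive selection of published nodes is not modelled. The names `ComputeNode`, `GlobalIndex`, `published_range`, and `range_query` are hypothetical and chosen for illustration only.

```python
from bisect import bisect_left, bisect_right


class ComputeNode:
    """One compute node; a sorted list stands in for its local B+-tree.

    Assumption: every node holds at least one key.
    """

    def __init__(self, name, keys):
        self.name = name
        self.keys = sorted(keys)

    def published_range(self):
        # Publish only a coarse summary of the local index:
        # the smallest and largest key stored on this node.
        return (self.keys[0], self.keys[-1], self)

    def local_search(self, lo, hi):
        # Range scan over this node's own data, inclusive on both ends.
        i = bisect_left(self.keys, lo)
        j = bisect_right(self.keys, hi)
        return self.keys[i:j]


class GlobalIndex:
    """Stand-in for the overlay: a flat list of published key ranges."""

    def __init__(self):
        self.entries = []

    def publish(self, node):
        self.entries.append(node.published_range())

    def range_query(self, lo, hi):
        results = []
        contacted = []
        for r_lo, r_hi, node in self.entries:
            # Contact a node only if its published range overlaps the query.
            if r_lo <= hi and lo <= r_hi:
                contacted.append(node.name)
                results.extend(node.local_search(lo, hi))
        return sorted(results), contacted


nodes = [
    ComputeNode("n1", [1, 5, 9]),
    ComputeNode("n2", [20, 25, 30]),
    ComputeNode("n3", [40, 45, 50]),
]
index = GlobalIndex()
for n in nodes:
    index.publish(n)

hits, contacted = index.range_query(8, 26)
print(hits)       # [9, 20, 25]
print(contacted)  # ['n1', 'n2']
```

In this toy setup the query for keys 8 through 26 is forwarded only to the two nodes whose published ranges overlap it, and the third node is never contacted. That is the purpose of publishing part of each local index: the searches that remain can then proceed on the relevant nodes independently.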