CCIndex: a complemental clustering index on distributed ordered tables for multi-dimensional range queries

Authors:
Yongqiang Zou;Jia Liu;Shicai Wang;Li Zha;Zhiwei Xu
Affiliations:
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China (all authors)
Venue:
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Year:
2010

Abstract

Massive-scale distributed databases such as Google's BigTable and Yahoo!'s PNUTS can be modeled as Distributed Ordered Tables (DOTs), which partition data into regions and support range queries on the key. Multi-dimensional range queries on DOTs are a fundamental requirement; however, none of the existing schemes works well when three critical issues are considered together: high performance, low space overhead, and high reliability. This paper introduces the CCIndex scheme, short for Complemental Clustering Index, to address all three issues. CCIndex creates several Complemental Clustering Index Tables for performance, leverages region-to-server information to estimate result size, and supports incremental data recovery. The paper builds a prototype on Apache HBase. Theoretical analysis and micro-benchmarks show that CCIndex consumes 5.3% to 29.3% more space, has the same reliability, and achieves 11.4 times the range-query throughput of a secondary index scheme. A synthetic application benchmark shows that CCIndex query throughput is 1.9 to 2.1 times that of MySQL Cluster.
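
The idea in the abstract can be illustrated with a toy in-memory sketch. It is not the authors' HBase prototype. It assumes that each index table is ordered by (indexed value, primary key) and stores the full row, and that a query scans only the index table with the smallest estimated result. The class `ToyCCIndex`, the parameter `region_size`, and the estimator (number of fixed-size regions touched by a range, times the region size) are hypothetical stand-ins for the region-to-server information mentioned above.

```python
from bisect import bisect_left, bisect_right


class ToyCCIndex:
    """Toy model: one sorted clustering table per indexed column (illustrative only)."""

    def __init__(self, rows, indexed_columns, region_size=2):
        # rows: dict mapping primary key -> dict of column values
        # region_size: hypothetical number of entries per region
        self.region_size = region_size
        self.tables = {}
        for col in indexed_columns:
            # Each index table is ordered by (column value, primary key) and
            # keeps the full row, so a range scan needs no second lookup.
            self.tables[col] = sorted(
                ((row[col], pk, row) for pk, row in rows.items()),
                key=lambda entry: (entry[0], entry[1]),
            )

    def _bounds(self, col, lo, hi):
        # Positions of the inclusive range [lo, hi] in the sorted index table.
        keys = [entry[0] for entry in self.tables[col]]
        return bisect_left(keys, lo), bisect_right(keys, hi)

    def estimate(self, col, lo, hi):
        # Assumed estimator: regions touched by the range times region size.
        start, end = self._bounds(col, lo, hi)
        if start == end:
            return 0
        first_region = start // self.region_size
        last_region = (end - 1) // self.region_size
        return (last_region - first_region + 1) * self.region_size

    def range_query(self, predicates):
        # predicates: dict mapping column -> (lo, hi), bounds inclusive
        best = min(predicates, key=lambda c: self.estimate(c, *predicates[c]))
        start, end = self._bounds(best, *predicates[best])
        matches = []
        # Scan only the chosen index table and filter on all predicates.
        for _, pk, row in self.tables[best][start:end]:
            if all(lo <= row[c] <= hi for c, (lo, hi) in predicates.items()):
                matches.append(pk)
        return best, matches


rows = {
    "r1": {"age": 25, "score": 90},
    "r2": {"age": 31, "score": 70},
    "r3": {"age": 35, "score": 85},
    "r4": {"age": 42, "score": 60},
    "r5": {"age": 58, "score": 88},
    "r6": {"age": 63, "score": 40},
}
index = ToyCCIndex(rows, ["age", "score"])
chosen, matches = index.range_query({"age": (30, 60), "score": (80, 100)})
print(chosen, matches)  # score ['r3', 'r5']
```

In this sketch the full row is duplicated in every index table, which is the source of the extra space the abstract reports; the exact storage layout and recovery mechanism of the real system are not modeled here.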