iDISQUE: tuning high-dimensional similarity queries in DHT networks

Authors:
Xiaolong Zhang;Lidan Shou;Kian-Lee Tan;Gang Chen
Affiliations:
College of Computer Science, Zhejiang University, China;College of Computer Science, Zhejiang University, China;School of Computing, National University of Singapore, Singapore;College of Computer Science, Zhejiang University, China
Venue:
DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part I
Year:
2010



Abstract

In this paper, we propose iDISQUE, a fully decentralized framework that supports tunable approximate similarity queries over high-dimensional data in DHT networks. The framework uses a distributed indexing scheme to organize data summary structures called iDisques, which describe the cluster information of the data on each peer. iDisques are published through a locality-preserving mapping scheme, and approximate similarity queries are resolved against the resulting distributed index. The accuracy of query results can be tuned through both the publishing cost and the query cost. We employ a multi-probe technique to reduce the index size without compromising the effectiveness of queries, and we propose an effective load-balancing technique that is also based on multi-probing. Experiments on real and synthetic datasets confirm the effectiveness and efficiency of iDISQUE.
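
The publish-and-probe idea in the abstract can be illustrated with a small sketch. The abstract does not say which locality-preserving mapping iDISQUE uses, so this example assumes a locality-sensitive hash built from Gaussian (2-stable) projections, treats each cluster summary as a center plus a radius, and replaces the DHT with an in-memory dictionary. All function names (`make_hash`, `bucket_key`, `probe_keys`, `publish`, `candidate_peers`) and parameters are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, width, seed):
    """Build one hash h(v) = floor((a . v + b) / width) with Gaussian a (assumed scheme)."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)

    def h(v):
        dot = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((dot + b) / width)

    return h


def bucket_key(point, hashes):
    """Concatenate the hash values into one bucket key (stands in for a DHT key)."""
    return tuple(h(point) for h in hashes)


def probe_keys(point, hashes):
    """Return the point's own bucket plus the buckets that differ by +/-1 in one coordinate."""
    base = bucket_key(point, hashes)
    keys = [base]
    for i in range(len(base)):
        for delta in (-1, 1):
            neighbor = list(base)
            neighbor[i] += delta
            keys.append(tuple(neighbor))
    return keys


def publish(index, peer_id, center, radius, hashes):
    """Store a cluster summary (center, radius) under the bucket of its center."""
    index[bucket_key(center, hashes)].append((peer_id, center, radius))


def candidate_peers(index, query, query_radius, hashes):
    """Probe nearby buckets and keep peers whose cluster sphere meets the query ball."""
    peers = set()
    for key in probe_keys(query, hashes):
        for peer_id, center, radius in index.get(key, []):
            if math.dist(query, center) <= radius + query_radius:
                peers.add(peer_id)
    return peers


# Hypothetical usage: one peer publishes one cluster summary, then a query probes the index.
hashes = [make_hash(dim=4, width=4.0, seed=s) for s in range(3)]
index = defaultdict(list)
publish(index, "peer-A", (1.0, 2.0, 3.0, 4.0), 0.5, hashes)
print(candidate_peers(index, (1.0, 2.0, 3.0, 4.0), 0.1, hashes))
```

In this sketch each summary is stored under a single key and the query side visits several neighboring buckets, which is one way a multi-probe lookup can keep the index small. Whether iDISQUE probes at publish time, at query time, or both is not stated in the abstract.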