In this paper we consider distributed K-Nearest Neighbor (KNN) search and range query processing over high-dimensional data. Our approach is based on Locality Sensitive Hashing (LSH), which has proven very efficient at answering KNN queries in centralized settings. We consider mappings from the multi-dimensional LSH bucket space to the linearly ordered set of peers that jointly maintain the indexed data, and we derive the requirements such mappings must meet to achieve high-quality search results while limiting the number of network accesses. We put forward two such mappings with two salient properties: they are locality preserving, so that buckets likely to hold similar data are stored on the same or neighboring peers, and they have a predictable output distribution, which ensures fair load balancing. We show how to leverage the linearly aligned data for efficient KNN search and how to process range queries efficiently, which is, to the best of our knowledge, not possible in existing LSH schemes. Comprehensive performance evaluations on real-world data show that our approach brings major performance and accuracy gains over state-of-the-art approaches.
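To make the idea of mapping LSH buckets onto a linear order of peers concrete, here is a minimal Python sketch. It assumes a p-stable (Gaussian) LSH family and a hypothetical mapping that sums the integer bucket coordinates. The abstract does not spell out the two proposed mappings, so `make_lsh`, `linear_key`, `peer_for`, and the boundary values are illustrative assumptions, not the paper's method.

```python
import math
import random
from bisect import bisect_right


def make_lsh(dim, k, width, seed=0):
    """Build a bucket function from k Gaussian (p-stable) projections.

    Each hash is h(v) = floor((a . v + b) / width); the bucket label is
    the tuple of the k integer hash values.
    """
    rng = random.Random(seed)
    funcs = []
    for _ in range(k):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        b = rng.uniform(0.0, width)
        funcs.append((a, b))

    def bucket(v):
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / width)
            for a, b in funcs
        )

    return bucket


def linear_key(bucket_label):
    """Hypothetical linear mapping (an assumption): sum the coordinates.

    Buckets that differ by one step in a single coordinate get keys that
    differ by exactly one, so nearby buckets stay nearby on the line.
    """
    return sum(bucket_label)


def peer_for(key, boundaries):
    """Return the index of the peer owning `key`.

    `boundaries` is a sorted list of split points; peer i owns the keys
    in [boundaries[i-1], boundaries[i]), with open-ended first and last
    ranges.
    """
    return bisect_right(boundaries, key)


# Usage: hash a point and locate the peer responsible for its bucket.
bucket = make_lsh(dim=4, k=3, width=2.0, seed=42)
label = bucket([0.1, 0.4, -0.2, 0.9])
owner = peer_for(linear_key(label), boundaries=[-2, 0, 2])
```

In this sketch, range ownership is what keeps neighboring keys on the same or adjacent peers. The split points would in practice be chosen from the expected key distribution (for example at its quantiles) so that each peer holds a similar share of the data; the values `[-2, 0, 2]` here are arbitrary.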