Efficiently constructing the K-Nearest Neighbor Graph (K-NNG) of large, high-dimensional datasets is crucial for many applications involving feature-rich objects, such as images or other multimedia content. In this paper we investigate the use of high-dimensional hashing methods for efficiently approximating the K-NNG, notably in distributed environments. We first discuss how strongly hash-table balance affects the performance of such approaches and show why the baseline approach using Locality Sensitive Hashing (LSH) does not perform well. Our new KNN-join method is based on RMMH, a recently introduced hash function family based on randomly trained classifiers. We show that the resulting hash tables are much more balanced and that the number of resulting collisions can be greatly reduced without degrading the quality of the approximate graph. We further improve the load balancing of our distributed approach by designing a parallelized local join algorithm, implemented within the MapReduce framework.
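To make the hash-then-join idea concrete, the following is a minimal single-machine sketch, not the method of the paper. It assumes random-hyperplane sign hashing as a stand-in for the LSH baseline (RMMH itself is not implemented here), and every name and parameter (`build_table`, `approx_knn_graph`, `collision_count`, `n_tables`, `n_bits`) is hypothetical. Points are grouped into buckets, every pair colliding in a bucket is compared (the "local join"), and each point keeps its K closest candidates. The `collision_count` helper shows why balance matters: a bucket of size n costs n(n-1)/2 comparisons, so a few oversized buckets dominate the work.

```python
import heapq
import random
from collections import defaultdict


def hash_code(vec, hyperplanes):
    """Sign of the projection onto each random hyperplane, as a tuple of bits."""
    return tuple(
        1 if sum(v * w for v, w in zip(vec, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def build_table(points, n_bits, rng):
    """Build one hash table: hash code -> list of point ids (assumed baseline hash)."""
    dim = len(points[0])
    hyperplanes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
    table = defaultdict(list)
    for idx, vec in enumerate(points):
        table[hash_code(vec, hyperplanes)].append(idx)
    return table


def collision_count(table):
    """Number of pairwise comparisons a local join performs on this table."""
    return sum(len(b) * (len(b) - 1) // 2 for b in table.values())


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def approx_knn_graph(points, k, n_tables=4, n_bits=4, seed=0):
    """Approximate K-NNG: local join inside every bucket, then keep the top k."""
    rng = random.Random(seed)
    candidates = defaultdict(dict)  # point id -> {neighbor id: squared distance}
    for _ in range(n_tables):
        for bucket in build_table(points, n_bits, rng).values():
            # Local join: compare every pair that collides in this bucket.
            for pos, i in enumerate(bucket):
                for j in bucket[pos + 1:]:
                    d = sq_dist(points[i], points[j])
                    candidates[i][j] = d
                    candidates[j][i] = d
    return {
        i: [j for j, _ in heapq.nsmallest(k, candidates[i].items(),
                                          key=lambda kv: kv[1])]
        for i in range(len(points))
    }
```

In a distributed setting, each bucket would roughly correspond to the work handed to one reducer, which is why the distribution of bucket sizes, and not only the number of buckets, determines the load on each node. A point that never collides with another point simply ends up with fewer than K neighbors in this sketch.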