K-Nearest Neighbor Graph (K-NNG) construction is an important operation with many web-related applications, including collaborative filtering, similarity search, and many others in data mining and machine learning. Existing methods for K-NNG construction either do not scale or are specific to certain similarity measures. We present NN-Descent, a simple yet efficient algorithm for approximate K-NNG construction that works with arbitrary similarity measures. Our method is based on local search, has minimal space overhead, and does not rely on any shared global index; hence, it is especially suitable for large-scale applications where data structures need to be distributed over the network. Across a variety of datasets and similarity measures, the proposed method typically converges to above 90% recall while comparing each point to only a few percent of the whole dataset on average.
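The local-search idea behind NN-Descent can be sketched as follows. This is a simplified illustration, not the paper's full algorithm (it omits the paper's sampling, reverse-edge, and early-termination optimizations): each point starts with random neighbors and repeatedly checks its neighbors' neighbors, keeping the k most similar. The function name and parameters here are illustrative choices, and `similarity` is any user-supplied measure where larger means more similar.

```python
import random

def nn_descent(data, k, similarity, max_iters=10):
    """Approximate K-NN graph via NN-Descent-style local search (sketch).

    Core intuition: a neighbor of a neighbor is likely to also be a
    neighbor, so local exploration converges without a global index.
    """
    n = len(data)
    # Initialize each point's neighbor list with k random points,
    # stored as (similarity, index) pairs sorted best-first.
    graph = {}
    for i in range(n):
        cand = random.sample([j for j in range(n) if j != i], k)
        graph[i] = sorted(((similarity(data[i], data[j]), j) for j in cand),
                          reverse=True)

    for _ in range(max_iters):
        updated = 0
        for i in range(n):
            # Candidate set: neighbors of current neighbors
            # (reverse edges omitted here for brevity).
            cands = {j2 for _, j in graph[i] for _, j2 in graph[j] if j2 != i}
            current = {j for _, j in graph[i]}
            for j in cands - current:
                s = similarity(data[i], data[j])
                # Replace the worst current neighbor if j is better.
                if s > graph[i][-1][0]:
                    graph[i][-1] = (s, j)
                    graph[i].sort(reverse=True)
                    updated += 1
        if updated == 0:  # converged: no neighbor list changed
            break
    return {i: [j for _, j in nbrs] for i, nbrs in graph.items()}
```

For example, with 1-D points and `similarity = lambda a, b: -abs(a - b)`, `nn_descent([0.0, 1.0, 3.0], 2, ...)` returns each point's two closest companions, best-first.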