K-Nearest Neighbor Graph (K-NNG) construction is an important operation with many web-related applications, including collaborative filtering, similarity search, and many others in data mining and machine learning. Existing methods for K-NNG construction either do not scale or are specific to certain similarity measures. We present NN-Descent, a simple yet efficient algorithm for approximate K-NNG construction that works with arbitrary similarity measures. Our method is based on local search, has minimal space overhead, and does not rely on any shared global index; hence, it is especially suitable for large-scale applications where data structures need to be distributed over the network. Across a variety of datasets and similarity measures, the proposed method typically converges to above 90% recall while comparing each point to only a few percent of the whole dataset on average.
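The local-search idea behind NN-Descent can be sketched as follows. This is a simplified illustration, not the paper's full algorithm (it omits the paper's sampling, reverse-edge, and early-termination optimizations): each point starts with random neighbors and repeatedly checks its neighbors' neighbors, keeping the k most similar. The function name and parameters here are illustrative choices, and `similarity` is any user-supplied measure where larger means more similar.

```python
import random

def nn_descent(data, k, similarity, max_iters=10):
    """Approximate K-NN graph via NN-Descent-style local search (sketch).

    Core intuition: a neighbor of a neighbor is likely to also be a
    neighbor, so local exploration converges without a global index.
    """
    n = len(data)
    # Initialize each point's neighbor list with k random points,
    # stored as (similarity, index) pairs sorted best-first.
    graph = {}
    for i in range(n):
        cand = random.sample([j for j in range(n) if j != i], k)
        graph[i] = sorted(((similarity(data[i], data[j]), j) for j in cand),
                          reverse=True)

    for _ in range(max_iters):
        updated = 0
        for i in range(n):
            # Candidate set: neighbors of current neighbors
            # (reverse edges omitted here for brevity).
            cands = {j2 for _, j in graph[i] for _, j2 in graph[j] if j2 != i}
            current = {j for _, j in graph[i]}
            for j in cands - current:
                s = similarity(data[i], data[j])
                # Replace the worst current neighbor if j is better.
                if s > graph[i][-1][0]:
                    graph[i][-1] = (s, j)
                    graph[i].sort(reverse=True)
                    updated += 1
        if updated == 0:  # converged: no neighbor list changed
            break
    return {i: [j for _, j in nbrs] for i, nbrs in graph.items()}
```

For example, with 1-D points and `similarity = lambda a, b: -abs(a - b)`, `nn_descent([0.0, 1.0, 3.0], 2, ...)` returns each point's two closest companions, best-first.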