Numerous applications in search, databases, machine learning, and computer vision can benefit from efficient algorithms for near neighbor search. This paper proposes a simple framework for fast near neighbor search in high-dimensional binary data, which are common in practice (e.g., text). We develop a simple and effective strategy for sub-linear time near neighbor search by building hash tables directly from the bits generated by b-bit minwise hashing. We demonstrate the advantages of our method through thorough comparisons with two strong baselines: spectral hashing and sign (1-bit) random projections.
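The idea of building hash tables from b-bit minwise hashing bits can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each binary vector is given as a non-empty set of non-negative integer indices, and it approximates random permutations with a universal hash of the form (a*x + c) mod p, which is a common stand-in rather than something the abstract specifies. The class name `BBitMinHashIndex` and the parameters `b`, `k`, and `num_tables` are hypothetical names chosen for illustration.

```python
import random
from collections import defaultdict

# Large Mersenne prime for the assumed universal hash family (a*x + c) mod p.
PRIME = (1 << 61) - 1


class BBitMinHashIndex:
    """Illustrative index: each table key concatenates the lowest b bits of k minhashes."""

    def __init__(self, b=2, k=4, num_tables=8, seed=0):
        rng = random.Random(seed)
        self.b, self.k = b, k
        # One (a, c) pair per hash function, k functions per table.
        self.params = [
            [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.data = {}

    def _key(self, items, table_params):
        # Build a (b * k)-bit bucket key for one table.
        mask = (1 << self.b) - 1
        key = 0
        for a, c in table_params:
            min_hash = min((a * x + c) % PRIME for x in items)
            key = (key << self.b) | (min_hash & mask)  # keep only the lowest b bits
        return key

    def add(self, name, items):
        # Store the set and place its name in one bucket per table.
        items = frozenset(items)
        self.data[name] = items
        for table, params in zip(self.tables, self.params):
            table[self._key(items, params)].append(name)

    def query(self, items, top=5):
        # Collect candidates that share a bucket with the query in any table,
        # then re-rank them by exact Jaccard similarity (resemblance).
        items = frozenset(items)
        candidates = set()
        for table, params in zip(self.tables, self.params):
            candidates.update(table.get(self._key(items, params), ()))

        def jaccard(name):
            other = self.data[name]
            return len(items & other) / len(items | other)

        return sorted(candidates, key=jaccard, reverse=True)[:top]


if __name__ == "__main__":
    index = BBitMinHashIndex(b=2, k=4, num_tables=8, seed=0)
    index.add("doc_a", {1, 2, 3, 4, 5, 6, 7, 8})
    index.add("doc_b", {100, 200, 300, 400})
    print(index.query({1, 2, 3, 4, 5, 6, 7, 8}))
```

In this sketch, a query only inspects the buckets its own keys point to instead of scanning every stored set, which is where the sub-linear behavior is expected to come from. Larger `b` or `k` makes buckets more selective, while more tables raises the chance that a true near neighbor shares at least one bucket with the query.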