Hash tables have been proposed for indexing high-dimensional binary vectors, specifically for identifying media by their fingerprints. In this paper we develop a new model to predict the performance of a hash-based method (Fingerprint Hashing) under varying levels of noise. We show that adjusting two parameters achieves robustness to a higher level of noise. We extend Fingerprint Hashing to a multi-table range search (Extended Fingerprint Hashing) and show that this approach also increases robustness to noise. We then show the relationship between Extended Fingerprint Hashing and Locality Sensitive Hashing and investigate design choices for dealing with higher noise levels. If index size must be held constant, the Extended Fingerprint Hash is the superior method. We also show that, to achieve similar performance at a given level of noise, a Locality Sensitive Hash requires nearly a six-fold increase in index size, which is likely to be impractical for many applications.
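The abstract gives no implementation details, so the following is only an illustrative sketch of the general idea of a multi-table range search over binary fingerprints. It is not the paper's Fingerprint Hashing or Extended Fingerprint Hashing. The class name `MultiTableIndex`, the use of randomly sampled bit positions as hash keys, and the parameters `key_bits`, `n_tables`, `probe_radius`, and `max_distance` are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints stored as ints."""
    return bin(a ^ b).count("1")


class MultiTableIndex:
    """Hypothetical multi-table index over n_bits-wide binary fingerprints."""

    def __init__(self, n_bits: int, key_bits: int, n_tables: int, seed: int = 0):
        rng = random.Random(seed)
        # Assumption: each table hashes on its own random subset of bit positions.
        self.positions = [rng.sample(range(n_bits), key_bits) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.key_bits = key_bits

    @staticmethod
    def _key(fp: int, positions) -> int:
        # Concatenate the sampled bits into a small integer key.
        key = 0
        for p in positions:
            key = (key << 1) | ((fp >> p) & 1)
        return key

    def add(self, item_id, fp: int) -> None:
        # Store the full fingerprint in every table so candidates can be verified.
        for positions, table in zip(self.positions, self.tables):
            table[self._key(fp, positions)].append((item_id, fp))

    def _probe_keys(self, key: int, radius: int):
        # Range search: yield every key within Hamming distance `radius` of `key`.
        for r in range(radius + 1):
            for flips in combinations(range(self.key_bits), r):
                probe = key
                for b in flips:
                    probe ^= 1 << b
                yield probe

    def query(self, fp: int, probe_radius: int, max_distance: int):
        # Collect candidates from all tables, then verify on the full fingerprint.
        matches = {}
        for positions, table in zip(self.positions, self.tables):
            for probe in self._probe_keys(self._key(fp, positions), probe_radius):
                for item_id, stored in table.get(probe, ()):
                    d = hamming(fp, stored)
                    if d <= max_distance:
                        matches[item_id] = d
        return sorted(matches.items(), key=lambda kv: kv[1])
```

In this sketch, a query whose fingerprint differs from a stored one in at most `probe_radius` bits is guaranteed to reach that item's bucket in every table, because each key can change by no more than the number of flipped bits. Raising `probe_radius` or `n_tables` trades query cost or index size for tolerance to noise, which is the kind of design trade-off the abstract describes.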