Automatic text processing.
Ray shooting and parametric search. STOC '92 Proceedings of the twenty-fourth annual ACM symposium on Theory of computing.
Point location in arrangements of hyperplanes. Information and Computation.
STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing.
Two algorithms for nearest-neighbor search in high dimensions. STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing.
Locality-preserving hashing in multidimensional spaces. STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing.
Fuzzy queries in multimedia database systems. PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems.
Approximate nearest neighbors: towards removing the curse of dimensionality. STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing.
Efficient search for approximate nearest neighbor in high dimensional spaces. STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing.
Lower bounds for high dimensional nearest neighbor search and related problems. STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing.
An optimal algorithm for approximate nearest neighbor searching. SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms.
Approximate nearest neighbor algorithms for Frechet distance via product metrics. Proceedings of the eighteenth annual symposium on Computational geometry.
Information Retrieval.
Cell-probe lower bounds for the partial match problem. Proceedings of the thirty-fifth annual ACM symposium on Theory of computing.
A Replacement for Voronoi Diagrams of Near Linear Size. FOCS '01 Proceedings of the 42nd IEEE symposium on Foundations of Computer Science.
Locality-sensitive hashing scheme based on p-stable distributions. SCG '04 Proceedings of the twentieth annual symposium on Computational geometry.
Lower bounds on locality sensitive hashing. Proceedings of the twenty-second annual symposium on Computational geometry.
Efficient filtering with sketches in the ferret toolkit. MIR '06 Proceedings of the 8th ACM international workshop on Multimedia information retrieval.
Sizing sketches: a rank-based analysis for similarity search. Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems.
Multi-probe LSH: efficient indexing for high-dimensional similarity search. VLDB '07 Proceedings of the 33rd international conference on Very large data bases.
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Communications of the ACM - 50th anniversary issue: 1958 - 2008.
Earth mover distance over high-dimensional spaces. Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms.
The power of two min-hashes for similarity search among hierarchical data objects. Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
Locality sensitive hash functions based on concomitant rank order statistics. Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining.
Modeling LSH for performance tuning. Proceedings of the 17th ACM conference on Information and knowledge management.
A posteriori multi-probe locality sensitive hashing. MM '08 Proceedings of the 16th ACM international conference on Multimedia.
Distributed similarity search in high dimensions using locality sensitive hashing. Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology.
Quality and efficiency in high dimensional nearest neighbor search. Proceedings of the 2009 ACM SIGMOD International Conference on Management of data.
Space-time tradeoffs for approximate nearest neighbor searching. Journal of the ACM (JACM).
An improved algorithm finding nearest neighbor using Kd-trees. LATIN'08 Proceedings of the 8th Latin American conference on Theoretical informatics.
Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Transactions on Database Systems (TODS).
Similarity search and locality sensitive hashing using ternary content addressable memories. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data.
A locality-sensitive hash for real vectors. SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms.
Efficient incremental near duplicate detection based on locality sensitive hashing. DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part I.
Fast locality-sensitive hashing. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining.
SIMP: accurate and efficient near neighbor search in high dimensional spaces. Proceedings of the 15th International Conference on Extending Database Technology.
Efficient distributed locality sensitive hashing. Proceedings of the 21st ACM international conference on Information and knowledge management.
DLPR: a distributed locality preserving dimension reduction algorithm. IDCS'12 Proceedings of the 5th international conference on Internet and Distributed Computing Systems.
Least square regularized spectral hashing for similarity search. Signal Processing.
Optimal Lower Bounds for Locality-Sensitive Hashing (Except When q is Tiny). ACM Transactions on Computation Theory (TOCT).
Efficient binary code indexing with pivot based locality sensitive clustering. Multimedia Tools and Applications.
In this paper we study the problem of finding the approximate nearest neighbor of a query point in high-dimensional space, focusing on Euclidean space. Earlier approaches use locality-preserving hash functions (which tend to map nearby points to the same value) to construct several hash tables, ensuring that the query point hashes to the same bucket as its nearest neighbor in at least one table. Our approach is different: we use one (or a few) hash tables and hash several randomly chosen points in the neighborhood of the query point, showing that at least one of them will hash to the bucket containing its nearest neighbor. We show that the number of random points required in the neighborhood of the query point $q$ depends on the entropy of the hash value $h(p)$ of a random point $p$ at the same distance from $q$ as its nearest neighbor, given $q$ and the locality-preserving hash function $h$ chosen randomly from the hash family. Precisely, we show that if the entropy $I(h(p) \mid q, h) = M$ and $g$ is a bound on the probability that two far-off points hash to the same bucket, then we can find the approximate nearest neighbor in $O(n^{\rho})$ time and near-linear $\tilde{O}(n)$ space, where $\rho = M/\log(1/g)$. Alternatively, we can build a data structure of size $\tilde{O}(n^{1/(1-\rho)})$ to answer queries in $\tilde{O}(d)$ time. By applying this analysis to the locality-preserving hash functions in [17, 21, 6] and adjusting the parameters, we show that the $c$-approximate nearest neighbor can be computed in time $\tilde{O}(n^{\rho})$ and near-linear space, where $\rho \approx 2.06/c$ as $c$ becomes large.
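The single-table, many-probe query described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's construction. The hash family is a concatenation of k Gaussian random projections floor((a·x + b)/w), a 2-stable family chosen here only for concreteness. The near-neighbor distance r is assumed to be known in advance. The function names make_hash, build_table, random_point_at_distance, entropy_query and rho, and the parameters k, w and num_probes, are all hypothetical. The rho helper only evaluates the exponent $\rho = M/\log(1/g)$, taking M in bits so that the logarithm is base 2.

```python
import math
import random


def make_hash(dim, k, w, rng):
    """Concatenate k Gaussian (2-stable) projections floor((a.x + b) / w).

    Assumption: this family and the values of k and w are illustrative,
    not the specific hash functions analysed in the paper.
    """
    projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
    offsets = [rng.uniform(0.0, w) for _ in range(k)]

    def h(x):
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / w)
            for a, b in zip(projections, offsets)
        )

    return h


def build_table(points, h):
    """A single hash table mapping each bucket key to the indices stored there."""
    table = {}
    for idx, p in enumerate(points):
        table.setdefault(h(p), []).append(idx)
    return table


def random_point_at_distance(q, r, rng):
    """Uniformly random point on the sphere of radius r centred at q."""
    direction = [rng.gauss(0.0, 1.0) for _ in q]
    norm = math.sqrt(sum(v * v for v in direction))
    return [qi + r * v / norm for qi, v in zip(q, direction)]


def entropy_query(q, points, table, h, r, c, num_probes, rng):
    """Probe the bucket of q and of num_probes random points at distance r from q.

    Returns the index of the first stored point found within distance c*r,
    or None if no probed bucket contains such a point.
    """
    probes = [q] + [random_point_at_distance(q, r, rng) for _ in range(num_probes)]
    visited = set()
    for probe in probes:
        key = h(probe)
        if key in visited:
            continue  # this bucket has already been scanned
        visited.add(key)
        for idx in table.get(key, []):
            if math.dist(q, points[idx]) <= c * r:
                return idx
    return None


def rho(entropy_bits, g):
    """Exponent rho = M / log(1/g), with M in bits so the log is base 2."""
    return entropy_bits / math.log2(1.0 / g)
```

In this sketch the table is built once with build_table(points, h), and each query calls entropy_query. Raising num_probes trades query time for a higher chance that some perturbed probe lands in the nearest neighbor's bucket, which is the quantity the entropy M controls in the analysis. As a quick check of the exponent, rho(1.0, 0.5) evaluates to 1.0.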