Entropy based nearest neighbor search in high dimensions

Authors:
Rina Panigrahy
Affiliations:
Stanford University, Stanford, CA
Venue:
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Year:
2006

Citing 17

Automatic text processing
Automatic text processing

Ray shooting and parametric search
STOC '92 Proceedings of the twenty-fourth annual ACM symposium on Theory of computing

Point location in arrangements of hyperplanes
Information and Computation

Non-expansive hashing
STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing

Two algorithms for nearest-neighbor search in high dimensions
STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing

Locality-preserving hashing in multidimensional spaces
STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing

Fuzzy queries in multimedia database systems
PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems

Approximate nearest neighbors: towards removing the curse of dimensionality
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing

Efficient search for approximate nearest neighbor in high dimensional spaces
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing

Lower bounds for high dimensional nearest neighbor search and related problems
STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing

An optimal algorithm for approximate nearest neighbor searching
SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms

Approximate nearest neighbor algorithms for Frechet distance via product metrics
Proceedings of the eighteenth annual symposium on Computational geometry

Information Retrieval
Information Retrieval

Query by Image and Video Content: The QBIC System
Computer

Cell-probe lower bounds for the partial match problem
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing

A Replacement for Voronoi Diagrams of Near Linear Size
FOCS '01 Proceedings of the 42nd IEEE symposium on Foundations of Computer Science

Locality-sensitive hashing scheme based on p-stable distributions
SCG '04 Proceedings of the twentieth annual symposium on Computational geometry

Cited 28

Lower bounds on locality sensitive hashing
Proceedings of the twenty-second annual symposium on Computational geometry

Efficient filtering with sketches in the ferret toolkit
MIR '06 Proceedings of the 8th ACM international workshop on Multimedia information retrieval

Sizing sketches: a rank-based analysis for similarity search
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems

Multi-probe LSH: efficient indexing for high-dimensional similarity search
VLDB '07 Proceedings of the 33rd international conference on Very large data bases

Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
Communications of the ACM - 50th anniversary issue: 1958 - 2008

Earth mover distance over high-dimensional spaces
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms

The power of two min-hashes for similarity search among hierarchical data objects
Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Locality sensitive hash functions based on concomitant rank order statistics
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

Modeling LSH for performance tuning
Proceedings of the 17th ACM conference on Information and knowledge management

A posteriori multi-probe locality sensitive hashing
MM '08 Proceedings of the 16th ACM international conference on Multimedia

Distributed similarity search in high dimensions using locality sensitive hashing
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology

Quality and efficiency in high dimensional nearest neighbor search
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data

Space-time tradeoffs for approximate nearest neighbor searching
Journal of the ACM (JACM)

An improved algorithm finding nearest neighbor using Kd-trees
LATIN'08 Proceedings of the 8th Latin American conference on Theoretical informatics

Efficient and accurate nearest neighbor and closest pair search in high-dimensional space
ACM Transactions on Database Systems (TODS)

Similarity search and locality sensitive hashing using ternary content addressable memories
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data

A locality-sensitive hash for real vectors
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms

Efficient incremental near duplicate detection based on locality sensitive hashing
DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part I

Fast locality-sensitive hashing
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

Approximate all nearest neighbor search for high dimensional entropy estimation for image registration
Signal Processing

SIMP: accurate and efficient near neighbor search in high dimensional spaces
Proceedings of the 15th International Conference on Extending Database Technology

Efficient distributed locality sensitive hashing
Proceedings of the 21st ACM international conference on Information and knowledge management

DLPR: a distributed locality preserving dimension reduction algorithm
IDCS'12 Proceedings of the 5th international conference on Internet and Distributed Computing Systems

Nonnegative sparse coding induced hashing for image copy detection
Neurocomputing

An improved method of locality sensitive hashing for indexing large-scale and high-dimensional features
Signal Processing

Least square regularized spectral hashing for similarity search
Signal Processing

Optimal Lower Bounds for Locality-Sensitive Hashing (Except When q is Tiny)
ACM Transactions on Computation Theory (TOCT)

Efficient binary code indexing with pivot based locality sensitive clustering
Multimedia Tools and Applications

Abstract

In this paper we study the problem of finding the approximate nearest neighbor of a query point in high-dimensional space, focusing on Euclidean space. Earlier approaches use locality-preserving hash functions (which tend to map nearby points to the same value) to construct several hash tables, ensuring that the query point hashes to the same bucket as its nearest neighbor in at least one table. Our approach is different: we use one (or a few) hash tables and hash several randomly chosen points in the neighborhood of the query point, showing that at least one of them will hash to the bucket containing its nearest neighbor. We show that the number of randomly chosen points required in the neighborhood of the query point q depends on the entropy of the hash value h(p) of a random point p at the same distance from q as its nearest neighbor, given q and the locality-preserving hash function h chosen randomly from the hash family. Precisely, we show that if the entropy I(h(p)|q, h) = M and g is a bound on the probability that two far-off points will hash to the same bucket, then we can find the approximate nearest neighbor in Õ(n^ρ) time and near-linear Õ(n) space, where ρ = M/log(1/g). Alternatively, we can build a data structure of size Õ(n^(1/(1-ρ))) to answer queries in Õ(d) time. By applying this analysis to the locality-preserving hash functions in [17, 21, 6] and adjusting the parameters, we show that the c-approximate nearest neighbor can be computed in time Õ(n^ρ) and near-linear space, where ρ ≈ 2.06/c as c becomes large.
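
The query strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's construction: the random-projection hash, the function names (make_hash, random_point_at_distance, build_table, query, rho) and the parameters k, width and num_probes are assumptions chosen for illustration, and entropy is assumed to be measured in bits. The sketch builds a single hash table, then hashes several random points at distance r from the query and scans only the buckets they land in.

```python
import math
import random
from collections import defaultdict


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def make_hash(dim, k, width, rng):
    """Illustrative locality-preserving hash (an assumption, not the paper's):
    k concatenated random projections, each quantized into cells of size width."""
    projections = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
        for _ in range(k)
    ]

    def h(x):
        return tuple(
            math.floor((sum(a_i * x_i for a_i, x_i in zip(a, x)) + b) / width)
            for a, b in projections
        )

    return h


def random_point_at_distance(q, r, rng):
    """Uniformly random point on the sphere of radius r centered at q."""
    direction = [rng.gauss(0.0, 1.0) for _ in q]
    norm = math.sqrt(sum(v * v for v in direction))
    return [q_i + r * v / norm for q_i, v in zip(q, direction)]


def build_table(points, h):
    """Single hash table mapping bucket -> indices of the points stored there."""
    table = defaultdict(list)
    for idx, p in enumerate(points):
        table[h(p)].append(idx)
    return table


def query(q, r, points, table, h, num_probes, rng):
    """Hash num_probes random points at distance r from q and return the
    closest data point found in any probed bucket, or (None, inf)."""
    best_idx, best_dist = None, math.inf
    seen = set()
    for _ in range(num_probes):
        probe = random_point_at_distance(q, r, rng)
        for idx in table.get(h(probe), ()):
            if idx in seen:
                continue
            seen.add(idx)
            d = euclidean(q, points[idx])
            if d < best_dist:
                best_idx, best_dist = idx, d
    return best_idx, best_dist


def rho(entropy_bits, g):
    """Exponent rho = M / log(1/g) from the abstract, with M in bits (base-2 log)."""
    return entropy_bits / math.log2(1.0 / g)
```

Under these assumptions, the abstract's bound suggests choosing num_probes on the order of n ** rho(M, g) for a data set of n points. The search can miss the true nearest neighbor, so query may return (None, inf) or a point that is only approximately nearest.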