Lower bounds on locality sensitive hashing

Authors:
Rajeev Motwani;Assaf Naor;Rina Panigrahy
Affiliations:
Stanford University, Stanford, CA;Microsoft Research, Redmond, WA;Stanford University, Stanford, CA
Venue:
Proceedings of the twenty-second annual symposium on Computational geometry
Year:
2006

Citing 4

Approximate nearest neighbors: towards removing the curse of dimensionality
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing

A Replacement for Voronoi Diagrams of Near Linear Size
FOCS '01 Proceedings of the 42nd IEEE symposium on Foundations of Computer Science

Locality-sensitive hashing scheme based on p-stable distributions
SCG '04 Proceedings of the twentieth annual symposium on Computational geometry

Entropy based nearest neighbor search in high dimensions
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm

Cited 8

Multi-probe LSH: efficient indexing for high-dimensional similarity search
VLDB '07 Proceedings of the 33rd international conference on Very large data bases

Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
Communications of the ACM - 50th anniversary issue: 1958 - 2008

An improved algorithm finding nearest neighbor using Kd-trees
LATIN'08 Proceedings of the 8th Latin American conference on Theoretical informatics

Similarity search and locality sensitive hashing using ternary content addressable memories
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data

Bucketing coding and information theory for the statistical high-dimensional nearest-neighbor problem
IEEE Transactions on Information Theory

Approximate nearest neighbor search for low dimensional queries
Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms

SIMP: accurate and efficient near neighbor search in high dimensional spaces
Proceedings of the 15th International Conference on Extending Database Technology

Distributed approximate spectral clustering for large-scale datasets
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing

Quantified Score

Hi-index	0.06

Abstract

Given a metric space (X, d_X), c ≥ 1, r > 0, and p, q ∈ [0,1], a distribution over mappings H : X → N is called a (r, cr, p, q)-sensitive hash family if any two points in X at distance at most r are mapped by H to the same value with probability at least p, and any two points at distance greater than cr are mapped by H to the same value with probability at most q. This notion was introduced by Indyk and Motwani in 1998 as the basis for an efficient approximate nearest neighbor search algorithm, and has since been used extensively for this purpose. The performance of these algorithms is governed by the parameter ρ = log(1/p)/log(1/q), and constructing hash families with small ρ automatically yields improved nearest neighbor algorithms. Here we show that for X = ℓ_1 it is impossible to achieve ρ ≤ 1/(2c). This almost matches the construction of Indyk and Motwani, which achieves ρ ≤ 1/c.
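
To make the parameter ρ concrete, the following is a minimal sketch that evaluates ρ = log(1/p)/log(1/q) for one standard example, the bit-sampling family on the Hamming cube {0,1}^d, where a hash function returns a single uniformly random coordinate. That choice of family, the function names, and the values d = 1000, r = 10, c = 2 are assumptions made for illustration and are not taken from the abstract. For this family two points at Hamming distance t collide with probability 1 - t/d, so p = 1 - r/d and q = 1 - cr/d.

```python
import math


def rho(p: float, q: float) -> float:
    """Return rho = log(1/p) / log(1/q) for collision probabilities p >= q."""
    if not (0.0 < q <= p < 1.0):
        raise ValueError("expected 0 < q <= p < 1")
    return math.log(1.0 / p) / math.log(1.0 / q)


def bit_sampling_probs(d: int, r: float, c: float) -> tuple[float, float]:
    """Collision probabilities of bit sampling on {0,1}^d (illustrative family).

    Points at Hamming distance t agree on a uniformly random coordinate
    with probability 1 - t/d, which gives p at distance r and q at distance c*r.
    """
    if not (r > 0 and c >= 1 and c * r < d):
        raise ValueError("expected r > 0, c >= 1 and c*r < d")
    p = 1.0 - r / d
    q = 1.0 - c * r / d
    return p, q


# Hypothetical parameters chosen only for the example.
d, r, c = 1000, 10, 2.0
p, q = bit_sampling_probs(d, r, c)
value = rho(p, q)

print(f"p = {p:.4f}, q = {q:.4f}, rho = {value:.4f}")
print(f"1/(2c) = {1 / (2 * c):.4f}, 1/c = {1 / c:.4f}")
```

For these parameters the computed ρ falls between 1/(2c) and 1/c, which is consistent with the upper bound ρ ≤ 1/c quoted in the abstract. The sketch only evaluates one family at one parameter setting and does not by itself demonstrate the lower bound.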