Optimal Lower Bounds for Locality-Sensitive Hashing (Except When q is Tiny)

Authors:
Ryan O’Donnell; Yi Wu; Yuan Zhou
Affiliations:
Carnegie Mellon University; Carnegie Mellon University; Carnegie Mellon University
Venue:
ACM Transactions on Computation Theory (TOCT)
Year:
2014

References:
Approximate nearest neighbors: towards removing the curse of dimensionality. STOC '98: Proceedings of the thirtieth annual ACM symposium on Theory of computing.
Finding Interesting Associations without Support Pruning. IEEE Transactions on Knowledge and Data Engineering.
Similarity Search in High Dimensions via Hashing. VLDB '99: Proceedings of the 25th International Conference on Very Large Data Bases.
High-dimensional computational geometry.
Fast Pose Estimation with Parameter-Sensitive Hashing. ICCV '03: Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2.
Computational applications of noise sensitivity.
Locality-sensitive hashing scheme based on p-stable distributions. SCG '04: Proceedings of the twentieth annual symposium on Computational geometry.
Entropy based nearest neighbor search in high dimensions. SODA '06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm.
Efficient algorithms for substring near neighbor problem. SODA '06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm.
Randomized algorithms and NLP: using locality sensitive hash function for high speed noun clustering. ACL '05: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics.
Google news personalization: scalable online collaborative filtering. Proceedings of the 16th international conference on World Wide Web.
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Communications of the ACM - 50th anniversary issue: 1958 - 2008.
Lower Bounds on Locality Sensitive Hashing. SIAM Journal on Discrete Mathematics.
A Geometric Approach to Lower Bounds for Approximate Near-Neighbor Search and Partial Match. FOCS '08: Proceedings of the 2008 49th Annual IEEE Symposium on Foundations of Computer Science.
A locality-sensitive hash for real vectors. SODA '10: Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms.
Spherical LSH for approximate nearest neighbor search on unit hypersphere. WADS'07: Proceedings of the 10th international conference on Algorithms and Data Structures.

Abstract

We study lower bounds for Locality-Sensitive Hashing (LSH) in the strongest setting: point sets in $\{0,1\}^d$ under the Hamming distance. Recall that $\mathcal{H}$ is said to be an $(r, cr, p, q)$-sensitive hash family if all pairs $x, y \in \{0,1\}^d$ with $\mathrm{dist}(x, y) \le r$ have probability at least $p$ of collision under a randomly chosen $h \in \mathcal{H}$, whereas all pairs $x, y \in \{0,1\}^d$ with $\mathrm{dist}(x, y) \ge cr$ have probability at most $q$ of collision. Typically, one considers $d \to \infty$, with $c > 1$ fixed and $q$ bounded away from $0$.

For its applications to approximate nearest-neighbor search in high dimensions, the quality of an LSH family $\mathcal{H}$ is governed by how small its $\rho$ parameter, $\rho = \ln(1/p)/\ln(1/q)$, is as a function of the parameter $c$. The seminal paper of Indyk and Motwani [1998] showed that for each $c \ge 1$, the extremely simple family $\mathcal{H} = \{x \mapsto x_i : i \in [d]\}$ achieves $\rho \le 1/c$. The only known lower bound, due to Motwani et al. [2007], is that $\rho$ must be at least $(e^{1/c} - 1)/(e^{1/c} + 1) \ge 0.46/c$ (minus $o_d(1)$).

The contribution of this article is twofold.

(1) We show the “optimal” lower bound for $\rho$: it must be at least $1/c$ (minus $o_d(1)$). Our proof is very simple, following almost immediately from the observation that the noise stability of a Boolean function at time $t$ is a log-convex function of $t$.

(2) We raise and discuss the following issue: neither the application of LSH to nearest-neighbor search nor the known LSH lower bounds hold as stated if the $q$ parameter is tiny. Here, “tiny” means $q = 2^{-\Theta(d)}$, a parameter range we believe is natural.
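As a numerical sanity check on the quantities in the abstract, here is a minimal Python sketch; every function name is hypothetical and none of it is code from the paper. It computes $\rho$ for the bit-sampling family $\{x \mapsto x_i\}$ (a uniformly random coordinate collides with probability $1 - \mathrm{dist}(x, y)/d$, so $p = 1 - r/d$ and $q = 1 - cr/d$), evaluates the Motwani et al. bound, and checks log-convexity of noise stability for a small Boolean function. For the last part it assumes the standard $\pm 1$ noise model, in which each $y_i$ independently equals $x_i$ with probability $(1 + e^{-t})/2$, and a single $\pm 1$-valued hash bit; the paper's exact definitions and its reduction from distances $r, cr$ to noise times $t, ct$ are not reproduced here.

```python
import itertools
import math


def bit_sampling_rho(r: float, c: float, d: int) -> float:
    """rho = ln(1/p) / ln(1/q) for the bit-sampling family {x -> x_i}.

    A uniformly random coordinate i has x_i == y_i with probability
    1 - dist(x, y)/d, so p = 1 - r/d and q = 1 - c*r/d (requires c*r < d).
    """
    p = 1 - r / d
    q = 1 - c * r / d
    return math.log(1 / p) / math.log(1 / q)


def mnp_lower_bound(c: float) -> float:
    """Motwani et al.'s bound (e^{1/c} - 1)/(e^{1/c} + 1), without the o_d(1) term."""
    e = math.exp(1 / c)
    return (e - 1) / (e + 1)


def majority3(x) -> int:
    """Hypothetical example hash bit: majority of three +/-1 coordinates."""
    return 1 if sum(x) > 0 else -1


def fourier_weights(f, n: int) -> list:
    """W[k] = sum of fhat(S)^2 over |S| = k, for f: {-1,1}^n -> {-1,1} (brute force)."""
    points = list(itertools.product([-1, 1], repeat=n))
    weights = [0.0] * (n + 1)
    for k in range(n + 1):
        for S in itertools.combinations(range(n), k):
            # fhat(S) = E_x[f(x) * prod_{i in S} x_i] under the uniform distribution.
            coef = sum(f(x) * math.prod(x[i] for i in S) for x in points) / len(points)
            weights[k] += coef ** 2
    return weights


def collision_prob(weights: list, t: float) -> float:
    """Pr[f(x) == f(y)] when each y_i independently equals x_i w.p. (1 + e^{-t})/2.

    E[f(x) f(y)] = sum_k W[k] e^{-t k} is a nonnegative mix of exponentials in t,
    so it, and (1 + it)/2, is log-convex in t. For +/-1 functions,
    Pr[f(x) == f(y)] = (1 + E[f(x) f(y)]) / 2.
    """
    stab = sum(w * math.exp(-t * k) for k, w in enumerate(weights))
    return (1 + stab) / 2


def log_convex_on_grid(weights: list, ts: list, h: float = 1e-3) -> bool:
    """Check that second differences of ln collision_prob are >= 0 on a grid of t values."""

    def g(s: float) -> float:
        return math.log(collision_prob(weights, s))

    return all(g(t - h) + g(t + h) - 2 * g(t) >= -1e-12 for t in ts)


def stability_rho(weights: list, t: float, c: float) -> float:
    """ln(1/P(t)) / ln(1/P(c t)); convexity of ln P with P(0) = 1 makes this >= 1/c."""
    p = collision_prob(weights, t)
    q = collision_prob(weights, c * t)
    return math.log(1 / p) / math.log(1 / q)


if __name__ == "__main__":
    w = fourier_weights(majority3, 3)
    print("bit-sampling rho (r=10, c=2, d=1000):", bit_sampling_rho(10, 2.0, 1000))
    print("Motwani et al. bound at c=2:", mnp_lower_bound(2.0))
    print("Maj3 Fourier weight by level:", w)
    print("log-convex on grid:", log_convex_on_grid(w, [0.1 * i for i in range(1, 30)]))
    print("stability rho at t=0.1, c=2:", stability_rho(w, 0.1, 2.0))
```

Computing Fourier weights by brute force keeps the sketch dependency-free but costs time exponential in $n$, so it is only meant for tiny examples such as 3-bit majority. The final check illustrates the kind of elementary consequence of log-convexity the abstract alludes to: if $\ln P$ is convex with $P(0) = 1$, then $\ln(1/P(ct)) \le c \ln(1/P(t))$ for every $c \ge 1$.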