A locality-sensitive hash for real vectors

  • Author: Tyler Neylon

  • Venue: SODA '10: Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms
  • Year: 2010

Abstract

We present a simple and practical algorithm for the c-approximate near neighbor problem (c-NN): given n points P ⊂ R^d and a radius R, build a data structure which, given q ∈ R^d, can with probability 1 − δ return a point p ∈ P with dist(p, q) ≤ cR if there is any p* ∈ P with dist(p*, q) ≤ R. For c = d + 1, our algorithm deterministically (δ = 0) preprocesses in time O(nd log d) and space O(dn), and answers queries in expected time O(d^2); this is the first known algorithm to deterministically guarantee an O(d)-NN solution in constant time with respect to n for all ℓ_p metrics. A probabilistic version empirically achieves useful c values, where c appears to grow minimally as d → ∞. A query time of O(d log d) is also available, at the cost of slightly less accuracy. These techniques can also be used to approximately find (pointers between) all pairs x, y ∈ P with dist(x, y) ≤ R in time O(nd log d). The key to the algorithm is a locality-sensitive hash: a mapping h: R^d → U with the property that h(x) = h(y) is much more likely for nearby x, y. We introduce a somewhat regular simplex which tessellates R^d, and efficiently hash each point in any simplex of this tessellation to all d + 1 corners; any points in neighboring cells will be hashed to a shared corner and noticed as nearby points. This method is completely independent of dimension reduction, so additional space and time savings are available by first reducing all input vectors.
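
The corner-hashing idea can be illustrated with a minimal Python sketch. It is not the paper's construction: it assumes the standard Kuhn (Freudenthal) triangulation of the integer grid as a stand-in for the "somewhat regular" simplex, and omits whatever transform the paper uses to make the cells more regular. The names simplex_corners, build_index, query, and the scale parameter are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def simplex_corners(x, scale=1.0):
    """Return the d + 1 corners of the Kuhn simplex containing x / scale.

    Assumption: the Kuhn triangulation of the integer grid is used here
    in place of the paper's own simplex tessellation.
    """
    y = [xi / scale for xi in x]
    base = [math.floor(v) for v in y]
    frac = [v - b for v, b in zip(y, base)]
    # Walk from the base grid point, stepping along coordinates in order
    # of decreasing fractional part; each step yields one more corner.
    order = sorted(range(len(y)), key=lambda i: -frac[i])
    corner = list(base)
    corners = [tuple(corner)]
    for i in order:
        corner[i] += 1
        corners.append(tuple(corner))
    return corners


def build_index(points, scale=1.0):
    """Hash every point to all d + 1 corners of its enclosing simplex."""
    table = defaultdict(list)
    for idx, p in enumerate(points):
        for c in simplex_corners(p, scale):
            table[c].append(idx)
    return table


def query(table, q, scale=1.0):
    """Return indices of points sharing at least one corner with q."""
    candidates = set()
    for c in simplex_corners(q, scale):
        candidates.update(table.get(c, ()))
    return sorted(candidates)


points = [(0.2, 0.7), (0.3, 0.6), (5.1, 5.2)]
table = build_index(points)
near = query(table, (0.25, 0.65))
```

Here `near` holds candidate indices only; a full near-neighbor search would still compare each candidate's distance to the query against the target radius. In this sketch `scale` plays the role of the cell size and would presumably be chosen relative to R.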