We present a simple and practical algorithm for the c-approximate near neighbor problem (c-NN): given n points P ⊂ R^d and a radius R, build a data structure which, given a query q ∈ R^d, can with probability 1 − δ return a point p ∈ P with dist(p, q) ≤ cR whenever there is some p* ∈ P with dist(p*, q) ≤ R. For c = d + 1, our algorithm deterministically (δ = 0) preprocesses in time O(nd log d) and space O(dn), and answers queries in expected time O(d^2); this is the first known algorithm to deterministically guarantee an O(d)-NN solution in constant time with respect to n for all ℓ_p metrics. A probabilistic version empirically achieves useful c values, and c appears to grow minimally as d → ∞. A query time of O(d log d) is also available, at the cost of slightly less accuracy. These techniques can also be used to approximately find (pointers between) all pairs x, y ∈ P with dist(x, y) ≤ R in time O(nd log d). The key to the algorithm is a locality-sensitive hash: a mapping h: R^d → U with the property that h(x) = h(y) is much more likely for nearby x, y. We introduce a somewhat regular simplex which tessellates R^d, and efficiently hash each point in any simplex of this tessellation to all d + 1 corners; any points in neighboring cells will be hashed to a shared corner and noticed as nearby points. This method is completely independent of dimension reduction, so additional space and time savings are available by first reducing all input vectors.
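The corner-hashing idea can be illustrated with a minimal Python sketch. It is not the paper's construction: the abstract does not specify its "somewhat regular" simplex, so the sketch assumes the Kuhn (Freudenthal) triangulation of the integer grid as a stand-in tessellation, in which the simplex containing a point is determined by sorting the fractional parts of its coordinates. The function names `simplex_corners`, `build_index`, and `query`, and the `scale` parameter (a cell size playing the role of the radius R), are hypothetical.

```python
import math
from collections import defaultdict


def simplex_corners(x, scale=1.0):
    """Return the d + 1 integer corners of the Kuhn simplex containing x.

    Assumption: the Kuhn triangulation stands in for the paper's simplex.
    """
    y = [xi / scale for xi in x]
    base = [math.floor(yi) for yi in y]
    frac = [yi - bi for yi, bi in zip(y, base)]
    # Visit coordinates from largest to smallest fractional part.
    order = sorted(range(len(x)), key=lambda i: -frac[i])
    corner = list(base)
    corners = [tuple(corner)]
    for i in order:
        corner[i] += 1  # step along one axis to reach the next corner
        corners.append(tuple(corner))
    return corners


def build_index(points, scale=1.0):
    """Hash every point to all d + 1 corners of its simplex."""
    index = defaultdict(list)
    for pid, p in enumerate(points):
        for c in simplex_corners(p, scale):
            index[c].append(pid)
    return index


def query(index, q, scale=1.0):
    """Collect ids of all points that share at least one corner with q."""
    found = set()
    for c in simplex_corners(q, scale):
        found.update(index.get(c, ()))
    return found


points = [[0.7, 0.2], [0.6, 0.3], [5.5, 5.1]]
index = build_index(points)
candidates = query(index, [0.65, 0.25])
```

In this sketch, points whose simplices share a corner land in a common bucket, so a query only inspects the buckets of its own d + 1 corners rather than scanning all n points. Candidates would still need an explicit distance check against cR before being returned.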