Locality-preserving hashing in multidimensional spaces
STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
Approximate nearest neighbors: towards removing the curse of dimensionality
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Syntactic clustering of the Web
Selected papers from the sixth international conference on World Wide Web
Min-wise independent permutations
Journal of Computer and System Sciences - 30th annual ACM symposium on theory of computing
Similarity estimation techniques from rounding algorithms
STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
On the Resemblance and Containment of Documents
SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
Locality-sensitive hashing scheme based on p-stable distributions
SCG '04 Proceedings of the twentieth annual symposium on Computational geometry
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Geometry of Cuts and Metrics
A model for spectra-based software diagnosis
ACM Transactions on Software Engineering and Methodology (TOSEM)
Hi-index | 0.00 |
Locality sensitive hashing (LSH) is a key algorithmic tool that is widely used both in theory and practice. An important goal in the study of LSH is to understand which similarity functions admit an LSH, i.e., are LSHable. In this paper we focus on the class of transformations such that given any similarity that is LSHable, the transformed similarity will continue to be LSHable. We show a tight characterization of all such LSH-preserving transformations: they are precisely the probability generating functions, up to scaling. As a concrete application of this result, we study which set similarity measures are LSHable. We obtain a complete characterization of similarity measures between two sets A and B that are ratios of two linear functions of |A ∩ B|, |A Δ B|, |A ∪ B|: such a measure is LSHable if and only if its corresponding distance is a metric. This result generalizes the well-known LSH for the Jaccard set similarity, namely, the minwise-independent permutations, and obtains LSHs for many set similarity measures that are used in practice. Using our main result, we obtain a similar characterization for set similarities involving radicals.