K-d trees for semidynamic point sets
SCG '90 Proceedings of the sixth annual symposium on Computational geometry
Min-wise independent permutations (extended abstract)
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Approximate nearest neighbors: towards removing the curse of dimensionality
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
An optimal algorithm for approximate nearest neighbor searching
SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms
Similarity estimation techniques from rounding algorithms
STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
R-trees: a dynamic index structure for spatial searching
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Similarity Search in High Dimensions via Hashing
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Efficient similarity search and classification via rank aggregation
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Locality-sensitive hashing scheme based on p-stable distributions
SCG '04 Proceedings of the twentieth annual symposium on Computational geometry
An efficient parts-based near-duplicate and sub-image retrieval system
Proceedings of the 12th annual ACM international conference on Multimedia
LSH forest: self-tuning indexes for similarity search
WWW '05 Proceedings of the 14th international conference on World Wide Web
Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform
Proceedings of the thirty-eighth annual ACM symposium on Theory of computing
Multi-probe LSH: efficient indexing for high-dimensional similarity search
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Quality and efficiency in high dimensional nearest neighbor search
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Locality-sensitive hashing scheme based on dynamic collision counting
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Hi-index | 0.00 |
Locality Sensitive Hashing (LSH) is widely recognized as one of the most promising approaches to similarity search in high-dimensional spaces. Based on LSH, a considerable number of nearest neighbor search algorithms have been proposed in the past, with some of them having been used in many real-life applications. Apart from their demonstrated superior performance in practice, the popularity of the LSH algorithms is mainly due to their provable performance bounds on query cost, space consumption and failure probability. In this paper, we show that a surprising gap exists between the LSH theory and widely practiced algorithm analysis techniques. In particular, we discover that a critical assumption made in the classical LSH algorithm analysis does not hold in practice, which suggests that using the existing methods to analyze the performance of practical LSH algorithms is a conceptual mismatch. To address this problem, a novel analysis model is developed that bridges the gap between the LSH theory and the method for analyzing the LSH algorithm performance. With the help of this model, we identify some important flaws in the commonly used analysis methods in the LSH literature. The validity of this model is verified through extensive experiments with real datasets.