Approximate nearest neighbors: towards removing the curse of dimensionality
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Efficient search for approximate nearest neighbor in high dimensional spaces
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Similarity estimation techniques from rounding algorithms
STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Similarity Search in High Dimensions via Hashing
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Navigating nets: simple algorithms for proximity search
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Locality-sensitive hashing scheme based on p-stable distributions
SCG '04 Proceedings of the twentieth annual symposium on Computational geometry
Entropy based nearest neighbor search in high dimensions
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Cover trees for nearest neighbor
ICML '06 Proceedings of the 23rd international conference on Machine learning
Google news personalization: scalable online collaborative filtering
Proceedings of the 16th international conference on World Wide Web
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Multi-probe LSH: efficient indexing for high-dimensional similarity search
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Distributed similarity search in high dimensions using locality sensitive hashing
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
A platform for scalable one-pass analytics using MapReduce
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Bayesian locality sensitive hashing for fast similarity search
Proceedings of the VLDB Endowment
Streaming similarity search over one billion tweets using parallel locality-sensitive hashing
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
Distributed frameworks are gaining increasingly widespread use in applications that process large amounts of data. One important example application is large scale similarity search, for which Locality Sensitive Hashing (LSH) has emerged as the method of choice, specially when the data is high-dimensional. To guarantee high search quality, the LSH scheme needs a rather large number of hash tables. This entails a large space requirement, and in the distributed setting, with each query requiring a network call per hash bucket look up, also a big network load. Panigrahy's Entropy LSH scheme significantly reduces the space requirement but does not help with (and in fact worsens) the search network efficiency. In this paper, focusing on the Euclidian space under ι2 norm and building up on Entropy LSH, we propose the distributed Layered LSH scheme, and prove that it exponentially decreases the network cost, while maintaining a good load balance between different machines. Our experiments also verify that our theoretical results.