Efficient distributed locality sensitive hashing

  • Authors:
  • Bahman Bahmani;Ashish Goel;Rajendra Shinde

  • Affiliations:
  • Stanford University, Stanford, CA, USA;Stanford University, Stanford, CA, USA;Stanford University, Stanford, CA, USA

  • Venue:
  • Proceedings of the 21st ACM international conference on Information and knowledge management
  • Year:
  • 2012

Abstract

Distributed frameworks are gaining increasingly widespread use in applications that process large amounts of data. One important example application is large-scale similarity search, for which Locality Sensitive Hashing (LSH) has emerged as the method of choice, especially when the data is high-dimensional. To guarantee high search quality, the LSH scheme needs a rather large number of hash tables. This entails a large space requirement and, in the distributed setting, where each query requires a network call per hash bucket lookup, a large network load as well. Panigrahy's Entropy LSH scheme significantly reduces the space requirement but does not help with (and in fact worsens) the network efficiency of search. In this paper, focusing on Euclidean space under the ℓ2 norm and building on Entropy LSH, we propose the distributed Layered LSH scheme, and prove that it exponentially decreases the network cost while maintaining a good load balance between different machines. Our experiments also verify our theoretical results.
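
To make the network-cost argument in the abstract concrete, the following is a minimal sketch of where the bucket lookups come from. It is not the paper's Layered LSH scheme. It assumes the standard p-stable (Gaussian projection) hash family for the ℓ2 norm, and all function names and parameters (`make_hash`, `basic_lsh_lookups`, `entropy_lsh_lookups`, `k`, `w`, `radius`, `num_offsets`) are hypothetical names chosen for illustration. Basic LSH probes one bucket in each of many tables, while an Entropy-style scheme keeps a single table and instead probes the buckets of random points at a fixed distance from the query.

```python
import math
import random


def make_hash(dim, k, w, rng):
    """Build one table's hash: k concatenated Gaussian projections, bucket width w.

    Assumption: the usual p-stable form h(v) = floor((a . v + b) / w).
    """
    projections = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
        for _ in range(k)
    ]

    def h(v):
        return tuple(
            math.floor((sum(a_i * v_i for a_i, v_i in zip(a, v)) + b) / w)
            for a, b in projections
        )

    return h


def basic_lsh_lookups(query, hashes):
    """Basic LSH: one bucket per table, so one lookup (network call) per table."""
    return [(table_id, h(query)) for table_id, h in enumerate(hashes)]


def entropy_lsh_lookups(query, h, num_offsets, radius, rng):
    """Entropy-style LSH: a single table, probed at random offsets of the query.

    Each offset is a uniformly random point at distance `radius` from the query.
    Returns the set of distinct bucket keys that would have to be fetched.
    """
    keys = set()
    for _ in range(num_offsets):
        direction = [rng.gauss(0.0, 1.0) for _ in query]
        norm = math.sqrt(sum(x * x for x in direction))
        offset = [q + radius * x / norm for q, x in zip(query, direction)]
        keys.add(h(offset))
    return keys


if __name__ == "__main__":
    rng = random.Random(0)
    query = [0.1, 0.2, 0.3, 0.4]
    tables = [make_hash(dim=4, k=3, w=4.0, rng=rng) for _ in range(5)]
    print(len(basic_lsh_lookups(query, tables)), "lookups across 5 tables")
    probes = entropy_lsh_lookups(query, tables[0], num_offsets=8, radius=1.0, rng=rng)
    print(len(probes), "distinct buckets probed in a single table")
```

In this sketch the basic scheme stores every point in all tables and pays one lookup per table, whereas the Entropy-style scheme stores a single table but may fetch up to `num_offsets` distinct buckets per query. If those buckets live on different machines, each one is a separate network call, which is the cost the abstract says Layered LSH is designed to reduce.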