Fast locality-sensitive hashing

  • Authors:
  • Anirban Dasgupta;Ravi Kumar;Tamas Sarlos

  • Affiliations:
  • Yahoo!, Sunnyvale, CA, USA;Yahoo!, Sunnyvale, CA, USA;Yahoo!, Sunnyvale, CA, USA

  • Venue:
  • Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Locality-sensitive hashing (LSH) is a basic primitive in several large-scale data processing applications, including nearest-neighbor search, de-duplication, clustering, etc. In this paper we propose a new and simple method to speed up the widely-used Euclidean realization of LSH. At the heart of our method is a fast way to estimate the Euclidean distance between two d-dimensional vectors; this is achieved by the use of randomized Hadamard transforms in a non-linear setting. This decreases the running time of a (k, L)-parameterized LSH from O(dkL) to O(dlog d + kL). Our experiments show that using the new LSH in nearest-neighbor applications can improve their running times by significant amounts. To the best of our knowledge, this is the first running time improvement to LSH that is both provable and practical.