An Efficient Similarity Searching Scheme in Massive Databases

  • Authors:
  • Haiying Shen; Ting Li; Tom Schweiger

  • Venue:
  • ICDT '08: Proceedings of the 2008 Third International Conference on Digital Telecommunications
  • Year:
  • 2008

Abstract

Locality Sensitive Hashing (LSH) is a method for probabilistic dimension reduction of high-dimensional data and a popular technique for approximate nearest neighbor search. However, LSH requires large amounts of memory and long processing times to perform well when searching a massive dataset, and it is not effective at locating similar data in a very high-dimensional dataset. This paper proposes a new LSH-based similarity searching scheme, SMLSH, which combines a consistent hash function and min-wise independent permutations with LSH. SMLSH classifies information according to similarity with reduced memory requirements, and it can quickly locate similar data in a massive dataset. Experimental results show that SMLSH is both more time-efficient and more space-efficient than LSH, and that it significantly improves the effectiveness of similarity searching over LSH in a massive dataset.
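
To make the min-wise independent permutation idea mentioned in the abstract concrete, the following is a minimal generic MinHash sketch. It is not the SMLSH scheme itself, whose details the abstract does not give. All names here (`token_ids`, `make_hash_params`, `minhash_signature`, `estimate_jaccard`) are hypothetical, and the linear hash family `(a*x + b) mod p` is an assumed, commonly used approximation of min-wise independent permutations.

```python
import random
import zlib

# Assumption: a large Mersenne prime as the modulus of the linear hash family.
PRIME = (1 << 61) - 1


def token_ids(tokens):
    """Map string tokens to stable non-negative integers (CRC32 is an assumption)."""
    return {zlib.crc32(t.encode("utf-8")) for t in tokens}


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(ids, params):
    """For each hash function, keep the minimum hash value over the set."""
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    params = make_hash_params(256)
    doc_a = token_ids(f"w{i}" for i in range(100))
    doc_b = token_ids(f"w{i}" for i in range(50, 150))
    sig_a = minhash_signature(doc_a, params)
    sig_b = minhash_signature(doc_b, params)
    print("exact:", jaccard(doc_a, doc_b))
    print("estimated:", estimate_jaccard(sig_a, sig_b))
```

Each signature has a fixed length regardless of how large the original set is, which is why schemes of this kind can reduce memory use. Similar records tend to agree in many signature positions, so the signatures can be compared or bucketed in place of the full data.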