An Efficient Similarity Searching Scheme in Massive Databases

Authors:
Haiying Shen;Ting Li;Tom Schweiger
Venue:
ICDT '08 Proceedings of the 2008 The Third International Conference on Digital Telecommunications
Year:
2008

Citing 0
Cited 1

Data management of wireless sensor networks

CCNC'09 Proceedings of the 6th IEEE Conference on Consumer Communications and Networking Conference

Abstract

Locality Sensitive Hashing (LSH) is a method for probabilistic dimension reduction of high-dimensional data and a popular technique for approximate nearest neighbor search. However, LSH needs large memory space and long processing time to achieve good performance when searching a massive dataset. In addition, it is not effective at locating similar data in a very high-dimensional dataset. This paper proposes a new LSH-based similarity searching scheme, SMLSH, which combines a consistent hash function and min-wise independent permutations with LSH. SMLSH classifies information according to similarity with a reduced memory space requirement, and it can quickly locate similar data in a massive dataset. Experimental results show that SMLSH is more time- and space-efficient than LSH and that it significantly improves the effectiveness of similarity searching over LSH in a massive dataset.
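
To make the building blocks named in the abstract concrete, the following is a minimal sketch of min-wise hashing combined with LSH-style banding. It is not the authors' SMLSH scheme, whose details are not given in this record. The function names, the linear hash family h(x) = (a*x + b) mod p, and the parameters (128 hash functions, 16 bands) are illustrative assumptions, and the consistent hashing component is omitted.

```python
import hashlib
import random

# Mersenne prime used as the modulus of the (assumed) linear hash family.
PRIME = (1 << 61) - 1


def token_hash(token: str) -> int:
    """Deterministic 64-bit hash of a token (built-in hash() is salted per process)."""
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def make_hash_params(num_hashes: int, seed: int = 0):
    """Draw (a, b) pairs for h(x) = (a*x + b) mod PRIME.

    This family only approximates min-wise independent permutations.
    """
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(tokens, params):
    """For each hash function, keep the minimum hash value over the set.

    The token collection must be non-empty.
    """
    hashed = [token_hash(t) for t in set(tokens)]
    return [min((a * x + b) % PRIME for x in hashed) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def band_keys(signature, bands):
    """Split a signature into bands; each (band index, rows) pair is a bucket key.

    Two items become candidate neighbors if they share at least one key.
    """
    rows = len(signature) // bands
    return [(i, tuple(signature[i * rows:(i + 1) * rows])) for i in range(bands)]


if __name__ == "__main__":
    params = make_hash_params(128)
    doc_a = {f"w{i}" for i in range(100)}
    doc_b = {f"w{i}" for i in range(10, 110)}  # true Jaccard = 90/110
    sig_a = minhash_signature(doc_a, params)
    sig_b = minhash_signature(doc_b, params)
    print("estimated similarity:", estimate_jaccard(sig_a, sig_b))
    shared = set(band_keys(sig_a, 16)) & set(band_keys(sig_b, 16))
    print("shared buckets:", len(shared))
```

In this sketch the signature length trades memory for estimation accuracy, and the number of bands controls how similar two sets must be before they are likely to share a bucket. These are the standard tuning knobs of banded min-hash LSH, not parameters taken from the paper.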