An improved method of locality sensitive hashing for indexing large-scale and high-dimensional features

  • Authors:
  • Xiaoguang Gu;Yongdong Zhang;Lei Zhang;Dongming Zhang;Jintao Li

  • Affiliations:
  • Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Haidian District, 100190 Beijing, China and Graduate University of Chinese Academy of Sciences, 19A Yuqu ...;Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Haidian District, 100190 Beijing, China and Beijing Key Laboratory of Mobile Computing and Pervasive Dev ...;Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Haidian District, 100190 Beijing, China and Graduate University of Chinese Academy of Sciences, 19A Yuqu ...;Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Haidian District, 100190 Beijing, China and Beijing Key Laboratory of Mobile Computing and Pervasive Dev ...;Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Haidian District, 100190 Beijing, China and Beijing Key Laboratory of Mobile Computing and Pervasive Dev ...

  • Venue:
  • Signal Processing
  • Year:
  • 2013

Quantified Score

Hi-index 0.08

Visualization

Abstract

In recent years, Locality sensitive hashing (LSH) has been popularly used as an effective and efficient index structure of multimedia signals. LSH is originally proposed for resolving the high-dimensional approximate similarity search problem. Until now, many kinds of variations of LSH have been proposed for large-scale indexing. Much of the interest is focused on improving the query accuracy for skewed data distribution and reducing the storage space. However, when using LSH, a final filtering process based on exact similarity measure is needed. When the dataset is large-scale, the number of points to be filtered becomes large. As a result, the filtering speed becomes the bottleneck of improving the query speed when the scale of data becomes larger and larger. Furthermore, we observe a ''Non-Uniform'' phenomenon in the most popular Euclidean LSH which can degrade the filtering speed dramatically. In this paper, a pivot-based algorithm is proposed to improve the filtering speed by using triangle inequality to prune the search process. Furthermore, a novel method to select an optimal pivot for even larger improvement is provided. The experimental results on two open large-scale datasets show that our method can significantly improve the query speed of Euclidean LSH.