An improved method of locality sensitive hashing for indexing large-scale and high-dimensional features

Authors:
Xiaoguang Gu;Yongdong Zhang;Lei Zhang;Dongming Zhang;Jintao Li
Affiliations:
Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Haidian District, 100190 Beijing, China and Graduate University of Chinese Academy of Sciences, 19A Yuqu ...;Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Haidian District, 100190 Beijing, China and Beijing Key Laboratory of Mobile Computing and Pervasive Dev ...;Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Haidian District, 100190 Beijing, China and Graduate University of Chinese Academy of Sciences, 19A Yuqu ...;Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Haidian District, 100190 Beijing, China and Beijing Key Laboratory of Mobile Computing and Pervasive Dev ...;Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Haidian District, 100190 Beijing, China and Beijing Key Laboratory of Mobile Computing and Pervasive Dev ...
Venue:
Signal Processing
Year:
2013

Citing 29
Cited 1

Distance-based indexing for high-dimensional metric spaces

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Data structures and algorithms for nearest neighbor search in general metric spaces

SODA '93 Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms
Some approaches to best-match file searching

Communications of the ACM
Searching in metric spaces

ACM Computing Surveys (CSUR)
Fixed Queries Array: A Fast and Economical Data Structure for Proximity Searching

Multimedia Tools and Applications
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Similarity Search in High Dimensions via Hashing

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Content-Based Image Indexing

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Near Neighbor Search in Large Metric Spaces

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Proximity Matching Using Fixed-Queries Trees

CPM '94 Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching
Stable distributions, pseudorandom generators, embeddings and data stream computation

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Locality-sensitive hashing scheme based on p-stable distributions

SCG '04 Proceedings of the twentieth annual symposium on Computational geometry
An efficient parts-based near-duplicate and sub-image retrieval system

Proceedings of the 12th annual ACM international conference on Multimedia
LSH forest: self-tuning indexes for similarity search

WWW '05 Proceedings of the 14th international conference on World Wide Web
Entropy based nearest neighbor search in high dimensions

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Rapid Object Indexing Using Locality Sensitive Hashing and Joint 3D-Signature Space Estimation

IEEE Transactions on Pattern Analysis and Machine Intelligence
Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (Neural Information Processing)

Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (Neural Information Processing)
A Data Structure and an Algorithm for the Nearest Point Problem

IEEE Transactions on Software Engineering
Multi-probe LSH: efficient indexing for high-dimensional similarity search

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions

Communications of the ACM - 50th anniversary issue: 1958 - 2008
A posteriori multi-probe locality sensitive hashing

MM '08 Proceedings of the 16th ACM international conference on Multimedia
Semi-supervised kernel density estimation for video annotation

Computer Vision and Image Understanding
Unified video annotation via multigraph learning

IEEE Transactions on Circuits and Systems for Video Technology
NUS-WIDE: a real-world web image database from National University of Singapore

Proceedings of the ACM International Conference on Image and Video Retrieval
Beyond distance measurement: constructing neighborhood similarity for video annotation

IEEE Transactions on Multimedia - Special section on communities and media computing
Dynamic captioning: video accessibility enhancement for hearing impairment

Proceedings of the international conference on Multimedia
Product Quantization for Nearest Neighbor Search

IEEE Transactions on Pattern Analysis and Machine Intelligence
Locality-Sensitive Hashing for Chi2 Distance

IEEE Transactions on Pattern Analysis and Machine Intelligence
Towards a Relevant and Diverse Search of Social Images

IEEE Transactions on Multimedia

New Media Cloud Computing: Opportunities and Challenges

International Journal of Cloud Applications and Computing

Quantified Score

Hi-index	0.08

Visualization

Abstract

In recent years, Locality sensitive hashing (LSH) has been popularly used as an effective and efficient index structure of multimedia signals. LSH is originally proposed for resolving the high-dimensional approximate similarity search problem. Until now, many kinds of variations of LSH have been proposed for large-scale indexing. Much of the interest is focused on improving the query accuracy for skewed data distribution and reducing the storage space. However, when using LSH, a final filtering process based on exact similarity measure is needed. When the dataset is large-scale, the number of points to be filtered becomes large. As a result, the filtering speed becomes the bottleneck of improving the query speed when the scale of data becomes larger and larger. Furthermore, we observe a ''Non-Uniform'' phenomenon in the most popular Euclidean LSH which can degrade the filtering speed dramatically. In this paper, a pivot-based algorithm is proposed to improve the filtering speed by using triangle inequality to prune the search process. Furthermore, a novel method to select an optimal pivot for even larger improvement is provided. The experimental results on two open large-scale datasets show that our method can significantly improve the query speed of Euclidean LSH.