Multi-probe LSH: efficient indexing for high-dimensional similarity search

Authors:
Qin Lv; William Josephson; Zhe Wang; Moses Charikar; Kai Li
Affiliations:
Princeton University, Princeton, NJ (all authors)
Venue:
VLDB '07: Proceedings of the 33rd International Conference on Very Large Data Bases
Year:
2007

Abstract

Similarity indices for high-dimensional data are very desirable for building content-based search systems for feature-rich data such as audio, images, videos, and other sensor data. Recently, locality sensitive hashing (LSH) and its variations have been proposed as indexing techniques for approximate similarity search. A significant drawback of these approaches is that they require a large number of hash tables to achieve good search quality. This paper proposes a new indexing scheme called multi-probe LSH that overcomes this drawback. Multi-probe LSH is built on the well-known LSH technique, but it intelligently probes multiple buckets in a hash table that are likely to contain query results. Our method is inspired by and improves upon recent theoretical work on entropy-based LSH, which was designed to reduce the space requirement of the basic LSH method. We have implemented the multi-probe LSH method and evaluated the implementation with two different high-dimensional datasets. Our evaluation shows that the multi-probe LSH method substantially improves upon previously proposed methods in both space and time efficiency. To achieve the same search quality, multi-probe LSH has time efficiency similar to that of the basic LSH method while reducing the number of hash tables by an order of magnitude. Compared with the entropy-based LSH method at the same search quality, multi-probe LSH uses less query time and 5 to 8 times fewer hash tables.
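
The idea of probing several buckets of one hash table can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify these details: the Gaussian projection hash family, the class name `MultiProbeLSHTable`, the parameters `num_hashes`, `width`, and `max_changed`, and the scoring rule (sum of squared distances from the query's projection to the slot boundaries it would have to cross) are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict
from itertools import combinations, product


class MultiProbeLSHTable:
    """One hash table whose key is M values floor((a_i . v + b_i) / width)."""

    def __init__(self, dim, num_hashes=4, width=4.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        # Assumption: Gaussian random projections with a uniform random offset.
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(num_hashes)]
        self.b = [rng.uniform(0.0, width) for _ in range(num_hashes)]
        self.buckets = defaultdict(list)

    def _project(self, v):
        return [sum(x * y for x, y in zip(a, v)) + b
                for a, b in zip(self.a, self.b)]

    def key(self, v):
        return tuple(math.floor(p / self.width) for p in self._project(v))

    def insert(self, item_id, v):
        self.buckets[self.key(v)].append(item_id)

    def probe_sequence(self, q, num_probes, max_changed=2):
        """Return the home bucket followed by up to num_probes nearby buckets."""
        proj = self._project(q)
        home = tuple(math.floor(p / self.width) for p in proj)
        # Distance from each projection to its lower (-1) and upper (+1) boundary.
        dist = []
        for p, h in zip(proj, home):
            lower = p - h * self.width
            dist.append({-1: lower, +1: self.width - lower})
        candidates = []
        for k in range(1, max_changed + 1):
            for coords in combinations(range(len(home)), k):
                for signs in product((-1, 1), repeat=k):
                    # Assumed score: smaller means the neighbouring bucket is
                    # closer to the query, so it is probed earlier.
                    score = sum(dist[i][s] ** 2 for i, s in zip(coords, signs))
                    perturbed = list(home)
                    for i, s in zip(coords, signs):
                        perturbed[i] += s
                    candidates.append((score, tuple(perturbed)))
        candidates.sort()
        return [home] + [key for _, key in candidates[:num_probes]]

    def query(self, q, num_probes=0):
        """Collect candidate ids from the home bucket and the probed buckets."""
        found = []
        for key in self.probe_sequence(q, num_probes):
            found.extend(self.buckets.get(key, []))
        return found
```

With `num_probes=0` the sketch behaves like a single basic LSH table lookup. Raising `num_probes` lets one table return candidates that would otherwise require additional tables, which is the space saving the abstract describes. Candidates would still need to be ranked by their true distance to the query.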