Scalable similarity search with optimized kernel hashing

Authors:
Junfeng He; Wei Liu; Shih-Fu Chang
Affiliations:
Columbia University, New York, NY, USA; Columbia University, New York, NY, USA; Columbia University, New York, NY, USA
Venue:
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2010

Abstract

Scalable similarity search is at the core of many large-scale learning and data mining applications. Recent research has shown that one promising approach is to create compact and efficient hash codes that preserve data similarity. By efficient, we mean low correlation (and thus low redundancy) among the generated codes. However, most existing hashing methods are designed only for vector data. In this paper, we develop a new hashing algorithm that creates efficient codes for large-scale data of general formats with any kernel function, including kernels on vectors, graphs, sequences, sets, and so on. Starting from an idea analogous to spectral hashing, we propose novel formulations and solutions such that a kernel-based hash function can be explicitly represented and optimized, and directly applied to compute compact hash codes for new samples of general formats. Moreover, we incorporate efficient techniques, such as the Nyström approximation, to further reduce the time and space complexity of indexing and search, making our algorithm scalable to huge data sets. Another important advantage of our method is its ability to handle diverse types of similarity according to actual task requirements, including both feature similarities and semantic similarities such as label consistency. We evaluate our method on both vector and non-vector data sets at a large scale, up to 1 million samples. Our comprehensive results show that the proposed method outperforms several state-of-the-art approaches on all the tasks, with a significant gain on most of them.
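
To make the idea of an explicit kernel-based hash function concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes each bit has the form sign(sum_j a_j k(x, z_j) - b) over a small set of landmark samples z_j, which is one common way to keep the cost of hashing a new sample independent of the data set size. The names `spectrum_kernel`, `KernelHasher`, and `hamming` are hypothetical, the weights a_j are random placeholders (the paper optimizes its hash functions instead), and the median thresholds are an assumption used only to balance the bits. A string kernel is used to show that only kernel evaluations, not vector features, are needed.

```python
import random
from collections import Counter
from statistics import median


def spectrum_kernel(s, t, n=3):
    """Inner product of character n-gram counts: a valid kernel on strings."""
    cs = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    ct = Counter(t[i:i + n] for i in range(len(t) - n + 1))
    return float(sum(cs[g] * ct[g] for g in cs if g in ct))


class KernelHasher:
    """Hypothetical hasher: bit r is 1 when sum_j A[r][j] * k(x, z_j) > b_r."""

    def __init__(self, kernel, landmarks, n_bits, seed=0):
        rng = random.Random(seed)
        self.kernel = kernel
        self.landmarks = list(landmarks)
        # ASSUMPTION: random Gaussian weights stand in for optimized ones.
        self.weights = [[rng.gauss(0.0, 1.0) for _ in self.landmarks]
                        for _ in range(n_bits)]
        self.thresholds = [0.0] * n_bits

    def _project(self, x):
        # Only kernel values against the landmarks are needed for a new sample.
        k = [self.kernel(x, z) for z in self.landmarks]
        return [sum(w * kv for w, kv in zip(row, k)) for row in self.weights]

    def fit_thresholds(self, samples):
        # ASSUMPTION: per-bit median threshold, so each bit splits the
        # training samples roughly in half.
        projections = [self._project(x) for x in samples]
        self.thresholds = [median(col) for col in zip(*projections)]
        return self

    def encode(self, x):
        return tuple(int(p > b)
                     for p, b in zip(self._project(x), self.thresholds))


def hamming(a, b):
    """Number of positions at which two codes differ."""
    return sum(u != v for u, v in zip(a, b))
```

A hasher could then be built with, for example, `KernelHasher(spectrum_kernel, landmarks, n_bits=8).fit_thresholds(samples)`, after which `encode` maps any new string to an 8-bit code and `hamming` compares codes. Replacing the random weights with optimized ones, and choosing landmarks as in a Nyström approximation, is where the method described in the abstract would differ from this sketch.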