Semi-Supervised Hashing for Large-Scale Search

Authors:
Jun Wang; Sanjiv Kumar; Shih-Fu Chang
Affiliations:
IBM T.J. Watson Research, Yorktown Heights; Google Research, New York; Columbia University, New York
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2012

Citing 0
Cited 14

Hashing with Cauchy graph
PCM'12 Proceedings of the 13th Pacific-Rim conference on Advances in Multimedia Information Processing

A query by humming system based on locality sensitive hashing indexes
Signal Processing

Semantic hashing using tags and topic modeling
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Neighbourhood preserving quantisation for LSH
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Comparing apples to oranges: a scalable solution with heterogeneous hashing
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Image search: from thousands to billions in 20 years
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP) - Special Sections on the 20th Anniversary of ACM International Conference on Multimedia, Best Papers of ACM Multimedia 2012

Topology preserving hashing for similarity search
Proceedings of the 21st ACM international conference on Multimedia

Improved binary feature matching through fusion of Hamming distance and fragile bit weight
Proceedings of the 3rd ACM international workshop on Interactive multimedia on mobile & portable devices

Weighted hashing for fast large scale similarity search
Proceedings of the 22nd ACM international conference on Information & knowledge management

A unified approximate nearest neighbor search scheme by combining data structure and hashing
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Smart hashing update for fast response
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Parametric local multimodal hashing for cross-view similarity search
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Mixed image-keyword query adaptive hashing over multilabel images
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)

Beyond cross-domain learning: Multiple-domain nonnegative matrix factorization
Engineering Applications of Artificial Intelligence

Quantified Score

Hi-index	0.14

Abstract

Hashing-based approximate nearest neighbor (ANN) search in huge databases has become popular due to its computational and memory efficiency. Popular hashing methods, e.g., Locality Sensitive Hashing and Spectral Hashing, construct hash functions based on random or principal projections. The resulting hash codes are either not very accurate or inefficient. Moreover, these methods are designed for a given metric similarity. In contrast, semantic similarity is usually given in terms of pairwise labels of samples. Supervised hashing methods exist that can handle such semantic similarity, but they are prone to overfitting when the labeled data are scarce or noisy. In this work, we propose a semi-supervised hashing (SSH) framework that minimizes the empirical error over the labeled set together with an information-theoretic regularizer over both the labeled and unlabeled sets. Based on this framework, we present three semi-supervised hashing methods: orthogonal hashing, nonorthogonal hashing, and sequential hashing. In particular, the sequential hashing method generates robust codes in which each hash function is designed to correct the errors made by the previous ones. We further show that the sequential learning paradigm can be extended to unsupervised domains where no labeled pairs are available. Extensive experiments on four large datasets (up to 80 million samples) demonstrate the superior performance of the proposed SSH methods over state-of-the-art supervised and unsupervised hashing techniques.
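
The abstract does not give formulas, so the following is only a minimal sketch of how the orthogonal variant of such a framework might look, under stated assumptions. It assumes linear hash functions h(x) = sign(w^T x), a pairwise label matrix S (+1 for similar pairs, -1 for dissimilar pairs, 0 for unknown), and a variance-style regularizer weighted by a hypothetical parameter `eta`, so that the projections are the top eigenvectors of X_l^T S X_l + eta * X^T X. The function names `fit_ssh_orthogonal`, `hash_codes`, and `hamming_rank` are illustrative and do not come from the paper.

```python
import numpy as np


def fit_ssh_orthogonal(X, labeled_idx, S, n_bits, eta=1.0):
    """Learn orthogonal projections from labeled pairs plus all data.

    X           : (n, d) array, one sample per row.
    labeled_idx : indices of the l labeled samples in X.
    S           : (l, l) pairwise labels (+1 similar, -1 dissimilar, 0 unknown).
    eta         : assumed weight of the variance regularizer (hypothetical name).
    """
    mean = X.mean(axis=0)
    Xc = X - mean                      # zero-center so sign() splits the data
    Xl = Xc[labeled_idx]               # labeled subset, shape (l, d)
    # Supervised term from labeled pairs plus variance term from all samples.
    M = Xl.T @ S @ Xl + eta * (Xc.T @ Xc)
    M = (M + M.T) / 2.0                # symmetrize against round-off
    _, eigvecs = np.linalg.eigh(M)     # eigenvalues returned in ascending order
    W = eigvecs[:, ::-1][:, :n_bits]   # keep the n_bits largest directions
    return mean, W


def hash_codes(X, mean, W):
    """Map samples to binary codes, one bit per projection."""
    return ((X - mean) @ W > 0).astype(np.uint8)


def hamming_rank(query_code, db_codes):
    """Return database indices sorted by Hamming distance to the query."""
    dists = (db_codes != query_code).sum(axis=1)
    return np.argsort(dists, kind="stable")


if __name__ == "__main__":
    # Toy data: two Gaussian clusters with two labeled samples from each.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2.0, 1.0, (50, 4)),
                   rng.normal(2.0, 1.0, (50, 4))])
    labeled_idx = [0, 1, 50, 51]
    S = np.array([[1, 1, -1, -1],
                  [1, 1, -1, -1],
                  [-1, -1, 1, 1],
                  [-1, -1, 1, 1]], dtype=float)
    mean, W = fit_ssh_orthogonal(X, labeled_idx, S, n_bits=2)
    codes = hash_codes(X, mean, W)
    print(hamming_rank(codes[0], codes)[:5])
```

Taking eigenvectors of a symmetric matrix makes the projections orthonormal by construction, which is the simplest reading of "orthogonal hashing"; the nonorthogonal and sequential variants named in the abstract would need different solvers and are not sketched here.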