Semi-supervised spectral hashing for fast similarity search

Authors:
Chengwei Yao;Jiajun Bu;Chenxia Wu;Gencai Chen
Affiliations:
College of Computer Science and Technology, Zhejiang University, No. 38, Zheda Road, Hangzhou, Zhejiang 310027, China (all four authors)
Venue:
Neurocomputing
Year:
2013

Abstract

Fast similarity search is a key step in many large-scale computer vision and information retrieval tasks. Recently, there has been a surge of research interest in hashing-based techniques, which allow approximate but highly efficient similarity search. Most existing hashing methods are unsupervised and show promising performance by using only unlabeled data to generate binary codes. In this paper, we propose a novel semi-supervised hashing method that takes into account pairwise supervised information, in the form of must-link and cannot-link constraints, and then maximizes the information provided by each bit according to both the labeled and the unlabeled data. Unlike previous work on semi-supervised hashing, we use the squared Euclidean distance to measure the Hamming distance, which leads to a more general Laplacian-matrix-based solution once the binary constraints are removed by relaxation. We also relax the orthogonality constraints to reduce the error incurred when converting the real-valued solution to a binary one. Experimental evaluations on three benchmark datasets show that the proposed method outperforms state-of-the-art approaches.
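
The link between squared Euclidean distance, Hamming distance and a graph Laplacian can be illustrated with a small sketch. It is not the authors' implementation. It assumes codes with entries in {-1, +1} and a hypothetical symmetric weight matrix W whose entries are +1 for a must-link pair, -1 for a cannot-link pair and 0 otherwise; the paper's actual weighting is not given in the abstract. Under these assumptions the squared Euclidean distance between two codes is four times their Hamming distance, and the weighted sum of pairwise squared distances equals 2 * trace(Y^T L Y) with L = D - W, which is the form a Laplacian-based relaxation works with.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def sq_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def laplacian(W):
    """Graph Laplacian L = D - W, where D holds the row sums of W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0) - W[i][j] for j in range(n)]
            for i in range(n)]


def pairwise_cost(W, Y):
    """Sum over all ordered pairs (i, j) of W[i][j] * ||y_i - y_j||^2."""
    n = len(Y)
    return sum(W[i][j] * sq_euclidean(Y[i], Y[j])
               for i in range(n) for j in range(n))


def laplacian_cost(W, Y):
    """2 * trace(Y^T L Y), computed one bit (column of Y) at a time."""
    L = laplacian(W)
    n, k = len(Y), len(Y[0])
    return 2 * sum(Y[i][b] * L[i][j] * Y[j][b]
                   for b in range(k) for i in range(n) for j in range(n))


# Three illustrative 3-bit codes with entries in {-1, +1} (made-up data).
Y = [[1, 1, -1],
     [1, -1, -1],
     [-1, -1, 1]]

# Hypothetical pairwise supervision: items 0 and 1 are a must-link pair (+1),
# items 0 and 2 are a cannot-link pair (-1). This weighting is an assumption.
W = [[0, 1, -1],
     [1, 0, 0],
     [-1, 0, 0]]

for i, j in combinations(range(len(Y)), 2):
    print(i, j, hamming(Y[i], Y[j]), sq_euclidean(Y[i], Y[j]))

print(pairwise_cost(W, Y), laplacian_cost(W, Y))
```

Because both identities hold exactly for binary codes, dropping the binary constraint turns the discrete objective into a quadratic form in L, which is the kind of problem usually handled with eigenvectors of the Laplacian. The real-valued solution then has to be converted back to bits, which is where the conversion error mentioned in the abstract arises.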