Active hashing and its application to image and text retrieval

Authors:
Yi Zhen;Dit-Yan Yeung
Affiliations:
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Kowloon, China;Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Kowloon, China
Venue:
Data Mining and Knowledge Discovery
Year:
2013

Abstract

In recent years, hashing-based methods for large-scale similarity search have sparked considerable research interest in the data mining and machine learning communities. While unsupervised hashing-based methods have achieved promising results for metric similarity, they cannot handle semantic similarity, which is usually given in the form of labeled point pairs. To overcome this limitation, some attempts have recently been made at semi-supervised hashing, which aims to learn hash functions from metric and semantic similarity simultaneously. Existing semi-supervised hashing methods can be regarded as passive hashing, since they assume that the labeled pairs are provided in advance. In this paper, we propose a novel framework, called active hashing, which actively selects the most informative labeled pairs for hash function learning. Specifically, it identifies the most informative points to label and constructs labeled pairs accordingly. Under this framework, we use data uncertainty as a measure of informativeness and develop a batch mode algorithm to speed up active selection. We empirically compare our method with a state-of-the-art passive hashing method on two benchmark data sets, showing that the proposed method can reduce labeling cost and overcome the limitations of passive hashing.
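
The selection loop described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not define its uncertainty measure or its batch procedure, so the code below assumes linear hash functions of the form sign(w^T x), scores uncertainty by how close a point's projections lie to zero, and picks a batch by simple top-k ranking. All function names (`hash_codes`, `uncertainty`, `select_batch`, `build_pairs`) are hypothetical.

```python
import numpy as np

def hash_codes(X, W):
    """Binary codes from linear projections: one bit per column of W."""
    return (X @ W > 0).astype(np.uint8)

def uncertainty(X, W):
    """Assumed uncertainty score: higher when projections lie near zero."""
    margins = np.abs(X @ W)        # margin of each point for each hash bit
    return -margins.mean(axis=1)   # small average margin = high uncertainty

def select_batch(X, W, labeled, batch_size):
    """Return indices of the batch_size most uncertain unlabeled points."""
    scores = uncertainty(X, W)
    already = np.array(sorted(labeled), dtype=int)
    scores[already] = -np.inf      # never re-select labeled points
    return np.argsort(-scores)[:batch_size]

def build_pairs(indices, labels):
    """Turn point labels into pairs: +1 for same class, -1 otherwise."""
    pairs = []
    for a in range(len(indices)):
        for b in range(a + 1, len(indices)):
            i, j = int(indices[a]), int(indices[b])
            s = 1 if labels[i] == labels[j] else -1
            pairs.append((i, j, s))
    return pairs

# Toy data: four 2-D points and a single hash bit that thresholds
# the first coordinate.
X = np.array([[2.0, 0.0], [0.1, 0.0], [-3.0, 0.0], [0.05, 0.0]])
W = np.array([[1.0], [0.0]])
labels = ["a", "a", "b", "a"]      # stand-in for labels an oracle would return

batch = select_batch(X, W, labeled={3}, batch_size=2)
pairs = build_pairs(batch, labels)
```

In a full loop, the pairs built from each batch would be added to the labeled set and the hash functions retrained before the next round of selection; the retraining step is omitted here because the abstract does not describe it.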