Linear cross-modal hashing for efficient multimedia search

  • Authors:
  • Xiaofeng Zhu; Zi Huang; Heng Tao Shen; Xin Zhao

  • Affiliations:
  • Guangxi Normal University, Guangxi, China; The University of Queensland, Brisbane, Australia; The University of Queensland, Brisbane, Australia; The University of Queensland, Brisbane, Australia

  • Venue:
  • Proceedings of the 21st ACM international conference on Multimedia
  • Year:
  • 2013

Abstract

Most existing cross-modal hashing methods suffer from a scalability issue in the training phase. In this paper, we propose a novel cross-modal hashing approach with time complexity linear in the training data size, to enable scalable indexing for multimedia search across multiple modalities. Taking both the intra-similarity within each modality and the inter-similarity across different modalities into consideration, the proposed approach aims to effectively learn hash functions from large-scale training datasets. More specifically, for each modality, we first partition the training data into $k$ clusters and then represent each training data point by its distances to the $k$ cluster centroids. Interestingly, such a $k$-dimensional data representation reduces the time complexity of the training phase from the traditional $O(n^2)$ or higher to $O(n)$, where $n$ is the training data size, making learning on large-scale datasets practical. We further prove that this new representation preserves the intra-similarity within each modality. To preserve the inter-similarity among data points across different modalities, we transform the derived data representations into a common binary subspace in which the binary codes from all modalities are "consistent" and comparable. The transformation simultaneously yields the hash functions for all modalities, which are used to convert unseen data into binary codes. Given a query from one modality, it is first mapped into binary codes using that modality's hash functions and then matched against the database binary codes of any other modality. Experimental results on two benchmark datasets confirm the scalability and effectiveness of the proposed approach in comparison with the state of the art.
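
The $k$-centroid representation and the Hamming-space lookup described in the abstract can be sketched in a few lines. The snippet below is only an illustrative Python sketch using NumPy and scikit-learn, not the authors' implementation; the projection matrix `W` is a hypothetical stand-in for whatever hash functions the method actually learns.

```python
# Illustrative sketch only: k-centroid distance features plus a placeholder
# linear hash and Hamming-distance search. This is NOT the authors' method;
# the projection W is a hypothetical stand-in for the learned hash functions.
import numpy as np
from sklearn.cluster import KMeans

def centroid_distance_features(X, k=300, random_state=0):
    """Represent each point of one modality by its distances to k centroids.

    X: (n, d) feature matrix. Building the (n, k) output costs O(n * k * d),
    i.e. linear in n for fixed k and d, rather than the O(n^2) cost of a
    pairwise-similarity construction.
    """
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
    return kmeans.transform(X)  # (n, k) Euclidean distances to the centroids

def hash_codes(Z, W):
    """Binarize a linear projection of the k-dimensional representation.

    W: (k, c) matrix producing c-bit codes (placeholder for learned hashing).
    """
    return (Z @ W > 0).astype(np.uint8)

def hamming_ranking(query_code, db_codes):
    """Rank database items of another modality by Hamming distance."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists)
```

In this toy flow, a query from one modality would be converted with `hash_codes` using that modality's projection, and `hamming_ranking` would then be run against the stored codes of another modality; the actual cross-modal learning of the projections is not reproduced here.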