Cross-media semantic representation via bi-directional learning to rank

Authors:
Fei Wu;Xinyan Lu;Zhongfei Zhang;Shuicheng Yan;Yong Rui;Yueting Zhuang
Affiliations:
Zhejiang University, Hangzhou, China;Zhejiang University, Hangzhou, China;Zhejiang University, Hangzhou, China;National University of Singapore, Singapore, Singapore;Microsoft Research Asia, Beijing, China;Zhejiang University, Hangzhou, China
Venue:
Proceedings of the 21st ACM international conference on Multimedia
Year:
2013

Abstract

In multimedia information retrieval, most classic approaches represent different media modalities in a common feature space. Existing approaches train on either one-to-one paired data or uni-directional ranking examples (that is, only text-query-image or only image-query-text ranking examples), and therefore do not fully exploit bi-directional ranking examples, in which both text-query-image and image-query-text ranking examples are used during training. In this paper, we learn a cross-media representation model by optimizing a listwise ranking problem that takes advantage of bi-directional ranking examples. We propose a general cross-media ranking algorithm, the Bi-directional Cross-Media Semantic Representation Model (Bi-CMSRM), which optimizes a bi-directional listwise ranking loss with a latent space embedding. The latent space embedding is learned discriminatively by structural large margin learning, which directly optimizes a chosen ranking criterion (mean average precision in this paper). We evaluate Bi-CMSRM on the Wikipedia and NUS-WIDE datasets and show that using bi-directional ranking examples achieves much better performance than using uni-directional ranking examples alone.
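To make the bi-directional ranking criterion concrete, the following is a minimal sketch of how mean average precision could be computed over both retrieval directions once texts and images are embedded in a shared latent space. It covers evaluation only and does not implement the structural large margin training. The inner-product scoring form, the projection matrices `u_text` and `v_image`, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def project(matrix: Matrix, x: Vector) -> List[float]:
    """Map a feature vector into the latent space (matrix times vector)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def score(u_text: Matrix, v_image: Matrix, text: Vector, image: Vector) -> float:
    """Assumed similarity: inner product of the two latent embeddings."""
    zt = project(u_text, text)
    zi = project(v_image, image)
    return sum(a * b for a, b in zip(zt, zi))


def average_precision(ranked_relevance: Sequence[int]) -> float:
    """AP of a ranked list given as 0/1 relevance labels, best-ranked first."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def rank_and_evaluate(scores: Sequence[float], relevance: Sequence[int]) -> float:
    """Sort candidates by descending score and return the AP of that ordering."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return average_precision([relevance[i] for i in order])


def bidirectional_map(u_text, v_image, texts, images, relevant) -> float:
    """Mean AP over both directions.

    relevant[t][i] is 1 when text t and image i are semantically related.
    """
    aps = []
    # Text-query-image: each text ranks all images.
    for t, text in enumerate(texts):
        s = [score(u_text, v_image, text, img) for img in images]
        aps.append(rank_and_evaluate(s, relevant[t]))
    # Image-query-text: each image ranks all texts.
    for i, img in enumerate(images):
        s = [score(u_text, v_image, text, img) for text in texts]
        labels = [relevant[t][i] for t in range(len(texts))]
        aps.append(rank_and_evaluate(s, labels))
    return sum(aps) / len(aps)
```

A uni-directional variant would keep only one of the two loops in `bidirectional_map`; averaging over both is what lets a single pair of projections be judged on text-query-image and image-query-text retrieval at once.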