A semantic model for cross-modal and multi-modal retrieval

  • Authors:
  • Liang Xie, Peng Pan, Yansheng Lu

  • Affiliations:
  • Huazhong University of Science and Technology, Wuhan, China (all authors)

  • Venue:
  • Proceedings of the 3rd ACM International Conference on Multimedia Retrieval (ICMR)
  • Year:
  • 2013


Abstract

In this paper, a semantic model for cross-modal and multi-modal retrieval is studied. We assume that the semantic correlation of multimedia data from different modalities can be described within a probabilistic generative framework: media data from different modalities are generated by the same semantic concepts, and the generation process of each media item is conditionally independent given those concepts. Based on this assumption, we propose the semantic generation model (SGM) for cross-modal and multi-modal analysis. We study two types of methods for estimating the semantic conditional distribution of SGM: a direct method based on Gaussian distributions and an indirect method based on random forests. Methods for cross-modal and multi-modal retrieval are then derived from SGM. Experimental results show that SGM-based methods for cross-modal retrieval improve accuracy over the state-of-the-art cross-modal method without increasing retrieval time, and that SGM-based multi-modal retrieval methods also outperform traditional methods in image retrieval. Moreover, the indirect SGM-based method outperforms the direct one in both types of retrieval, which indicates that the indirect SGM better describes the semantic distribution.
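The core assumption of the abstract — that items from different modalities are conditionally independent given a shared semantic concept — can be turned into a simple cross-modal matching score by marginalizing over concepts. The sketch below is an illustration of that idea only, not the paper's implementation: the diagonal-Gaussian class-conditional densities stand in for the abstract's "direct method", and all function names, features, and data are hypothetical.

```python
import numpy as np

def fit_gaussians(features, labels):
    """Fit a diagonal Gaussian p(x | c) per semantic concept c.

    In SGM terms, this estimates the semantic conditional distribution
    of one modality's features given the concept (the "direct method"
    of the abstract, under an illustrative diagonal-covariance choice).
    """
    params = {}
    for c in np.unique(labels):
        X = features[labels == c]
        # Small floor on the variance keeps the density well-defined.
        params[c] = (X.mean(axis=0), X.var(axis=0) + 1e-6)
    return params

def log_density(x, mean, var):
    """Log of a diagonal Gaussian density at x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def cross_modal_log_score(query, query_params, target, target_params, prior):
    """log sum_c P(c) p(query | c) p(target | c).

    Conditional independence given the concept lets the joint density
    factor into per-modality terms; log-sum-exp keeps it numerically
    stable.
    """
    logs = [np.log(p_c)
            + log_density(query, *query_params[c])
            + log_density(target, *target_params[c])
            for c, p_c in prior.items()]
    m = max(logs)
    return m + np.log(sum(np.exp(l - m) for l in logs))
```

At retrieval time, one would rank all candidate items of the target modality by this score against the query; items generated by the same underlying concept as the query score highest.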