A novel multi-modal integration and propagation model for cross-media information retrieval

Authors:
Wanxia Lin;Tong Lu;Feng Su
Affiliations:
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China;State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China;State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Venue:
MMM'12 Proceedings of the 18th international conference on Advances in Multimedia Modeling
Year:
2012

Citing 16
Cited 0

Unsupervised learning by probabilistic latent semantic analysis

Machine Learning
Audio-Video Sensor Fusion with Probabilistic Graphical Models

ECCV '02 Proceedings of the 7th European Conference on Computer Vision-Part I
Modeling annotated data

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
ReCoM: reinforcement clustering of multi-type interrelated data objects

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Matching words and pictures

The Journal of Machine Learning Research
Video summarization based on user log enhanced link analysis

MULTIMEDIA '03 Proceedings of the eleventh ACM international conference on Multimedia
Distinctive Image Features from Scale-Invariant Keypoints

International Journal of Computer Vision
Multi-model similarity propagation and its application for web image retrieval

Proceedings of the 12th annual ACM international conference on Multimedia
Content-based image retrieval: approaches and trends of the new age

Proceedings of the 7th ACM SIGMM international workshop on Multimedia information retrieval
Cross-modal correlation learning for clustering on image-audio dataset

Proceedings of the 15th international conference on Multimedia
Modeling Semantic Aspects for Cross-Media Image Indexing

IEEE Transactions on Pattern Analysis and Machine Intelligence
Short-term audio-visual atoms for generic video concept classification

MM '09 Proceedings of the 17th ACM international conference on Multimedia
Multiple Bernoulli relevance models for image and video annotation

CVPR'04 Proceedings of the 2004 IEEE computer society conference on Computer vision and pattern recognition
Mining Semantic Correlation of Heterogeneous Multimedia Data for Cross-Media Retrieval

IEEE Transactions on Multimedia
Harmonizing Hierarchical Manifolds for Multimedia Document Semantics Understanding and Cross-Media Retrieval

IEEE Transactions on Multimedia
CBSA: content-based soft annotation for multimodal image retrieval using Bayes point machines

IEEE Transactions on Circuits and Systems for Video Technology

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we present a novel Probabilistic Latent Semantic Analysis-based (PLSA-based) aspect model and turn cross-media retrieval into two parts of multi-modal integration and correlation propagation. We first use multivariate Gaussian distributions to model continuous quantity in PLSA, avoiding information loss between feature-instance versus real-world matching. Multi-modal correlations are learned in an asymmetrical manner, giving a better control of the respective influence of each modality in the latent space. Then we propose a new propagation pattern to refine multi-modal correlations by efficiently taking the complementary from multi-modalities. Experimental results demonstrate that our method is accurate and robust for cross-media information retrieval.