Multimodal image retrieval via Bayesian information fusion

  • Authors:
  • Rui Zhang; Ling Guan

  • Affiliations:
  • Ryerson Multimedia Research Laboratory, Ryerson University, Toronto, ON, Canada (both authors)

  • Venue:
  • ICME '09: Proceedings of the 2009 IEEE International Conference on Multimedia and Expo
  • Year:
  • 2009

Abstract

In this paper, we propose a multimodal image retrieval framework that integrates information from the audio and visual domains via Bayesian decision-level fusion. In each domain, a statistical model is learned for every semantic class. Using Bayes' theorem, the a posteriori probability of each class given a query is computed in the audio domain and propagated to the images classified into the corresponding semantic class in the visual domain. These probabilistic measures serve as the a priori probabilities in the overall framework and are combined with likelihoods evaluated by nearest-neighbor content-based image retrieval. Applying Bayes' theorem again, the images are ranked by their a posteriori probabilities given the audio and visual features of the query. To further improve the system, we also propose a relevance feedback scheme in the audio domain. Experimental results demonstrate the advantage of the proposed method over retrieval based on visual features alone.
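The abstract describes a two-stage application of Bayes' theorem: an audio-domain class posterior is propagated to images as a per-image prior, then fused with a visual likelihood to rank the database. The following minimal sketch illustrates that fusion pipeline on synthetic data; the unit-variance Gaussian audio model, the exponential distance-to-likelihood mapping, and all names are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of decision-level Bayesian fusion for multimodal retrieval.
# Models and parameters below are toy assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 3 semantic classes with audio and visual feature spaces.
n_classes, d_audio, d_visual, imgs_per_class = 3, 4, 8, 20
audio_means = rng.normal(size=(n_classes, d_audio))          # audio class models
db_labels = np.repeat(np.arange(n_classes), imgs_per_class)  # image class labels
db_visual = rng.normal(size=(db_labels.size, d_visual)) + db_labels[:, None]

def audio_posterior(query_audio):
    """P(class | audio query) via Bayes' theorem, assuming a unit-variance
    Gaussian model per class and a uniform class prior."""
    log_lik = -0.5 * np.sum((query_audio - audio_means) ** 2, axis=1)
    post = np.exp(log_lik - log_lik.max())  # subtract max for stability
    return post / post.sum()

def visual_likelihood(query_visual):
    """Likelihood of each database image given the visual query, here an
    exponential kernel on nearest-neighbor distance (an assumption)."""
    dist = np.linalg.norm(db_visual - query_visual, axis=1)
    return np.exp(-dist)

def fused_ranking(query_audio, query_visual):
    # Audio posterior, propagated to images of the same semantic class,
    # acts as the per-image a priori probability in the overall framework.
    prior = audio_posterior(query_audio)[db_labels]
    # Bayes' theorem again: posterior proportional to prior times likelihood.
    post = prior * visual_likelihood(query_visual)
    post /= post.sum()
    return np.argsort(post)[::-1]  # image indices, best match first

# Query drawn near class 1 in both modalities.
ranking = fused_ranking(audio_means[1], rng.normal(size=d_visual) + 1)
print("top-5 retrieved images:", ranking[:5])
print("their labels:", db_labels[ranking[:5]])
```

In this toy run, images whose class matches the audio-inferred class receive a boosted prior, so they dominate the top ranks even when their visual distances are comparable to those of other classes, which is the intended effect of the decision-level fusion.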