An integrated statistical model for multimedia evidence combination

  • Authors:
  • Sheng Gao; Joo-Hwee Lim; Qibin Sun

  • Affiliations:
  • Institute for Infocomm Research, Singapore, Singapore (all authors)

  • Venue:
  • Proceedings of the 15th International Conference on Multimedia
  • Year:
  • 2007

Abstract

Given the rich content-based features of multimedia (e.g., visual, text, or audio) and the variety of automatic detectors built on them (e.g., SVM, AdaBoost, HMM, or GMM), can we find an efficient approach to combining this evidence? In this paper, we address the issue by proposing an Integrated Statistical Model (ISM) that combines diverse evidence extracted from the domain knowledge of detectors, the intrinsic structure of the modality distribution, and inter-concept associations. The ISM provides a unified framework for evidence fusion with the following advantages: 1) the intrinsic modes of the modality distribution are discovered and modeled by a generative model; 2) each mode is a partial description of the modality's structure, and the mode configuration, i.e., a set of modes, serves as a new representation of the document content; 3) mode discrimination is learned automatically; 4) prior knowledge, such as detector correlations and inter-concept relations, can be explicitly described and integrated. More importantly, an efficient pseudo-EM algorithm is developed for training the statistical model; it relaxes the computational cost incurred by the normalization factor and the latent variables in the graphical model. We evaluate our multimedia semantic concept detection system on the TRECVID 2005 development dataset in terms of efficiency and capacity. Our experimental results demonstrate that ISM fusion outperforms the SVM-based discriminative fusion method.
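The abstract does not reproduce the model equations, but the mode-discovery step it describes is, at its core, a generative mixture fit to detector outputs with an EM-style procedure. Below is a minimal sketch of that idea, not the authors' ISM: it models the scores of a single modality as a 1-D Gaussian mixture and fits it with standard EM (the paper's pseudo-EM, which avoids the normalization-factor cost, is not reconstructed here). All names such as fit_modes, scores, and n_modes are hypothetical, and the real ISM additionally integrates detector correlations and inter-concept relations.

```python
# Hypothetical sketch of generative "mode discovery" over detector scores.
# Standard EM on a 1-D Gaussian mixture; NOT the paper's pseudo-EM or ISM.
import numpy as np

def fit_modes(scores, n_modes=3, n_iter=50, seed=0):
    """Fit a 1-D Gaussian mixture to detector scores via EM.

    scores  : (N,) array of confidence scores from one detector/modality
    returns : mixture weights, mode means, mode variances, and per-sample
              responsibilities (a stand-in for the 'mode configuration')
    """
    rng = np.random.default_rng(seed)
    n = scores.shape[0]
    # Initialize mode means from random samples, with a shared variance.
    means = rng.choice(scores, size=n_modes, replace=False)
    var = np.full(n_modes, scores.var() + 1e-6)
    weights = np.full(n_modes, 1.0 / n_modes)

    for _ in range(n_iter):
        # E-step: posterior responsibility of each mode for each sample,
        # computed in log space for numerical stability.
        log_p = (-0.5 * (scores[:, None] - means) ** 2 / var
                 - 0.5 * np.log(2 * np.pi * var) + np.log(weights))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp * scores[:, None]).sum(axis=0) / nk
        var = (resp * (scores[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6

    return weights, means, var, resp

# Toy usage: scores drawn from two latent modes.
scores = np.concatenate([np.random.normal(0.2, 0.05, 200),
                         np.random.normal(0.8, 0.10, 100)])
w, mu, v, resp = fit_modes(scores, n_modes=2)
print("mode weights:", w, "mode means:", mu)
```

In this toy setting, the per-sample responsibilities play the role the abstract assigns to the mode configuration: a soft assignment of each document's score to the discovered modes, which a downstream fusion layer could then weight discriminatively.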