IEEE Transactions on Multimedia - Special issue on integration of context and content
Given the rich content-based features of multimedia (e.g., visual, text, or audio) and the variety of approaches to automatic detectors (e.g., SVM, AdaBoost, HMM, or GMM), can we find an efficient approach to combining this evidence? In this paper, we address the issue by proposing an Integrated Statistical Model (ISM) that combines diverse evidence extracted from the domain knowledge of detectors, the intrinsic structure of the modality distribution, and inter-concept associations. The ISM provides a unified framework for evidence fusion with the following unique advantages: 1) the intrinsic modes in the modality distribution are discovered and modeled by a generative model; 2) each mode is a partial description of the structure of the modality, and the mode configuration, i.e., a set of modes, is a new representation of the document content; 3) mode discrimination is learned automatically; 4) prior knowledge such as detector correlations and inter-concept relations can be explicitly described and integrated. More importantly, an efficient pseudo-EM algorithm is realized for training the statistical model. The learning algorithm reduces the computational cost incurred by the normalization factor and the latent variables in the graphical model. We evaluate the system's performance on multimedia semantic concept detection with the TRECVID 2005 development dataset, in terms of efficiency and capacity. Our experimental results demonstrate that ISM fusion outperforms the SVM-based discriminative fusion method.