Combining multimodal and temporal contextual information for semantic video analysis

  • Authors:
  • Georgios Th. Papadopoulos; Vasileios Mezaris; Ioannis Kompatsiaris; Michael G. Strintzis

  • Affiliations:
  • Georgios Th. Papadopoulos and Michael G. Strintzis: Information Processing Lab., Electrical & Computer Eng. Dept., Aristotle University of Thessaloniki, Greece, and Informatics and Telematics Institute, Centre for Research and Technology Hellas, Greece
  • Vasileios Mezaris and Ioannis Kompatsiaris: Informatics and Telematics Institute, Centre for Research and Technology Hellas, Greece

  • Venue:
  • ICIP'09: Proceedings of the 16th IEEE International Conference on Image Processing
  • Year:
  • 2009


Abstract

In this paper, a graphical modeling-based approach to semantic video analysis is presented that jointly realizes modality fusion and temporal context exploitation. The examined video sequence is first segmented into shots, and appropriate color, motion and audio features are extracted for every resulting shot. Hidden Markov Models (HMMs) are then employed to perform, separately for every modality, an initial association of each shot with the semantic classes of interest. Subsequently, an integrated Bayesian Network (BN) is introduced that performs information fusion and temporal contextual knowledge exploitation simultaneously, contrary to the usual practice of performing each task separately. The final outcome of the overall approach is the association of a semantic class with every shot. Experimental results and a comparative evaluation from the application of the proposed approach to news broadcast video are presented.
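The pipeline in the abstract (per-modality likelihoods per shot, then joint fusion and temporal decoding) can be sketched as follows. This is a minimal illustration, not the paper's BN: it assumes hypothetical semantic classes (`anchor`, `report`), hand-picked per-modality class likelihoods standing in for the per-modality HMM outputs, and an assumed shot-to-shot transition prior; fusion is a naive log-likelihood sum across modalities, and temporal context is applied jointly with fusion via Viterbi decoding over the shot sequence.

```python
import math

CLASSES = ["anchor", "report"]  # hypothetical semantic classes

# Hypothetical per-modality class likelihoods for 3 shots, standing in for
# the outputs of the color/motion/audio HMMs described in the abstract.
modality_likelihoods = {
    "color":  [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]],
    "motion": [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]],
    "audio":  [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]],
}

# Assumed temporal prior: consecutive shots tend to keep the same class.
TRANSITION = [[0.7, 0.3], [0.3, 0.7]]
PRIOR = [0.5, 0.5]

def fuse_and_decode(likelihoods):
    """Fuse modalities (log-likelihood sum) and decode with Viterbi, so that
    modality fusion and temporal context are exploited in one pass."""
    n_shots = len(next(iter(likelihoods.values())))
    n_classes = len(CLASSES)
    # Joint emission score per shot: sum of per-modality log-likelihoods.
    emissions = [
        [sum(math.log(likelihoods[m][t][c]) for m in likelihoods)
         for c in range(n_classes)]
        for t in range(n_shots)
    ]
    # Viterbi recursion over the shot sequence.
    delta = [[math.log(PRIOR[c]) + emissions[0][c] for c in range(n_classes)]]
    back = []
    for t in range(1, n_shots):
        row, ptr = [], []
        for c in range(n_classes):
            best = max(range(n_classes),
                       key=lambda p: delta[-1][p] + math.log(TRANSITION[p][c]))
            row.append(delta[-1][best] + math.log(TRANSITION[best][c])
                       + emissions[t][c])
            ptr.append(best)
        delta.append(row)
        back.append(ptr)
    # Backtrack the most probable class sequence.
    path = [max(range(n_classes), key=lambda c: delta[-1][c])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return [CLASSES[c] for c in path]
```

On the toy likelihoods above this returns `['anchor', 'anchor', 'report']`: the middle shot is ambiguous on its own, and the temporal prior, applied jointly with the fused evidence, resolves it toward the preceding shot's class.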