Multi-level Fusion for Semantic Video Content Indexing and Retrieval

  • Authors:
  • Rachid Benmokhtar; Benoit Huet

  • Affiliations:
  • Département Communications Multimédias, Institut Eurécom, Sophia-Antipolis, France 06904 (both authors)

  • Venue:
  • Adaptive Multimedial Retrieval: Retrieval, User, and Semantics
  • Year:
  • 2007


Abstract

In this paper, we present the results of our work on an automatic semantic video content indexing and retrieval system based on fusing various low-level visual descriptors. Global MPEG-7 features extracted from video shots are described via an Image Vector Space Model (IVSM) signature in order to obtain a compact description of the content. Both static and dynamic feature fusion are introduced to build effective signatures. Support Vector Machines (SVMs) are employed to perform classification, with one classifier per feature; the task of the classifiers is to detect the video's semantic content. The classifier outputs are then fused using a neural network based on evidence theory (NNET) to provide a decision on the content of each shot. Experiments are conducted within the framework of the TRECVid feature extraction task.
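The classifier-fusion stage described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature vectors are synthetic stand-ins for the MPEG-7/IVSM signatures, and a plain scikit-learn MLP stands in for the evidence-theoretic NNET fusion network, which the abstract does not specify in detail.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for two per-shot low-level descriptors
# (e.g. a color feature and a texture feature); binary concept label.
n = 200
labels = rng.integers(0, 2, n)
feat_color = rng.normal(labels[:, None] * 2.0, 0.5, (n, 8))
feat_texture = rng.normal(labels[:, None] * 2.0, 0.5, (n, 6))

train, test = slice(0, 150), slice(150, n)

# One SVM per low-level feature, as in the paper.
svm_color = SVC(probability=True, random_state=0)
svm_color.fit(feat_color[train], labels[train])
svm_texture = SVC(probability=True, random_state=0)
svm_texture.fit(feat_texture[train], labels[train])

def stacked_outputs(idx):
    """Concatenate the per-feature classifier outputs for fusion."""
    return np.hstack([svm_color.predict_proba(feat_color[idx]),
                      svm_texture.predict_proba(feat_texture[idx])])

# Fuse the classifier outputs with a small neural network
# (a generic MLP here, standing in for the evidence-theoretic NNET).
fuser = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
fuser.fit(stacked_outputs(train), labels[train])

pred = fuser.predict(stacked_outputs(test))
accuracy = (pred == labels[test]).mean()
```

The two-stage structure (independent per-feature classifiers, then a trained fusion layer over their outputs) is the point of the sketch; the actual system replaces the MLP with a network whose units implement Dempster-Shafer evidence combination.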