Automatic video segmentation into semantic units is important for organizing effective content-based access to long videos. The basic building blocks of professional video are shots; however, the semantic meaning they provide is at too low a level. In this paper we focus on the problem of segmenting video into more meaningful high-level narrative units called scenes – aggregates of shots that are temporally continuous, share the same physical setting, or represent a continuous ongoing action. A statistical video scene segmentation framework is proposed that can combine multiple mid-level features in a symmetric and scalable manner. Two kinds of such features, extracted in the visual and audio domains, are suggested. The results of experimental evaluations carried out on ground-truth video are reported. They show that our algorithm effectively fuses multiple modalities and achieves higher performance than an alternative conventional fusion technique.
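To make the idea of a symmetric, scalable fusion of mid-level features concrete, the following is a minimal sketch (not the paper's actual model): each modality supplies a probability that a given shot boundary is also a scene boundary, and the modalities are combined by summing log-likelihood ratios, so each feature contributes one additive term and new modalities can be appended without changing the others. The function name, the independence assumption behind the additive form, and the example numbers are all illustrative assumptions.

```python
import math

def fuse_boundary_scores(modality_probs, threshold=0.0):
    """Fuse per-modality scene-boundary probabilities.

    modality_probs[m][i] is the probability, according to modality m
    (e.g. visual or audio coherence), that shot boundary i is also a
    scene boundary. Assuming the modalities are conditionally
    independent, their evidence combines additively as log-likelihood
    ratios, so the fusion is symmetric in the modalities and scales to
    a new feature by appending one more list.

    Returns a list of booleans: True where the fused evidence for a
    scene boundary exceeds the threshold (0.0 = more likely than not).
    """
    n = len(modality_probs[0])
    decisions = []
    for i in range(n):
        # Sum of log-odds across modalities; a neutral modality
        # (p = 0.5) contributes exactly zero.
        score = sum(math.log(m[i] / (1.0 - m[i])) for m in modality_probs)
        decisions.append(score > threshold)
    return decisions

# Illustrative scores for four shot boundaries from two modalities.
visual = [0.9, 0.2, 0.7, 0.1]
audio  = [0.8, 0.3, 0.4, 0.2]
print(fuse_boundary_scores([visual, audio]))  # [True, False, True, False]
```

Note the design property this sketch is meant to show: because each modality enters only through its own additive term, no modality is privileged over another (symmetry), and extending the system to a third feature requires only one more list of scores (scalability).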