Multimodal data fusion for video scene segmentation

  • Authors:
  • Vyacheslav Parshin, Aliaksandr Paradzinets, Liming Chen

  • Affiliation:
  • LIRIS, Ecole Centrale de Lyon, Ecully, France (all authors)

  • Venue:
  • VISUAL'05 Proceedings of the 8th international conference on Visual Information and Information Systems
  • Year:
  • 2005


Abstract

Automatic segmentation of video into semantic units is important for organizing effective content-based access to long videos. The basic building blocks of professional video are shots; however, the semantic meaning they convey is at too low a level. In this paper we focus on the problem of segmenting video into more meaningful high-level narrative units called scenes: aggregates of shots that are temporally continuous, share the same physical setting, or represent a continuous ongoing action. We propose a statistical video scene segmentation framework capable of combining multiple mid-level features in a symmetrical and scalable manner, and we suggest two such features, extracted from the visual and audio domains. We report the results of experimental evaluations carried out on ground-truth video; they show that our algorithm effectively fuses multiple modalities and achieves higher performance than an alternative conventional fusion technique.
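To give a flavor of the kind of symmetrical, scalable fusion the abstract describes, the sketch below combines per-modality boundary probabilities for a candidate shot boundary via naive-Bayes log-odds pooling. This is an illustrative assumption for exposition, not the paper's actual statistical framework; the function name, the conditional-independence assumption, and the threshold are all hypothetical.

```python
import math

def fuse_boundary_scores(modality_probs, prior=0.5, threshold=0.0):
    """Fuse per-modality scene-boundary probabilities (illustrative sketch).

    modality_probs: for one candidate shot boundary, a list of estimates
    P(scene boundary | modality feature), one entry per modality
    (e.g. visual coherence, audio change).

    Assuming the modalities are conditionally independent given the
    boundary/no-boundary hypothesis (a naive-Bayes assumption, not taken
    from the paper), the fused posterior log-odds is the prior log-odds
    plus the sum of per-modality log-odds corrections. Adding a new
    modality is just one more term in the sum, which is what makes the
    scheme symmetrical and scalable.
    """
    prior_lo = math.log(prior / (1.0 - prior))
    score = prior_lo
    for p in modality_probs:
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # clamp to avoid log(0)
        score += math.log(p / (1.0 - p)) - prior_lo
    # Declare a scene boundary when the fused log-odds exceeds the threshold.
    return score, score > threshold
```

For example, two modalities that both lean toward a boundary (say 0.8 and 0.7) yield a positive fused score and a boundary decision, while two that lean against (0.3 and 0.4) yield a negative score and no boundary.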