Multi-modal scene segmentation using scene transition graphs

Authors:
Panagiotis Sidiropoulos;Vasileios Mezaris;Ioannis Kompatsiaris;Hugo Meinedo;Isabel Trancoso
Affiliations:
Centre for Research and Technology Hellas (CERTH), Thermi-Thessaloniki, Greece;Centre for Research and Technology Hellas (CERTH), Thermi-Thessaloniki, Greece;Centre for Research and Technology Hellas (CERTH), Thermi-Thessaloniki, Greece;Technical University of Lisbon, Lisbon, Portugal;Technical University of Lisbon, Lisbon, Portugal
Venue:
MM '09 Proceedings of the 17th ACM international conference on Multimedia
Year:
2009

Citing 5

Segmentation of video by clustering and graph analysis (Computer Vision and Image Understanding)
Graph Theory With Applications
Systematic evaluation of logical story unit segmentation (IEEE Transactions on Multimedia)
A Multimodal Scheme for Program Segmentation and Representation in Broadcast Video Streams (IEEE Transactions on Multimedia)
Automated high-level movie segmentation for advanced video-retrieval systems (IEEE Transactions on Circuits and Systems for Video Technology)

Cited 3

Automatic event-based indexing of multimedia content using a joint content-event model (Proceedings of the 2nd ACM international workshop on Events in multimedia)
An unsupervised approach for recurrent tv program structuring (Proceedings of the 9th international interactive conference on Interactive television)
Video scene segmentation by improved visual shot coherence (Proceedings of the 19th Brazilian symposium on Multimedia and the web)

Abstract

In this work the problem of automatic decomposition of video into elementary semantic units, known in the literature as scenes, is addressed. Two multi-modal automatic scene segmentation techniques are proposed, both building upon the Scene Transition Graph (STG). In the first of the proposed approaches, speaker diarization results are used for introducing a post-processing step to the STG construction algorithm, with the objective of discarding scene boundaries erroneously identified according to visual-only dissimilarity. In the second approach, speaker diarization and additional audio analysis results are employed and a separate audio-based STG is constructed, in parallel to the original STG based on visual information. The two STGs are subsequently combined. Preliminary results from the application of the proposed techniques to broadcast videos reveal their improved performance over previous approaches.
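
The two ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say how speaker diarization results are compared across a boundary or how the two STGs are combined, so the rules below are assumptions chosen for illustration. The function names, the `window` and `tolerance` parameters, and the per-shot speaker sets are all hypothetical. The first function discards a visually detected boundary when the same speaker is heard on both sides of it; the second keeps a visual boundary only if an audio-based boundary lies nearby.

```python
from typing import Dict, List, Set


def filter_boundaries(
    boundaries: List[int],
    shot_speakers: Dict[int, Set[str]],
    window: int = 2,
) -> List[int]:
    """Drop visual scene boundaries that a speaker straddles.

    A boundary b means "a new scene starts at shot b".
    ASSUMPTION: a boundary is treated as erroneous when any speaker
    label occurs in both the `window` shots before it and the
    `window` shots starting at it. The paper may use another rule.
    """
    kept = []
    for b in boundaries:
        before: Set[str] = set()
        after: Set[str] = set()
        for i in range(b - window, b):
            before |= shot_speakers.get(i, set())
        for i in range(b, b + window):
            after |= shot_speakers.get(i, set())
        if before & after:
            # Same speaker on both sides: discard this boundary.
            continue
        kept.append(b)
    return kept


def combine_boundaries(
    visual: List[int],
    audio: List[int],
    tolerance: int = 1,
) -> List[int]:
    """Keep visual boundaries confirmed by a nearby audio boundary.

    ASSUMPTION: agreement within `tolerance` shots is a stand-in for
    the (unspecified) combination of the visual and audio STGs.
    """
    return [v for v in visual if any(abs(v - a) <= tolerance for a in audio)]


# Hypothetical diarization output: speaker labels heard in each shot.
speakers = {0: {"A"}, 1: {"A", "B"}, 2: {"B"}, 3: {"B"}, 4: {"C"}, 5: {"C"}}
filtered = filter_boundaries([2, 4], speakers)   # speaker B straddles shot 2
combined = combine_boundaries([2, 4, 9], [5, 12])
```

In this toy input, the boundary at shot 2 is dropped because speaker B is active on both sides of it, while the boundary at shot 4 survives because the speakers change there.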