Improving cluster selection and event modeling in unsupervised mining for automatic audiovisual video structuring

  • Authors:
  • Anh-Phuong Ta; Mathieu Ben; Guillaume Gravier

  • Affiliations:
  • INRIA Rennes, Rennes Cedex, France; Powedia, Rennes Cedex, France; CNRS-IRISA, Rennes Cedex, France

  • Venue:
  • MMM'12: Proceedings of the 18th International Conference on Advances in Multimedia Modeling
  • Year:
  • 2012


Abstract

Can we discover audio-visually consistent events from videos in a fully unsupervised manner? And how can we mine videos of different genres? In this paper we present new results in automatically discovering audio-visual events. A new measure is proposed to select audio-visually consistent elements from the two dendrograms that respectively represent the hierarchical clustering results for the audio and visual modalities. Each selected element corresponds to a candidate event. To build a model for each event, each candidate is represented as a group of clusters, and a voting mechanism is applied to select training examples for discriminative classifiers. Finally, the trained model is applied to the entire video to select the segments that belong to the discovered event. Experimental results on several different and challenging video genres show the effectiveness of our approach.
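The abstract does not spell out the consistency measure used to match elements across the audio and visual dendrograms, but the selection step can be sketched under a simple assumption: each cluster is a set of video-segment indices, and cross-modal consistency is scored by segment overlap (Jaccard index here, a hypothetical stand-in for the paper's measure). Pairs whose overlap exceeds a threshold become candidate events.

```python
# Hedged sketch of cross-modal cluster selection. The Jaccard overlap and the
# 0.5 threshold are illustrative assumptions, not the paper's actual measure.

def jaccard(a, b):
    """Overlap between two clusters given as sets of segment indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def select_consistent_pairs(audio_clusters, visual_clusters, threshold=0.5):
    """Pair each audio cluster with its best-matching visual cluster and keep
    pairs whose overlap meets the threshold; each kept pair stands in for an
    audio-visually consistent candidate event."""
    candidates = []
    for i, a in enumerate(audio_clusters):
        j, score = max(
            ((j, jaccard(a, v)) for j, v in enumerate(visual_clusters)),
            key=lambda t: t[1],
        )
        if score >= threshold:
            candidates.append((i, j, score))
    return candidates

# Toy example: 10 video segments clustered independently in each modality.
audio = [{0, 1, 2, 3}, {4, 5, 6}, {7, 8, 9}]
visual = [{0, 1, 2}, {3, 4, 5, 6}, {7, 8, 9}]
print(select_consistent_pairs(audio, visual))
```

In a full pipeline, the segments inside each selected pair would then feed the voting step that picks training examples for the discriminative classifier described in the abstract.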