A detection-based approach to broadcast news video story segmentation

Authors:
Chengyuan Ma; Byungki Byun; Ilseo Kim; Chin-Hui Lee
Affiliations:
School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, 30332, USA (all authors)
Venue:
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
Year:
2009

Abstract

A detection-based paradigm decomposes a complex system into smaller subproblems, solves each one separately, and combines the collected evidence to obtain a final solution. In this study of video story segmentation, a set of key events is first detected from heterogeneous multimedia signal sources, including a large-scale concept ontology for images, text generated by automatic speech recognition systems, features extracted from the audio track, and high-level video transcriptions. A discriminative evidence fusion scheme is then investigated: we use the maximum figure-of-merit learning approach to directly optimize the performance metrics used in system evaluation, such as precision, recall, and the F1 measure. Experimental evaluations conducted on the TRECVID 2003 dataset demonstrate the effectiveness of the proposed detection-based paradigm. The proposed framework facilitates flexible combination and extension of event detector design and evidence fusion to enable other related video applications.
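
The idea of directly optimizing an evaluation metric during evidence fusion can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a linear fusion of detector scores, a sigmoid-smoothed approximation of the true-positive and false-positive counts, and plain gradient ascent on the resulting differentiable F1. The three detector names, the toy data, and all function names (`fused_scores`, `soft_f1`, `train_fusion`) are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fused_scores(weights, bias, features, alpha=1.0):
    """Soft boundary decisions in (0, 1) from a linear fusion of detector scores."""
    return [
        sigmoid(alpha * (sum(w * v for w, v in zip(weights, x)) + bias))
        for x in features
    ]


def soft_f1(weights, bias, features, labels, alpha=1.0):
    """Differentiable F1: hard counts are replaced by sums of soft decisions.

    Uses F1 = 2*TP / (TP + FP + P), where P is the number of true boundaries.
    Assumes at least one positive label so the denominator is non-zero.
    """
    scores = fused_scores(weights, bias, features, alpha)
    tp = sum(s for s, y in zip(scores, labels) if y == 1)
    fp = sum(s for s, y in zip(scores, labels) if y == 0)
    return 2.0 * tp / (tp + fp + sum(labels))


def train_fusion(features, labels, lr=0.5, steps=200, alpha=1.0):
    """Gradient ascent on the smoothed F1 with respect to the fusion parameters."""
    dim = len(features[0])
    weights, bias = [0.0] * dim, 0.0
    n_pos = sum(labels)
    for _ in range(steps):
        scores = fused_scores(weights, bias, features, alpha)
        tp = sum(s for s, y in zip(scores, labels) if y == 1)
        fp = sum(s for s, y in zip(scores, labels) if y == 0)
        denom = tp + fp + n_pos
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y, s in zip(features, labels, scores):
            # Derivative of the soft F1 with respect to this sample's soft decision.
            if y == 1:
                d_f1 = 2.0 * (denom - tp) / denom ** 2
            else:
                d_f1 = -2.0 * tp / denom ** 2
            # Chain rule through the sigmoid.
            g = d_f1 * alpha * s * (1.0 - s)
            for j in range(dim):
                grad_w[j] += g * x[j]
            grad_b += g
        weights = [w + lr * gw for w, gw in zip(weights, grad_w)]
        bias += lr * grad_b
    return weights, bias


# Toy data (hypothetical): each row holds scores from three event detectors,
# e.g. an anchor-shot detector, a long-pause detector, and a cue-phrase detector,
# for one candidate boundary; labels mark true story boundaries.
features = [
    [0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.7, 0.7, 0.9],
    [0.1, 0.2, 0.3], [0.2, 0.1, 0.1], [0.3, 0.2, 0.2],
]
labels = [1, 1, 1, 0, 0, 0]

before = soft_f1([0.0, 0.0, 0.0], 0.0, features, labels)
weights, bias = train_fusion(features, labels)
after = soft_f1(weights, bias, features, labels)
print("soft F1 before:", before, "after:", after)
```

Because the smoothed F1 is differentiable in the fusion weights, the same loop could in principle target precision or recall by swapping the objective and its derivative. The smoothing parameter `alpha` controls how closely the soft counts approximate the hard ones.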