A detection-based paradigm decomposes a complex system into small pieces, solves each subproblem separately, and combines the collected evidence to obtain a final solution. In this study of video story segmentation, a set of key events is first detected from heterogeneous multimedia signal sources, including a large-scale concept ontology for images, text generated by automatic speech recognition systems, features extracted from the audio track, and high-level video transcriptions. A discriminative evidence fusion scheme is then investigated. We use the maximum figure-of-merit learning approach to directly optimize the performance metrics used in system evaluation, such as precision, recall, and the F1 measure. Experimental evaluations on the TRECVID 2003 dataset demonstrate the effectiveness of the proposed detection-based paradigm. The proposed framework allows event detectors and evidence fusion to be flexibly combined and extended, enabling other related video applications.
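The idea of optimizing an evaluation metric directly can be illustrated with a minimal sketch. It is not the authors' formulation: it assumes a linear fusion of detector scores, replaces the hard true-positive and false-positive counts with sigmoid-smoothed counts so that F1 becomes differentiable, and runs plain gradient ascent. All function names, the smoothing parameter `alpha`, and the toy data are hypothetical and chosen only for illustration.

```python
import math

# Toy data (invented for illustration, not from the paper): each row holds
# the scores of two hypothetical event detectors at a candidate boundary,
# and the label says whether a story boundary is really there.
TOY_X = [[0.9, 0.8], [0.8, 0.3], [0.7, 0.9],
         [0.2, 0.1], [0.3, 0.4], [0.1, 0.6], [0.4, 0.2], [0.2, 0.3]]
TOY_Y = [1, 1, 1, 0, 0, 0, 0, 0]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuse(weights, bias, scores):
    """Linear evidence fusion of the detector scores (an assumed form)."""
    return sum(w * s for w, s in zip(weights, scores)) + bias


def smooth_f1(weights, bias, X, y, alpha=1.0):
    """F1 with hard counts replaced by sigmoid-smoothed soft counts.

    With soft decisions s_i, TP = sum of s_i over positives and
    F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (sum of all s_i + #positives).
    """
    soft = [sigmoid(alpha * fuse(weights, bias, x)) for x in X]
    tp = sum(s for s, label in zip(soft, y) if label == 1)
    return 2.0 * tp / (sum(soft) + sum(y))


def train(X, y, steps=500, lr=0.5, alpha=1.0):
    """Gradient ascent on the smoothed F1 with respect to fusion parameters."""
    dim = len(X[0])
    weights, bias = [0.0] * dim, 0.0
    n_pos = sum(y)
    for _ in range(steps):
        soft = [sigmoid(alpha * fuse(weights, bias, x)) for x in X]
        tp = sum(s for s, label in zip(soft, y) if label == 1)
        denom = sum(soft) + n_pos
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label, s in zip(X, y, soft):
            # Chain rule: dF1/ds_i times ds_i/dz_i, with z_i the fused score.
            d_f1_d_s = (2.0 * label * denom - 2.0 * tp) / (denom ** 2)
            coeff = d_f1_d_s * alpha * s * (1.0 - s)
            for j in range(dim):
                grad_w[j] += coeff * x[j]
            grad_b += coeff
        weights = [w + lr * g for w, g in zip(weights, grad_w)]
        bias += lr * grad_b
    return weights, bias


if __name__ == "__main__":
    w, b = train(TOY_X, TOY_Y)
    print("smoothed F1 before training:", smooth_f1([0.0, 0.0], 0.0, TOY_X, TOY_Y))
    print("smoothed F1 after training: ", smooth_f1(w, b, TOY_X, TOY_Y))
```

The design point this sketch tries to convey is that the training objective is the evaluation metric itself (here a smoothed F1) rather than a surrogate such as classification error; precision or recall could be smoothed and optimized in the same way.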