This paper describes a method for finding segments in video-recorded meetings that correspond to presentations. These segments serve as indexes into the recorded meeting. The system automatically detects intervals of video that correspond to presentation slides. We assume that only one person speaks during an interval in which slides are detected, so these intervals can be used as training data for a speaker-spotting system. An HMM is automatically constructed and trained on the audio data from each slide interval. A Viterbi alignment then resegments the audio according to speaker. Because the same speaker may talk across multiple slide intervals, the acoustic data from these intervals are clustered to yield an estimate of the number of distinct speakers and the order in which they speak. This allows the individual presentations in the video to be identified from the location of each presenter's speech. Results are presented for a corpus of six meeting videos.
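The pipeline described above (train a model per slide interval, cluster intervals that share a speaker, then resegment the audio with a Viterbi pass) can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: each audio frame is reduced to a single scalar feature rather than a real acoustic feature vector, each speaker is modeled by one Gaussian state rather than a full HMM, and a greedy threshold rule stands in for the clustering step. All function names and parameter values (`cluster_intervals`, `viterbi`, `threshold`, `stay_prob`) are hypothetical.

```python
import math

def fit_gaussian(samples):
    """Return (mean, variance) of scalar features; variance is floored."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean, max(var, 1e-3)

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def cluster_intervals(intervals, threshold):
    """Greedy clustering (an assumed stand-in for the paper's method):
    an interval joins the first cluster whose mean is within `threshold`,
    otherwise it starts a new cluster. Returns pooled samples and labels."""
    clusters, labels = [], []
    for samples in intervals:
        mean = sum(samples) / len(samples)
        for idx, pooled in enumerate(clusters):
            if abs(mean - sum(pooled) / len(pooled)) < threshold:
                pooled.extend(samples)
                labels.append(idx)
                break
        else:
            clusters.append(list(samples))
            labels.append(len(clusters) - 1)
    return clusters, labels

def viterbi(frames, models, stay_prob=0.9):
    """Most likely speaker index per frame, one Gaussian state per speaker."""
    n = len(models)
    if n == 1:
        return [0] * len(frames)
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (n - 1))
    score = [math.log(1.0 / n) + log_gauss(frames[0], *models[s]) for s in range(n)]
    back = []
    for x in frames[1:]:
        new_score, ptr = [], []
        for s in range(n):
            # Best predecessor for state s, including the transition cost.
            cands = [score[p] + (log_stay if p == s else log_move) for p in range(n)]
            best = max(range(n), key=lambda p: cands[p])
            new_score.append(cands[best] + log_gauss(x, *models[s]))
            ptr.append(best)
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Toy data: three slide intervals, the first and third from the same speaker.
intervals = [[1.0, 1.2, 0.9, 1.1], [5.0, 5.2, 4.9, 5.1], [1.05, 0.95, 1.0, 1.1]]
clusters, labels = cluster_intervals(intervals, threshold=1.0)
models = [fit_gaussian(c) for c in clusters]
frames = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9, 1.0, 1.2]
path = viterbi(frames, models)
print(labels)  # [0, 1, 0]
print(path)    # [0, 0, 0, 1, 1, 1, 0, 0]
```

In this toy run, the number of clusters serves as the estimate of the number of distinct speakers, and contiguous runs of the same index in `path` mark candidate presentation boundaries. A realistic system would replace the scalar feature and single-Gaussian states with proper acoustic features and trained HMMs.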