Finding presentations in recorded meetings using audio and video features

Authors:
J. Foote;J. Boreczsky;L. Wilcox
Affiliations:
FX Palo Alto Lab., CA, USA;-;-
Venue:
ICASSP '99 Proceedings of the Acoustics, Speech, and Signal Processing, 1999. on 1999 IEEE International Conference - Volume 06
Year:
1999

Citing 0
Cited 3

Video Manga: generating semantically meaningful video summaries

MULTIMEDIA '99 Proceedings of the seventh ACM international conference on Multimedia (Part 1)
Knowledge management technology

IBM Systems Journal
On supervision and statistical learning for semantic multimedia analysis

Journal of Visual Communication and Image Representation

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper describes a method for finding segments in video-recorded meetings that correspond to presentations. These segments serve as indexes into the recorded meeting. The system automatically detects intervals of video that correspond to presentation slides. We assume that only one person speaks during an interval when slides are detected. Thus these intervals can be used as training data for a speaker spotting system. An HMM is automatically constructed and trained on the audio data from each slide interval. A Viterbi alignment then resegments the audio according to speaker. Since the same speaker may talk across multiple slide intervals the acoustic data from these intervals is clustered to yield an estimate of the number of distinct speakers and their order. This allows the individual presentations in the video to be identified from the location of each presenter's speech. Results are presented for a corpus of six meeting videos.