Instructional Video Content Analysis Using Audio Information

  • Authors:
  • Ying Li; C. Dorai

  • Affiliations:
  • IBM Thomas J. Watson Research Center, Yorktown Heights, NY

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2006

Abstract

Automatic media content analysis and understanding for efficient topic searching and browsing are current challenges in the management of e-learning content repositories. This paper presents our work on analyzing and structuring instructional videos using audio information alone. Specifically, an audio classification scheme is first developed to partition the soundtrack of an instructional video into homogeneous audio segments, where each segment has a unique sound type such as speech or music. We then apply a statistical approach to extract discussion scenes in the video by modeling the instructor with a Gaussian mixture model (GMM) and updating it on the fly. Finally, we categorize the obtained discussion scenes into either two-speaker or multispeaker discussions using an adaptive mode-based clustering approach. Experiments carried out on four training videos and five IBM MicroMBA class videos have yielded encouraging results. It is our belief that by detecting and identifying various types of discussions, we can better understand and annotate the learning media content and subsequently facilitate its access, browsing, and retrieval.
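The core idea of modeling the instructor with a GMM and scoring new audio against it can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes per-frame acoustic features (e.g., MFCCs) have already been extracted, uses synthetic feature vectors in their place, and uses scikit-learn's `GaussianMixture` as a stand-in for the paper's speaker model; the function names and the likelihood-threshold decision rule are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_speaker_model(features, n_components=4, seed=0):
    """Fit a diagonal-covariance GMM to one speaker's feature frames.

    `features` is an (n_frames, n_dims) array of acoustic features
    such as MFCCs; the instructor's model would be fit on frames
    from segments known (or assumed) to contain the instructor.
    """
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag",
                          random_state=seed)
    gmm.fit(features)
    return gmm

def matches_speaker(gmm, segment_features, threshold):
    """Decide whether a segment matches the modeled speaker by
    comparing its average per-frame log-likelihood to a threshold."""
    return gmm.score(segment_features) > threshold

# Synthetic stand-ins for MFCC frames from two different speakers
# (a real system would extract these from the video's soundtrack).
rng = np.random.default_rng(0)
instructor_frames = rng.normal(0.0, 1.0, size=(500, 13))
other_frames = rng.normal(3.0, 1.0, size=(200, 13))

model = fit_speaker_model(instructor_frames)
# Frames drawn from the instructor's distribution should score
# higher under the model than frames from a different speaker.
print(model.score(instructor_frames[:100]) > model.score(other_frames[:100]))
```

The "on the fly" update described in the abstract could, under this setup, be approximated by periodically refitting the model on newly accepted instructor segments (scikit-learn's `warm_start=True` option allows resuming EM from the current parameters).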