Instructional Video Content Analysis Using Audio Information

  • Authors:
  • Ying Li; C. Dorai

  • Affiliations:
  • IBM Thomas J. Watson Research Center, Yorktown Heights, NY

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2006

Abstract

Automatic media content analysis and understanding for efficient topic searching and browsing are current challenges in the management of e-learning content repositories. This paper presents our work on analyzing and structuring instructional videos using audio information alone. Specifically, an audio classification scheme is first developed to partition the soundtrack of an instructional video into homogeneous audio segments, where each segment has a unique sound type such as speech or music. We then apply a statistical approach to extract discussion scenes in the video by modeling the instructor with a Gaussian mixture model (GMM) and updating it on the fly. Finally, we categorize the obtained discussion scenes into either two-speaker or multispeaker discussions using an adaptive mode-based clustering approach. Experiments carried out on four training videos and five IBM MicroMBA class videos have yielded encouraging results. It is our belief that by detecting and identifying various types of discussions, we can better understand and annotate the learning media content and subsequently facilitate its access, browsing, and retrieval.
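The core idea of modeling the instructor with a GMM and scoring new audio against it can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes per-frame acoustic features (e.g., MFCCs) have already been extracted, uses synthetic feature vectors in their place, and uses scikit-learn's `GaussianMixture` as a stand-in for the paper's speaker model; the function names and the likelihood-threshold decision rule are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_speaker_model(features, n_components=4, seed=0):
    """Fit a diagonal-covariance GMM to one speaker's feature frames.

    `features` is an (n_frames, n_dims) array of acoustic features
    such as MFCCs; the instructor's model would be fit on frames
    from segments known (or assumed) to contain the instructor.
    """
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag",
                          random_state=seed)
    gmm.fit(features)
    return gmm

def matches_speaker(gmm, segment_features, threshold):
    """Decide whether a segment matches the modeled speaker by
    comparing its average per-frame log-likelihood to a threshold."""
    return gmm.score(segment_features) > threshold

# Synthetic stand-ins for MFCC frames from two different speakers
# (a real system would extract these from the video's soundtrack).
rng = np.random.default_rng(0)
instructor_frames = rng.normal(0.0, 1.0, size=(500, 13))
other_frames = rng.normal(3.0, 1.0, size=(200, 13))

model = fit_speaker_model(instructor_frames)
# Frames drawn from the instructor's distribution should score
# higher under the model than frames from a different speaker.
print(model.score(instructor_frames[:100]) > model.score(other_frames[:100]))
```

The "on the fly" update described in the abstract could, under this setup, be approximated by periodically refitting the model on newly accepted instructor segments (scikit-learn's `warm_start=True` option allows resuming EM from the current parameters).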