Augmented segmentation and visualization for presentation videos

  • Authors: Alexander Haubold; John R. Kender

  • Affiliations: Columbia University, New York, NY (both authors)

  • Venue: Proceedings of the 13th annual ACM international conference on Multimedia
  • Year: 2005

Visualization

Abstract

We investigate methods of segmenting, visualizing, and indexing presentation videos using both audio and visual data. The audio track is segmented by speaker and augmented with key phrases extracted using an Automatic Speech Recognizer (ASR). The video track is segmented by visual dissimilarities and changes in speaker gesturing, and augmented with representative key frames. An interactive user interface combines visual representations of audio, video, text, and key frames, and lets the user navigate presentation videos. User studies with 176 students of varying knowledge were conducted on 7.5 hours of student presentation video (32 presentations). Tasks included searching for various portions of presentations, both known and unknown to the students, and summarizing presentations given the annotations. The results favor the video summaries and the interface, suggesting responses about 20% faster than with access to the actual video, while accuracy of responses remained the same on average. Follow-up surveys yielded a number of suggestions for improving the interface, such as incorporating automatic speaker clustering and identification and displaying an abstract topological view of the presentation. The surveys also identified alternative contexts in which students would like to use the tool in the classroom environment.
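
The abstract does not say how visual dissimilarity is measured or how key frames are chosen. As a rough illustration only, and not the authors' method, the general idea of cutting a video where consecutive frames differ strongly and picking one representative frame per segment might be sketched as follows. The L1 histogram distance, the `threshold` value, the middle-frame rule, and all function names are assumptions made for this example.

```python
from typing import List, Sequence


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two normalized histograms (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(x - y) for x, y in zip(a, b))


def segment_boundaries(
    histograms: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[int]:
    """Return the index of the first frame of each segment.

    A new segment starts wherever the distance between a frame and its
    predecessor exceeds `threshold` (a hypothetical tuning parameter).
    """
    boundaries = [0] if histograms else []
    for i in range(1, len(histograms)):
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold:
            boundaries.append(i)
    return boundaries


def key_frames(num_frames: int, boundaries: Sequence[int]) -> List[int]:
    """Pick the middle frame of each segment as its representative."""
    ends = list(boundaries[1:]) + [num_frames]
    return [(start + end - 1) // 2 for start, end in zip(boundaries, ends)]


# Toy input: five frames described by two-bin normalized histograms.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
cuts = segment_boundaries(frames)
reps = key_frames(len(frames), cuts)
```

On the toy input, the only large change is between the second and third frames, so the sketch yields two segments with one key frame each. A real system would compute histograms from decoded video frames and would likely combine this cue with others, such as the gesture changes mentioned in the abstract.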