Augmented segmentation and visualization for presentation videos

Authors:
Alexander Haubold; John R. Kender
Affiliations:
Columbia University, New York, NY; Columbia University, New York, NY
Venue:
Proceedings of the 13th annual ACM international conference on Multimedia
Year:
2005

Visualization

Abstract

We investigate methods of segmenting, visualizing, and indexing presentation videos using both audio and visual data. The audio track is segmented by speaker and augmented with key phrases extracted using an Automatic Speech Recognizer (ASR). The video track is segmented by visual dissimilarities and changes in speaker gesturing, and augmented with representative key frames. An interactive user interface combines visual representations of audio, video, text, and key frames, and allows the user to navigate presentation videos. User studies with 176 students of varying knowledge were conducted on 7.5 hours of student presentation video (32 presentations). Tasks included searching for various portions of presentations, both known and unknown to the students, and summarizing presentations given the annotations. The results favor the video summaries and the interface, suggesting responses that are 20% faster compared to having access to the actual video, while the accuracy of responses remained the same on average. Follow-up surveys offer a number of suggestions for improving the interface, such as incorporating automatic speaker clustering and identification and displaying an abstract topological view of the presentation. The surveys also identify alternative contexts in which students would like to use the tool in the classroom environment.
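
The abstract does not say how visual dissimilarity is measured or how key frames are chosen. As an illustration only, the following minimal sketch assumes frames are flat sequences of 8-bit grayscale pixel values, measures dissimilarity as the L1 distance between intensity histograms, and picks the middle frame of each segment as its key frame. The function names, the bin count, and the threshold are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def histogram(frame: Sequence[int], bins: int = 8, max_value: int = 256) -> List[float]:
    """Normalized intensity histogram of one frame (flat sequence of pixel values)."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = len(frame) or 1  # avoid division by zero on an empty frame
    return [c / total for c in counts]


def dissimilarity(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two normalized histograms (ranges from 0 to 2)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def segment(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Split frames into (start, end) ranges, end exclusive.

    A new segment starts wherever two consecutive frames differ by more
    than the (assumed) threshold.
    """
    if not frames:
        return []
    hists = [histogram(f) for f in frames]
    boundaries = [0]
    for i in range(1, len(hists)):
        if dissimilarity(hists[i - 1], hists[i]) > threshold:
            boundaries.append(i)
    boundaries.append(len(frames))
    return list(zip(boundaries[:-1], boundaries[1:]))


def key_frame(start: int, end: int) -> int:
    """Index of the middle frame of a segment, used as its representative."""
    return start + (end - start) // 2


# Toy input: three dark frames followed by two bright frames.
frames = [[10] * 16] * 3 + [[200] * 16] * 2
segments = segment(frames)
key_frames = [key_frame(s, e) for s, e in segments]
```

A histogram comparison is only one possible dissimilarity measure. The gesture-based segmentation mentioned in the abstract would need a separate motion or pose signal that this sketch does not model.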