Gesture salience as a hidden variable for coreference resolution and keyframe extraction
Journal of Artificial Intelligence Research
Creating video recordings of events such as lectures or meetings is increasingly easy and inexpensive. However, reviewing the content of such video can be time-consuming and difficult. Our goal is to produce a "comic book" summary, in which a transcript is augmented with keyframes that disambiguate and clarify the accompanying text. Unlike most previous keyframe extraction systems, which rely primarily on visual cues, we present a linguistically motivated approach that selects keyframes containing salient gestures. Rather than learning gesture salience directly, we estimate it by measuring the contribution of gesture to the understanding of other discourse phenomena. More specifically, we bootstrap from multimodal coreference resolution to identify gestures that improve coreference performance, and we then select keyframes that capture these gestures. Our model treats gesture salience as a hidden variable in a conditional framework, with observable features from both the visual and textual modalities. This approach significantly outperforms competitive baselines that do not use gesture information.
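To make the hidden-variable formulation concrete, the following is a minimal sketch, not the authors' implementation, of how a conditional coreference score can marginalize over a binary gesture-salience variable. All feature names, weight vectors, and the logistic parameterization are illustrative assumptions; the paper's actual model and features may differ.

```python
# Hypothetical sketch: coreference probability with a hidden binary
# gesture-salience variable s, marginalized out of a conditional model.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coref_probability(x_text, x_visual, w_text, w_gesture, w_salience):
    """P(coref | x) = sum over s in {0,1} of P(coref | x, s) * P(s | x_visual).

    s = 1 means the accompanying gesture is salient, so gestural features
    contribute to the coreference decision; s = 0 means they are ignored.
    """
    # Distribution over the hidden salience variable, from visual features only.
    p_salient = sigmoid(x_visual @ w_salience)
    # Coreference score when the gesture is ignored (s = 0) ...
    p_coref_not_salient = sigmoid(x_text @ w_text)
    # ... and when gestural features are included (s = 1).
    p_coref_salient = sigmoid(x_text @ w_text + x_visual @ w_gesture)
    # Marginalize over the hidden variable.
    return p_salient * p_coref_salient + (1.0 - p_salient) * p_coref_not_salient

# Toy usage with made-up features and weights.
x_text = np.array([1.0, 0.3])    # e.g., string match, sentence distance
x_visual = np.array([0.8, 0.5])  # e.g., hand speed, distance from rest pose
w_text = np.array([2.0, -1.0])
w_gesture = np.array([1.5, 0.5])
w_salience = np.array([1.0, 1.0])
print(coref_probability(x_text, x_visual, w_text, w_gesture, w_salience))
```

Under this kind of formulation, the posterior over the salience variable (here, `p_salient` reweighted by how much the gesture helped the coreference decision) is what would then drive keyframe selection toward frames that capture salient gestures.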