Extracting salient keywords from instructional videos using joint text, audio and visual cues

Authors:
Youngja Park; Ying Li
Affiliations:
IBM T.J. Watson Research Center, Hawthorne, NY; IBM T.J. Watson Research Center, Hawthorne, NY
Venue:
NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
Year:
2006

Abstract

This paper presents a multi-modal, feature-based system for extracting salient keywords from transcripts of instructional videos. Specifically, we propose to extract domain-specific keywords by integrating cues from linguistic and statistical knowledge with derived sound classes and characteristic visual content types. Acquiring such salient keywords will facilitate video indexing and browsing, and significantly improve the quality of current video search engines. Experiments on four government instructional videos show that 82% of the salient keywords appear in the top 50% of the ranked keywords. In addition, the audiovisual cues improve precision and recall by 1.1% and 1.5%, respectively.
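
The "top 50%" figure in the abstract can be read as a coverage measure: the share of human-judged salient keywords that fall within the top half of the system's ranked keyword list. The sketch below is one plausible way to compute such a measure, not the authors' evaluation code. The function name `coverage_at_top_fraction`, the rounding-up of the cutoff, and the toy keyword lists are all assumptions made for illustration; none of them come from the paper.

```python
import math


def coverage_at_top_fraction(ranked_keywords, salient_keywords, fraction=0.5):
    """Return the share of salient keywords found in the top `fraction`
    of a ranked keyword list (hypothetical helper, not from the paper)."""
    salient = set(salient_keywords)
    if not salient:
        raise ValueError("salient_keywords must be non-empty")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    # Assumption: the cutoff is rounded up, so a fraction of 0.5 over
    # 7 ranked items keeps the first 4.
    cutoff = math.ceil(len(ranked_keywords) * fraction)
    top = set(ranked_keywords[:cutoff])

    # Count the salient keywords that survive the cutoff.
    return len(salient & top) / len(salient)


# Toy data, invented for illustration only.
ranked = [
    "firewall", "password", "the video", "encryption",
    "next slide", "backup", "thank you", "login",
]
salient = {"firewall", "password", "encryption", "backup"}

coverage = coverage_at_top_fraction(ranked, salient, fraction=0.5)
print(f"coverage in top 50%: {coverage:.2f}")
```

With the toy lists above, three of the four salient keywords fall in the top half of the ranking. The paper's reported 82% would correspond to the same kind of ratio computed over its four evaluation videos, assuming this reading of the metric is correct.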