Generating natural language summaries for multimedia

  • Authors:
  • Duo Ding, Florian Metze, Shourabh Rawat, Peter F. Schulam, Susanne Burger

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA (all authors)

  • Venue:
  • INLG '12 Proceedings of the Seventh International Natural Language Generation Conference
  • Year:
  • 2012


Abstract

In this paper we introduce an automatic system that generates textual summaries of Internet-style video clips. The system first identifies suitable high-level descriptive features that have been detected in the video (e.g., visual concepts, recognized speech, actions, objects, and persons). A natural language generator built with SimpleNLG then compiles these high-level features into textual form. The generated summary contains information from both visual and acoustic sources and is intended to give a general overview of the video. To reduce the complexity of the task, we restrict ourselves to videos that show a limited number of "events". In this demo paper, we describe the design of the system and present example outputs generated by the video summarization system.
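As a rough, hypothetical illustration of the feature-to-text step the abstract describes, the sketch below compiles detector outputs into simple declarative sentences. The feature names and sentence templates are assumptions made here for illustration; the authors' actual system uses the Java library SimpleNLG for surface realisation.

```python
def realise(subject, verb, obj):
    """Minimal stand-in for a surface realiser: builds one sentence."""
    return f"{subject[0].upper()}{subject[1:]} {verb} {obj}."

def summarise(features):
    """Compile detected high-level features (visual and acoustic) into
    a short textual summary, one sentence per detected item."""
    sentences = []
    # Visual channel: detected concepts become "The video shows X."
    for concept in features.get("visual_concepts", []):
        sentences.append(realise("the video", "shows", concept))
    # Acoustic channel: recognized speech becomes a quoted utterance.
    for utterance in features.get("recognized_speech", []):
        sentences.append(realise("a speaker", "says", f'"{utterance}"'))
    return " ".join(sentences)

# Hypothetical detector output for one clip:
features = {
    "visual_concepts": ["a dog", "a park"],
    "recognized_speech": ["good boy"],
}
print(summarise(features))
```

A real realiser such as SimpleNLG would additionally handle agreement, tense, and aggregation of related sentences rather than relying on fixed templates.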