Given the deluge of multimedia content becoming available over the Internet, it is increasingly important to be able to effectively examine and organize these large stores of information in ways that go beyond browsing or collaborative filtering. In this paper we review previous work on audio and video processing and define the task of Topic-Oriented Multimedia Summarization (TOMS) using natural language generation: given a set of features automatically extracted from a video (such as visual concepts and ASR transcripts), a TOMS system automatically generates a paragraph of natural language ("a recounting") that summarizes the important information in a video belonging to a certain topic area and explains why the video was matched and retrieved. We see this as a first step towards systems that can discriminate between visually similar but semantically different videos, compare two videos and provide textual output, or summarize a large number of videos at once. In this paper, we introduce our approach to solving the TOMS problem. We extract visual concept features and ASR transcription features from a given video, and develop a template-based natural language generation system that produces a textual recounting based on the extracted features. We also propose possible experimental designs for continuously evaluating and improving TOMS systems, and present results of a pilot evaluation of our initial system.
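To make the template-based recounting step concrete, the following is a minimal sketch of how detected visual concepts and ASR keywords could be slotted into fixed sentence templates. The feature names, score threshold, and template wording here are illustrative assumptions, not the actual templates or feature set of the system described in the abstract.

```python
def generate_recounting(topic, concepts, asr_keywords, threshold=0.5):
    """Fill simple sentence templates from video features.

    topic        -- the topic area the video was retrieved for
    concepts     -- list of (concept_name, detector_score) pairs
    asr_keywords -- salient words taken from the ASR transcript
    threshold    -- minimum detector score for a concept to be reported
    """
    sentences = [f"This video was retrieved for the topic '{topic}'."]

    # Keep only confidently detected visual concepts.
    strong = [name for name, score in concepts if score >= threshold]
    if strong:
        sentences.append("Visual analysis detected " + ", ".join(strong) + ".")

    # Report keywords spotted in the speech transcript.
    if asr_keywords:
        sentences.append(
            "The speech transcript mentions " + ", ".join(asr_keywords) + "."
        )
    return " ".join(sentences)


# Hypothetical example input for a single retrieved video.
recounting = generate_recounting(
    "changing a vehicle tire",
    [("car", 0.92), ("tire", 0.81), ("indoor scene", 0.12)],
    ["jack", "lug nuts"],
)
print(recounting)
```

Even this toy version shows the appeal of the template approach for recounting: every output sentence is directly traceable to an extracted feature, which supports the goal of explaining why a video was matched and retrieved.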