Annotation-based multimedia summarization and translation

  • Authors:
  • Katashi Nagao; Shigeki Ohira; Mitsuhiro Yoneoka

  • Affiliations:
  • Nagoya University and CREST, JST, Furo-cho, Chikusa-ku, Nagoya, Japan; Waseda University, Shinjuku-ku, Tokyo, Japan; Tokyo Institute of Technology, Meguro-ku, Tokyo, Japan

  • Venue:
  • COLING '02: Proceedings of the 19th International Conference on Computational Linguistics - Volume 1
  • Year:
  • 2002

Abstract

This paper presents techniques for multimedia annotation and their application to video summarization and translation. Our annotation tool allows users to easily create annotations, including voice transcripts, video scene descriptions, and visual/auditory object descriptions. The voice transcription module is capable of multilingual spoken language identification and recognition. A video scene description consists of semi-automatically detected keyframes of each scene in a video clip and the time codes of the scenes. A visual object description is created by tracking and interactively naming people and objects in video scenes. The text data in the multimedia annotations are syntactically and semantically structured using linguistic annotation. The proposed multimedia summarization operates on a multimodal document that consists of a video, keyframes of scenes, and transcripts of the scenes. The multimedia translation automatically generates several versions of multimedia content in different languages.
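The "multimodal document" the abstract describes could be sketched as a simple data structure pairing a video with per-scene keyframes and transcripts. The class and field names below are illustrative assumptions, not the authors' actual representation or API:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of a multimodal document: a video plus
# per-scene keyframes and voice transcripts (names are assumptions).

@dataclass
class Scene:
    start_timecode: float   # scene start, in seconds
    end_timecode: float     # scene end, in seconds
    keyframe_path: str      # semi-automatically detected keyframe image
    transcript: str         # voice transcript aligned to this scene

@dataclass
class MultimodalDocument:
    video_path: str
    scenes: List[Scene] = field(default_factory=list)

    def summary_transcript(self, max_scenes: int) -> str:
        """Naive textual summary: concatenate the first scenes' transcripts."""
        return " ".join(s.transcript for s in self.scenes[:max_scenes])

doc = MultimodalDocument(
    video_path="lecture.mpg",
    scenes=[
        Scene(0.0, 12.5, "kf_001.jpg", "Welcome to the talk."),
        Scene(12.5, 40.0, "kf_002.jpg", "Today we discuss annotation."),
    ],
)
print(doc.summary_transcript(1))
```

A real system would populate such a structure from scene detection and speech recognition; here the scenes are hard-coded purely to show the shape of the data.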