This paper presents a novel method for annotating videos from the TRECVID 2005 data set using only static visual features and metadata of still image frames. The method provides annotation and tagging tools that integrate multimedia data, such as video or still images, together with text into search and other combined applications running on the web or on other networks. It primarily uses MPEG-7-based visual features and metadata of prototype images, and lets the user select either a single prototype or a training set. It also adaptively adjusts the weights of the visual features the user finds most informative, helping to bridge the semantic gap. Using a self-developed segmentation tool, the user can detect relevant regions in video frames and carry out region-based annotation on the same frame set. The method yields satisfactory results even though the annotations of the TRECVID 2005 video data vary considerably in the semantic level of their concepts. It is simple and fast, requiring only a very small training set and little or no user intervention, and it can be applied to any combination of visual and textual features.
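The abstract's core idea of matching frames against user-selected prototypes with adaptively weighted visual features can be sketched as follows. This is a minimal illustration only: the function names, the inverse-variance weight update, and the distance threshold are assumptions for the sketch, not the paper's actual algorithm or feature set.

```python
# Hypothetical sketch of prototype-based frame annotation with
# adaptive per-feature weights. The update rule (inverse variance of
# user-marked relevant examples) is an illustrative assumption, not
# the method described in the paper.

def weighted_distance(x, prototype, weights):
    """Weighted squared Euclidean distance between two feature vectors."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, prototype))

def update_weights(relevant_examples, eps=1e-6):
    """Give higher weight to features that vary little across the
    examples the user marked as relevant (inverse-variance heuristic),
    then normalize the weights to sum to 1."""
    n_features = len(relevant_examples[0])
    weights = []
    for i in range(n_features):
        vals = [x[i] for x in relevant_examples]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        weights.append(1.0 / (var + eps))
    total = sum(weights)
    return [w / total for w in weights]

def annotate(frames, prototype, weights, threshold):
    """Return indices of frames close enough to the prototype to
    inherit its label/annotation."""
    return [i for i, f in enumerate(frames)
            if weighted_distance(f, prototype, weights) < threshold]
```

In use, the weights would be re-estimated each time the user refines the relevant set, so features that discriminate the concept well gradually dominate the distance.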