This paper presents a novel method for annotating videos from the TRECVID 2005 data set using only static visual features and metadata of still image frames. The method provides annotation and tagging tools that integrate multimedia data, such as video or still images, together with text into search and other combined applications running on the web or on other networks. It primarily uses MPEG-7-based visual features and metadata of prototype images, and lets the user select either a single prototype or a training set. It also adaptively adjusts the weights of the visual features the user finds most informative, helping to bridge the semantic gap. Using a self-developed segmentation tool, the user can detect relevant regions in video frames and carry out region-based annotation on the same frame set. The method yields satisfactory results even though the annotations of the TRECVID 2005 video data vary considerably in the semantic level of their concepts. It is simple and fast, requiring only a very small training set and little or no user intervention, and it can be applied to any combination of visual and textual features.
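The abstract's core idea of matching frames against user-selected prototypes with adaptively weighted visual features can be sketched as follows. This is a minimal illustration only: the function names, the inverse-variance weight update, and the distance threshold are assumptions for the sketch, not the paper's actual algorithm or feature set.

```python
# Hypothetical sketch of prototype-based frame annotation with
# adaptive per-feature weights. The update rule (inverse variance of
# user-marked relevant examples) is an illustrative assumption, not
# the method described in the paper.

def weighted_distance(x, prototype, weights):
    """Weighted squared Euclidean distance between two feature vectors."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, prototype))

def update_weights(relevant_examples, eps=1e-6):
    """Give higher weight to features that vary little across the
    examples the user marked as relevant (inverse-variance heuristic),
    then normalize the weights to sum to 1."""
    n_features = len(relevant_examples[0])
    weights = []
    for i in range(n_features):
        vals = [x[i] for x in relevant_examples]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        weights.append(1.0 / (var + eps))
    total = sum(weights)
    return [w / total for w in weights]

def annotate(frames, prototype, weights, threshold):
    """Return indices of frames close enough to the prototype to
    inherit its label/annotation."""
    return [i for i, f in enumerate(frames)
            if weighted_distance(f, prototype, weights) < threshold]
```

In use, the weights would be re-estimated each time the user refines the relevant set, so features that discriminate the concept well gradually dominate the distance.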