This paper exploits several criteria to optimize training set construction for video annotation. Most existing learning-based semantic annotation approaches require a large training set to achieve good generalization capacity, which in turn demands a considerable amount of labor-intensive manual labeling. However, it is observed that the generalization capacity of a classifier depends largely on the geometrical distribution of the training data rather than on its size. We argue that a training set which covers most of the temporal and spatial distribution of the whole data can achieve satisfying performance even when its size is limited. To capture the geometrical distribution characteristics of a given video collection, we propose four metrics for constructing an optimal training set: Salience, Time Dispersiveness, Spatial Dispersiveness, and Diversity. Based on these metrics, we further propose a set of optimization rules that capture as much of the distribution information of the whole data as possible within a training set of a given size. Experimental results demonstrate that these rules are effective for training set construction in video annotation and significantly outperform random training set selection.
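The core idea of selecting a small training set that still covers the geometrical distribution of the whole collection can be sketched with a simple farthest-point (k-center) heuristic. This is a hypothetical illustration only, not the paper's actual four metrics or optimization rules; the function name `greedy_coverage_selection` and the toy features are assumptions for the sketch.

```python
import numpy as np

def greedy_coverage_selection(features, budget):
    """Greedily pick `budget` samples spread over the feature distribution.

    Farthest-point heuristic: repeatedly add the sample that is farthest
    from all samples selected so far. A stand-in for distribution-aware
    training set construction, not the paper's exact method.
    """
    selected = [0]  # start from an arbitrary sample
    # distance of every sample to its nearest already-selected sample
    dist = np.linalg.norm(features - features[0], axis=1)
    while len(selected) < budget:
        nxt = int(np.argmax(dist))  # farthest uncovered sample
        selected.append(nxt)
        new_dist = np.linalg.norm(features - features[nxt], axis=1)
        dist = np.minimum(dist, new_dist)  # update nearest-selected distances
    return selected

# Toy usage: 100 random 2-D "shot features", pick a 10-sample training set.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 2))
picked = greedy_coverage_selection(feats, 10)
print(sorted(picked))
```

Compared with random selection, such a coverage-driven pick avoids clustering all labeled samples in one dense region of the feature space, which is the intuition behind preferring distribution coverage over sheer training set size.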