Optimizing training set construction for video semantic classification

Authors:
Jinhui Tang;Xian-Sheng Hua;Yan Song;Tao Mei;Xiuqing Wu
Affiliations:
Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China;Microsoft Research Asia, Beijing, China;Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China;Microsoft Research Asia, Beijing, China;Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China
Venue:
EURASIP Journal on Advances in Signal Processing
Year:
2008

Abstract

We explore criteria for optimizing training set construction in large-scale video semantic classification. Because of the large gap between low-level features and high-level semantics, and the high diversity of video data, it is difficult to represent the prototypes of semantic concepts with a training set of limited size. Most learning-based approaches to video semantic classification therefore require a large training set to generalize well, which makes large amounts of labor-intensive manual labeling unavoidable. However, it has been observed that the generalization capacity of a classifier depends heavily on the geometrical distribution of the training data rather than on its size. We argue that a training set that captures most of the temporal and spatial distribution information of the whole data set can achieve good performance even when its size is limited. To capture the geometrical distribution characteristics of a given video collection, we propose four metrics for constructing or selecting an optimal training set: salience, temporal dispersiveness, spatial dispersiveness, and diversity. Based on these metrics, we further propose a set of optimization rules that capture as much of the distribution information of the whole data set as possible with a training set of a given size. Experimental results demonstrate that these rules are effective for training set construction in video semantic classification and that they significantly outperform random training set selection.
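
The abstract does not define the four metrics or the optimization rules, so the following is only an illustrative sketch of the general idea that a small, well-spread training set can cover a collection better than a random one. It uses a generic greedy farthest-point (k-center style) heuristic as a stand-in for spatial dispersiveness; it is not the authors' method. The function names, the 2-D toy features, and the cluster sizes are all assumptions made for this example.

```python
import math
import random


def farthest_point_selection(points, k, start=0):
    """Greedily pick k indices so that each new point is as far as possible
    from the points already chosen (a generic k-center style heuristic,
    assumed here for illustration; points are assumed to be distinct)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        # The next pick is the point that is currently worst covered.
        idx = max(range(len(points)), key=nearest.__getitem__)
        selected.append(idx)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[idx]))
    return selected


def coverage_radius(points, selected):
    """Largest distance from any point to its nearest selected point."""
    return max(min(math.dist(p, points[s]) for s in selected) for p in points)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 2-D "features" drawn from three clusters of unequal size.
    centers_and_sizes = [((0.0, 0.0), 200), ((5.0, 5.0), 20), ((10.0, 0.0), 5)]
    points = [
        (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for (cx, cy), n in centers_and_sizes
        for _ in range(n)
    ]
    k = 6
    dispersed = farthest_point_selection(points, k)
    randomly_chosen = rng.sample(range(len(points)), k)
    print("coverage radius, dispersed:", coverage_radius(points, dispersed))
    print("coverage radius, random:   ", coverage_radius(points, randomly_chosen))
```

A smaller coverage radius means every sample in the collection lies close to some selected training sample. Random selection tends to draw most of its picks from the largest cluster, whereas the greedy rule is pushed toward regions that are not yet covered. A temporal analogue could apply the same rule to shot timestamps, though that too would be an assumption rather than something the abstract specifies.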