Optimizing training set construction for video semantic classification

  • Authors:
  • Jinhui Tang; Xian-Sheng Hua; Yan Song; Tao Mei; Xiuqing Wu

  • Affiliations:
  • Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China; Microsoft Research Asia, Beijing, China; Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China; Microsoft Research Asia, Beijing, China; Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China

  • Venue:
  • EURASIP Journal on Advances in Signal Processing
  • Year:
  • 2008

Abstract

We explore criteria for optimizing training set construction in large-scale video semantic classification. Because of the large gap between low-level features and high-level semantics, as well as the high diversity of video data, it is difficult to represent the prototypes of semantic concepts with a training set of limited size. Most learning-based approaches to video semantic classification require a large training set to generalize well, which makes a large amount of labor-intensive manual labeling unavoidable. However, it has been observed that the generalization capacity of a classifier depends strongly on the geometrical distribution of the training data rather than on its size. We argue that a training set that captures most of the temporal and spatial distribution information of the whole data set can achieve good performance even when its size is limited. To capture the geometrical distribution characteristics of a given video collection, we propose four metrics for constructing or selecting an optimal training set: salience, temporal dispersiveness, spatial dispersiveness, and diversity. Based on these metrics, we further propose a set of optimization rules that capture as much of the distribution information of the whole data set as possible using a training set of a given size. Experimental results demonstrate that these rules are effective for training set construction in video semantic classification and significantly outperform random training set selection.
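
The abstract does not define the four metrics or the optimization rules, so the following is only a loose illustration of the underlying idea that a small, well-spread training set can cover a collection better than an arbitrary one of the same size. It is a minimal sketch that assumes samples are plain feature vectors and uses greedy farthest-point selection as a stand-in for spatial dispersiveness; this is a swapped-in generic technique, not the authors' method. The names `farthest_point_selection` and `coverage_radius` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_point_selection(points, k, seed_index=0):
    """Greedily pick up to k indices whose points are spread out in feature space.

    Assumption: this generic heuristic only illustrates "spatial
    dispersiveness"; it is not the rule set proposed in the paper.
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of points")
    selected = [seed_index]
    # Distance from every point to its nearest selected point so far.
    min_dist = [euclidean(p, points[seed_index]) for p in points]
    while len(selected) < k:
        nxt = max(range(n), key=min_dist.__getitem__)
        if min_dist[nxt] == 0.0:
            break  # every remaining point duplicates a selected one
        selected.append(nxt)
        for i, p in enumerate(points):
            d = euclidean(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


def coverage_radius(points, selected):
    """Largest distance from any point to its nearest selected point."""
    return max(min(euclidean(p, points[s]) for s in selected) for p in points)
```

A smaller `coverage_radius` means every sample in the collection lies close to some selected training sample, which gives one simple way to compare a dispersed selection against a random or clustered one of the same size.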