Reordering video shots for event classification using bag-of-words models and string kernels

  • Authors:
  • Yung-Lun Chen; Shyi-Chyi Cheng; Yi-Ping Phoebe Chen

  • Affiliations:
  • National Taiwan Ocean Univ., Taiwan; National Taiwan Ocean Univ., Taiwan; La Trobe University, Australia

  • Venue:
  • Proceedings of the 27th Conference on Image and Vision Computing New Zealand
  • Year:
  • 2012


Abstract

This paper presents a novel approach to reordering video shots based on the state-of-the-art bag-of-words (BoW) model. Shot reordering eliminates the temporal ambiguity that is likely to degrade the performance of conventional video event recognition algorithms using support vector machine (SVM) classifiers with string kernels. A traditional BoW model represents each video frame as a histogram of visual words; because it ignores the arrangement of the visual words in 2D image space, it captures no spatial-temporal information. Our approach first segments the input video clip into a set of video shots, each of which is further divided into multiple three-dimensional (3D) video patches and cubes. We introduce spatial-temporal information into the BoW model by analytically extracting space-time features from individual 3D cubes, and the system learns the BoW codebook from these cubes. Every video shot in an input video sequence is represented as a BoW histogram, and the corresponding event is then modelled as a sequence of BoW histograms, which are further reordered by the proposed normalization scheme. Finally, SVM classifiers with string kernels are trained on a set of training samples and used to recognize the event type of a test video clip. Our framework offers a simple and effective way to incorporate both the temporal and the spatial configurations of video events. Results on several publicly available datasets show that the proposed method performs well in terms of robustness and recognition rate.
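To make the pipeline above concrete, here is a minimal sketch of the two core pieces: a per-shot BoW histogram and a string kernel over sequences of shot histograms. The abstract does not specify the cube descriptors, how the codebook is learned, which string kernel is used, or how the reordering/normalization scheme works, so all of the following are assumptions: the codebook is given (for example, pre-learned by k-means), each cube descriptor is assigned to its nearest codeword by squared Euclidean distance, shots are compared by histogram intersection, and the kernel is a hypothetical "soft p-spectrum" kernel that sums, over every pair of length-p shot windows, the product of per-position intersections. The function names are illustrative, and the paper's reordering step is not reproduced.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Histogram = List[float]


def nearest_codeword(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to a descriptor (squared Euclidean distance)."""
    best_index, best_dist = 0, math.inf
    for index, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def shot_histogram(cube_descriptors: Sequence[Vector],
                   codebook: Sequence[Vector]) -> Histogram:
    """L1-normalized BoW histogram of one shot, built from its 3D-cube descriptors."""
    counts = [0.0] * len(codebook)
    for descriptor in cube_descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def intersection(h1: Histogram, h2: Histogram) -> float:
    """Histogram-intersection similarity between two shot histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def soft_spectrum_kernel(x: Sequence[Histogram], y: Sequence[Histogram],
                         p: int = 2) -> float:
    """Hypothetical string kernel: for every pair of length-p windows of
    consecutive shots, multiply the per-position intersections, then sum."""
    total = 0.0
    for i in range(len(x) - p + 1):
        for j in range(len(y) - p + 1):
            prod = 1.0
            for k in range(p):
                prod *= intersection(x[i + k], y[j + k])
            total += prod
    return total


def normalized_kernel(x: Sequence[Histogram], y: Sequence[Histogram],
                      p: int = 2) -> float:
    """Cosine-normalized kernel value, so that K(x, x) == 1 for non-empty x."""
    kxx = soft_spectrum_kernel(x, x, p)
    kyy = soft_spectrum_kernel(y, y, p)
    if kxx == 0.0 or kyy == 0.0:
        return 0.0
    return soft_spectrum_kernel(x, y, p) / math.sqrt(kxx * kyy)


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    shot_a = shot_histogram([[0.1, 0.0], [0.9, 1.1]], codebook)
    shot_b = shot_histogram([[4.8, 5.1]], codebook)
    # Same shots, different temporal order.
    event1 = [shot_a, shot_b, shot_a]
    event3 = [shot_a, shot_a, shot_b]
    print(normalized_kernel(event1, event1), normalized_kernel(event1, event3))
```

In the usage example, `event1` and `event3` contain the same shots in a different order, and the kernel value drops from 1.0 to 0.5. This sensitivity to shot order is the kind of temporal ambiguity the abstract's reordering step targets. A Gram matrix of `normalized_kernel` values could then be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. Because the kernel is a sum of products of histogram-intersection kernels over window pairs, it remains positive semi-definite.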