Video Event Classification Using Bag of Words and String Kernels

  • Authors:
  • Lamberto Ballan; Marco Bertini; Alberto Del Bimbo; Giuseppe Serra

  • Affiliations:
  • Media Integration and Communication Center, University of Florence, Italy (all four authors)

  • Venue:
  • ICIAP '09 Proceedings of the 15th International Conference on Image Analysis and Processing
  • Year:
  • 2009

Abstract

The recognition of events in videos is a relevant and challenging task of automatic semantic video analysis. At present, one of the most successful frameworks used for object recognition tasks is the bag-of-words (BoW) approach. However, this approach does not model the temporal information of the video stream. In this paper we present a method to introduce temporal information within the BoW approach. Events are modeled as a sequence composed of histograms of visual features, computed from each frame using the traditional BoW model. The sequences are treated as strings where each histogram is considered as a character. Classification of these sequences, whose length varies with the length of the video clip, is performed using SVM classifiers with a string kernel that uses the Needleman-Wunsch edit distance. Experimental results on two datasets, soccer video and TRECVID 2005, demonstrate the validity of the proposed approach.
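
To make the idea concrete, the following is a minimal sketch of how a Needleman-Wunsch edit distance over sequences of per-frame histograms could be turned into a kernel value. It is not the authors' implementation: the chi-square substitution cost, the fixed gap cost, the exponential form of the kernel, and all names (chi2_distance, nw_edit_distance, string_kernel, gap_cost, gamma) are assumptions made for illustration, since the abstract does not specify them.

```python
import math


def chi2_distance(h1, h2):
    """Chi-square distance between two histograms (assumed substitution cost)."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def nw_edit_distance(seq_a, seq_b, gap_cost=1.0, sub_cost=chi2_distance):
    """Global-alignment (Needleman-Wunsch style) edit distance.

    Each sequence is a list of histograms, one per frame, so a histogram
    plays the role of a character. gap_cost is a hypothetical constant
    charged for every inserted or deleted frame.
    """
    n, m = len(seq_a), len(seq_b)
    # prev[j] holds the cost of aligning the first i-1 frames of seq_a
    # with the first j frames of seq_b.
    prev = [j * gap_cost for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * gap_cost] + [0.0] * m
        for j in range(1, m + 1):
            curr[j] = min(
                prev[j] + gap_cost,      # delete a frame of seq_a
                curr[j - 1] + gap_cost,  # insert a frame of seq_b
                prev[j - 1] + sub_cost(seq_a[i - 1], seq_b[j - 1]),  # substitute
            )
        prev = curr
    return prev[m]


def string_kernel(seq_a, seq_b, gamma=1.0):
    """Assumed kernel form: exp(-gamma * edit distance)."""
    return math.exp(-gamma * nw_edit_distance(seq_a, seq_b))


# Two toy clips of different length, with 2-bin histograms per frame.
clip_a = [[1.0, 0.0], [0.0, 1.0]]
clip_b = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
print(nw_edit_distance(clip_a, clip_b))
print(string_kernel(clip_a, clip_b))
```

Because the distance is computed by alignment, clips of different lengths can be compared directly. A matrix of such kernel values between all pairs of clips could then be passed to an SVM that accepts precomputed kernels; note that a kernel built this way from an edit distance is not guaranteed to be positive semidefinite, so this remains a sketch under the stated assumptions.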