Large-scale web video event classification by use of Fisher Vectors

  • Authors:
  • Chen Sun; Ram Nevatia

  • Affiliations:
  • University of Southern California, Institute for Robotics and Intelligent Systems, Los Angeles, CA 90089, USA (both authors)

  • Venue:
  • WACV '13 Proceedings of the 2013 IEEE Workshop on Applications of Computer Vision (WACV)
  • Year:
  • 2013

Abstract

Event recognition has been an important topic in computer vision research due to its many applications. However, most prior work has focused on videos captured by fixed cameras in known environments and depicting basic events. Here, we focus on classifying unconstrained web videos into much higher-level activities. We follow the approach of constructing fixed-length feature vectors from local feature descriptors and classifying them with an SVM. Our key contribution is a study of how much the Fisher Vector representation improves results compared to the conventional Bag-of-Words (BoW) approach. Such coding has been shown to be useful for static image classification but had not been applied to video categorization. We perform tests on the challenging NIST TRECVID Multimedia Event Detection (MED) dataset, which contains roughly a thousand hours of unconstrained user-generated videos; our approach achieves up to a 35% improvement over the BoW baseline. We also analyze possible causes of these improvements.
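The abstract contrasts two ways of turning a variable number of local descriptors from one video into a single fixed-length vector for an SVM. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. BoW hard-assigns each descriptor to its nearest codeword and counts, giving a K-dimensional histogram. The Fisher Vector instead records first- and second-order deviations of the descriptors from each component of a diagonal-covariance Gaussian mixture model (GMM), giving a 2KD-dimensional vector. The formulas follow the widely used "improved Fisher Vector" formulation (gradients with respect to the means and standard deviations, then signed square-root and L2 normalization). The abstract does not state which variant, vocabulary size, or descriptors the paper uses. The function names `gmm_posteriors`, `fisher_vector`, and `bow_histogram`, the toy GMM, and the random descriptors are all hypothetical.

```python
import numpy as np


def gmm_posteriors(X, weights, means, sigmas):
    """Soft assignment gamma[t, k] of descriptor t to diagonal Gaussian k."""
    # Standardized differences (x_t - mu_k) / sigma_k, shape (T, K, D)
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # Log normalizing constant of each diagonal Gaussian, shape (K,)
    log_norm = -0.5 * X.shape[1] * np.log(2 * np.pi) - np.log(sigmas).sum(axis=1)
    # Log of w_k * N(x_t; mu_k, diag(sigma_k^2)), shape (T, K)
    log_p = np.log(weights)[None, :] + log_norm[None, :] - 0.5 * (diff ** 2).sum(axis=2)
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exponentiating
    gamma = np.exp(log_p)
    return gamma / gamma.sum(axis=1, keepdims=True)


def fisher_vector(X, weights, means, sigmas):
    """Improved Fisher Vector (assumed variant) of descriptors X, shape (T, D)."""
    T = X.shape[0]
    gamma = gmm_posteriors(X, weights, means, sigmas)                 # (T, K)
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]   # (T, K, D)
    # Normalized gradient with respect to the means, shape (K, D)
    g_mu = (gamma[:, :, None] * diff).sum(axis=0) / (T * np.sqrt(weights)[:, None])
    # Normalized gradient with respect to the standard deviations, shape (K, D)
    g_sigma = (gamma[:, :, None] * (diff ** 2 - 1)).sum(axis=0) / (
        T * np.sqrt(2 * weights)[:, None]
    )
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])  # length 2*K*D
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                # power (signed sqrt) normalization
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv                  # L2 normalization


def bow_histogram(X, codebook):
    """Bag-of-Words baseline: hard assignment to the nearest codeword, L1-normalized."""
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)  # (T, K) squared distances
    counts = np.bincount(d2.argmin(axis=1), minlength=codebook.shape[0]).astype(float)
    return counts / counts.sum()


# Toy example: hypothetical 4-component GMM over 8-D descriptors.
rng = np.random.default_rng(0)
K, D = 4, 8                      # real vocabularies are far larger
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))  # reused as the BoW codebook for comparison
sigmas = np.ones((K, D))
X = rng.normal(size=(500, D))    # stand-in for one video's local descriptors

bow = bow_histogram(X, means)
fv = fisher_vector(X, weights, means, sigmas)
print(bow.shape, fv.shape)       # (4,) (64,)
```

For the same K components, the Fisher Vector is 2D times longer than the BoW histogram. It records where descriptors fall relative to each Gaussian, not just which component is nearest. This extra detail is a commonly cited reason why Fisher Vectors work well with linear SVMs. It is offered here as general background, not as the paper's own analysis.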