Spatio-temporal fisher vector coding for surveillance event detection

Authors:
Qiang Chen;Yang Cai;Lisa Brown;Ankur Datta;Quanfu Fan;Rogerio Feris;Shuicheng Yan;Alex Hauptmann;Sharath Pankanti
Affiliations:
National University of Singapore, Singapore, Singapore;Carnegie Mellon University, Pittsburgh, USA;IBM Research, New York, USA;IBM Research, New York, USA;IBM Research, New York, USA;IBM Research, New York, USA;National University of Singapore, Singapore, Singapore;Carnegie Mellon University, Pittsburgh, USA;IBM Research, New York, USA
Venue:
Proceedings of the 21st ACM international conference on Multimedia
Year:
2013

Citing 12
Cited 0

A Tutorial on Support Vector Machines for Pattern Recognition

Data Mining and Knowledge Discovery
A Generalized Representer Theorem

COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and and 5th European Conference on Computational Learning Theory
Space-time Interest Points

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Recognizing Human Actions: A Local SVM Approach

ICPR '04 Proceedings of the Pattern Recognition, 17th International Conference on (ICPR'04) Volume 3 - Volume 03
A Bayesian Hierarchical Model for Learning Natural Scene Categories

CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2 - Volume 02
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Evaluation campaigns and TRECVid

MIR '06 Proceedings of the 8th ACM international workshop on Multimedia information retrieval
A dual coordinate descent method for large-scale linear SVM

Proceedings of the 25th international conference on Machine learning
An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector

ECCV '08 Proceedings of the 10th European Conference on Computer Vision: Part II
The Pascal Visual Object Classes (VOC) Challenge

International Journal of Computer Vision
Improving the fisher kernel for large-scale image classification

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part IV
Image classification using super-vector coding of local image descriptors

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part V

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present a generic event detection system evaluated in the Surveillance Event Detection (SED) task of TRECVID 2012. We investigate a statistical approach with spatio-temporal features applied to seven event classes, which were defined by the SED task. This approach is based on local spatio-temporal descriptors, called MoSIFT and generated by pair-wise video frames. A Gaussian Mixture Model(GMM) is learned to model the distribution of the low level features. Then for each sliding window, the Fisher vector encoding [improvedFV] is used to generate the sample representation. The model is learnt using a Linear SVM for each event. The main novelty of our system is the introduction of Fisher vector encoding into video event detection. Fisher vector encoding has demonstrated great success in image classification. The key idea is to model the low level visual features as a Gaussian Mixture Model and to generate an intermediate vector representation for bag of features. FV encoding uses higher order statistics in place of histograms in the standard BoW. FV has several good properties: (a) it can naturally separate the video specific information from the noisy local features and (b) we can use a linear model for this representation. We build an efficient implementation for FV encoding which can attain a 10 times speed-up over real-time. We also take advantage of non-trivial object localization techniques to feed into the video event detection, e.g. multi-scale detection and non-maximum suppression. This approach outperformed the results of all other teams submissions in TRECVID SED 2012 on four of the seven event types.