Evaluation of low-level features and their combinations for complex event detection in open source videos

Authors:
Jingen Liu
Affiliations:
SRI International Sarnoff, Princeton, NJ 08540
Venue:
CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Year:
2012

Abstract

Low-level appearance as well as spatio-temporal features, appropriately quantized and aggregated into Bag-of-Words (BoW) descriptors, have been shown to be effective in many detection and recognition tasks. However, their efficacy for complex event recognition in unconstrained videos has not been systematically evaluated. In this paper, we use the NIST TRECVID Multimedia Event Detection (MED11 [1]) open source dataset, containing annotated data for 15 high-level events, as the standardized test bed for evaluating the low-level features. This dataset contains a large number of user-generated video clips. We consider 7 different low-level features, both static and dynamic, using BoW descriptors within an SVM approach for event detection. We present performance results on the 15 MED11 events for each of the features, as well as for their combinations using a number of early and late fusion strategies, and discuss their strengths and limitations.
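
For readers unfamiliar with the terms used in the abstract, the following is a minimal illustrative sketch of BoW quantization and of the general difference between early and late fusion. It is not the authors' implementation: the function names, the toy codebook, the Euclidean nearest-codeword assignment, and the weighted-average score fusion are all assumptions chosen for illustration, and the SVM training step is omitted.

```python
from math import dist  # Euclidean distance, Python 3.8+


def bow_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword (hypothetical
    hard assignment) and return an L1-normalized histogram of counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def early_fusion(histograms):
    """Early fusion (one common form): concatenate per-feature BoW
    histograms into a single vector that one classifier would consume."""
    return [v for h in histograms for v in h]


def late_fusion(scores, weights=None):
    """Late fusion (one common form): weighted average of the scores
    produced by separate per-feature classifiers."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Toy data, purely hypothetical: a 2-word codebook and four 2-D descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0)]
static_desc = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (0.2, 0.1)]

static_hist = bow_histogram(static_desc, codebook)      # two descriptors per codeword
fused_vector = early_fusion([static_hist, [1.0, 0.0]])  # second histogram is made up
fused_score = late_fusion([1.0, 0.0], weights=[3.0, 1.0])
```

In early fusion a single classifier sees all features at once, whereas in late fusion each feature gets its own classifier and only the scores are combined. The abstract does not specify which particular fusion strategies were evaluated, so the two functions above should be read only as generic examples of each family.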