Semantic pooling for complex event detection

  • Authors:
  • Qian Yu; Jingen Liu; Hui Cheng; Ajay Divakaran; Harpreet Sawhney

  • Affiliations:
  • SRI International, Princeton, NJ, USA (all authors)

  • Venue:
  • Proceedings of the 21st ACM international conference on Multimedia
  • Year:
  • 2013

Abstract

Complex event detection is very challenging in open-source videos, such as those on YouTube, which usually comprise very diverse visual content involving various object, scene, and action concepts. Not all of this content, however, is relevant to the event; a video may contain a lot of "junk" information that is harmful to recognition. Hence, we propose a semantic pooling approach to tackle this issue. Unlike conventional pooling over the entire video or over specific spatial regions, we employ a discriminative approach to acquire abstract semantic "regions" for pooling. For this purpose, we first associate low-level visual words with semantic concepts via their co-occurrence relationships. We then pool the low-level features separately according to their semantic information. The proposed semantic pooling strategy also provides a new mechanism for incorporating semantic concepts into low-level feature based event recognition. We evaluate our approach on the TRECVID MED [1] dataset, and the results show that semantic pooling consistently improves performance compared with conventional pooling strategies.
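
The sketch below illustrates one plausible reading of the pipeline described in the abstract: visual words are linked to semantic concepts through a co-occurrence matrix, and a video's bag-of-words features are then pooled separately within each concept "region" and concatenated. This is only a minimal illustration under assumed data shapes, not the authors' implementation; all names (e.g., `assign_words_to_concepts`, `semantic_pool`) and the hard word-to-concept assignment are hypothetical simplifications of their discriminative formulation.

```python
import numpy as np

def assign_words_to_concepts(word_hists, concept_labels):
    """Map each visual word to its most strongly co-occurring concept.

    word_hists:     (num_videos, num_words) bag-of-words histograms.
    concept_labels: (num_videos, num_concepts) binary concept annotations.
    Returns an array of length num_words with a concept index per word
    (a hard assignment; an illustrative assumption, not the paper's exact method).
    """
    # Co-occurrence counts between visual words and semantic concepts.
    cooc = word_hists.T @ concept_labels            # (num_words, num_concepts)
    # Normalize per word so rows are comparable.
    cooc = cooc / (cooc.sum(axis=1, keepdims=True) + 1e-8)
    return cooc.argmax(axis=1)

def semantic_pool(word_hist, word_to_concept, num_concepts):
    """Pool a video's word histogram separately per semantic 'region'.

    Analogous to spatial-pyramid pooling, but the pools are defined by
    concept membership rather than image geometry.
    """
    num_words = word_hist.shape[0]
    pooled = np.zeros((num_concepts, num_words))
    for w, c in enumerate(word_to_concept):
        pooled[c, w] = word_hist[w]                 # word counts stay inside their concept pool
    # Concatenate the per-concept pools into one descriptor.
    return pooled.reshape(-1)

# Example with random toy data (shapes only; not real TRECVID MED features).
rng = np.random.default_rng(0)
hists = rng.poisson(1.0, size=(50, 200)).astype(float)   # 50 videos, 200 visual words
labels = rng.integers(0, 2, size=(50, 10)).astype(float) # 10 semantic concepts
mapping = assign_words_to_concepts(hists, labels)
descriptor = semantic_pool(hists[0], mapping, num_concepts=10)  # length 10 * 200
```

The resulting descriptor would then feed a standard event classifier (e.g., a linear SVM), with the concept-wise separation intended to keep event-irrelevant "junk" words from dominating the pooled representation.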