Recognizing complex events using large margin joint low-level event model

Authors:
Hamid Izadinia;Mubarak Shah
Affiliations:
Computer Vision Lab, University of Central Florida, Orlando, Florida;Computer Vision Lab, University of Central Florida, Orlando, Florida
Venue:
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part IV
Year:
2012

Citing 16
Cited 2

Fundamentals of speech recognition

Fundamentals of speech recognition
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope

International Journal of Computer Vision
Distinctive Image Features from Scale-Invariant Keypoints

International Journal of Computer Vision
Recognizing Human Actions: A Local SVM Approach

ICPR '04 Proceedings of the Pattern Recognition, 17th International Conference on (ICPR'04) Volume 3 - Volume 03
On Space-Time Interest Points

International Journal of Computer Vision
Large margin training for hidden Markov models with partially observed states

ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
The Pascal Visual Object Classes (VOC) Challenge

International Journal of Computer Vision
Modeling temporal structure of decomposable motion segments for activity classification

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part II
A discriminative latent model of object classes and attributes

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part V
Actom sequence models for efficient action detection

CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis

CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
Image ranking and retrieval based on multi-attribute queries

CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
Representations of Keypoint-Based Semantic Concept Detection: A Comprehensive Study

IEEE Transactions on Multimedia
Semantic Model Vectors for Complex Video Event Recognition

IEEE Transactions on Multimedia
Learning latent temporal structure for complex event detection

CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
HMDB: A large video database for human motion recognition

ICCV '11 Proceedings of the 2011 International Conference on Computer Vision

Object coding on the semantic graph for scene classification

Proceedings of the 21st ACM international conference on Multimedia
Combining multiple sensors for event recognition of older people

Proceedings of the 1st ACM international workshop on Multimedia indexing and information retrieval for healthcare

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper we address the challenging problem of complex event recognition by using low-level events. In this problem, each complex event is captured by a long video in which several low-level events happen. The dataset contains several videos and due to the large number of videos and complexity of the events, the available annotation for the low-level events is very noisy which makes the detection task even more challenging. To tackle these problems we model the joint relationship between the low-level events in a graph where we consider a node for each low-level event and whenever there is a correlation between two low-level events the graph has an edge between the corresponding nodes. In addition, for decreasing the effect of weak and/or irrelevant low-level event detectors we consider the presence/absence of low-level events as hidden variables and learn a discriminative model by using latent SVM formulation. Using our learned model for the complex event recognition, we can also apply it for improving the detection of the low-level events in video clips which enables us to discover a conceptual description of the video. Thus our model can do complex event recognition and explain a video in terms of low-level events in a single framework. We have evaluated our proposed method over the most challenging multimedia event detection dataset. The experimental results reveals that the proposed method performs well compared to the baseline method. Further, our results of conceptual description of video shows that our model is learned quite well to handle the noisy annotation and surpass the low-level event detectors which are directly trained on the raw features.