Detecting video events based on action recognition in complex scenes using spatio-temporal descriptor

Authors:
Guangyu Zhu; Ming Yang; Kai Yu; Wei Xu; Yihong Gong
Affiliations:
Institute of Automation, CAS, Beijing, China; NEC Laboratories America, Cupertino, CA, USA; NEC Laboratories America, Cupertino, CA, USA; NEC Laboratories America, Cupertino, CA, USA; NEC Laboratories America, Cupertino, CA, USA
Venue:
MM '09 Proceedings of the 17th ACM international conference on Multimedia
Year:
2009



Abstract

Event detection plays an essential role in video content analysis and remains a challenging open problem. In particular, research on detecting human-related video events in complex scenes that contain both crowds of people and dynamic motion is still limited. In this paper, we investigate detecting video events that involve elementary human actions, e.g., making a cellphone call, putting an object down, and pointing to something, in complex scenes using a novel approach based on a spatio-temporal descriptor. The proposed descriptor temporally integrates the statistics of a set of response maps of low-level features, e.g., image gradients and optical flow, in a space-time cube to capture the appearance and motion patterns of actions. Based on these descriptors, the bag-of-words method is used to describe a human figure as a concise feature vector. These features are then used to train SVM classifiers at multiple spatial pyramid levels to distinguish different actions. Finally, Gaussian-kernel-based temporal filtering is applied to segment sequences of events from a video stream, taking into account the temporal consistency of actions. The proposed approach tolerates the spatial layout variations and local deformations of human actions that arise from diverse view angles and rough human figure alignment in complex scenes. Extensive experiments on the 50-hour video dataset of the TRECVid 2008 event detection task demonstrate that our approach outperforms well-known SIFT-descriptor-based methods and effectively detects video events in challenging real-world conditions.
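
The final step of the pipeline, Gaussian-kernel-based temporal filtering of classifier outputs, can be illustrated with a minimal sketch. The abstract does not specify the kernel width, the decision threshold, or how the filtered scores are turned into segments, so the names `smooth_scores` and `segment_events`, the parameters `sigma` and `threshold`, and the simple thresholding rule below are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def gaussian_kernel(sigma, radius=None):
    """Return a normalized 1-D Gaussian kernel (radius defaults to 3*sigma)."""
    if radius is None:
        radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()


def smooth_scores(scores, sigma=2.0):
    """Temporally smooth per-frame classifier scores (hypothetical helper).

    Edge padding keeps the output the same length as the input.
    """
    scores = np.asarray(scores, dtype=float)
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    padded = np.pad(scores, pad, mode="edge")
    return np.convolve(padded, k, mode="valid")


def segment_events(scores, sigma=2.0, threshold=0.5):
    """Return (start, end) frame ranges, end exclusive, where the smoothed
    score exceeds the threshold. The thresholding rule is an assumption."""
    active = smooth_scores(scores, sigma) > threshold
    # Pad with False so every active run has a rising and a falling edge.
    edges = np.diff(np.concatenate(([False], active, [False])).astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))


if __name__ == "__main__":
    # Synthetic per-frame scores: one sustained action and one isolated spike.
    scores = np.zeros(40)
    scores[10:20] = 1.0
    scores[30] = 1.0
    print(segment_events(scores))
```

In this synthetic example the sustained run of high scores survives smoothing while the single-frame spike falls below the threshold, which is the kind of temporal consistency the filtering step is meant to enforce.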