Action recognition is an important problem in multimedia understanding. This paper addresses it by building an expressive compositional action model. We model one action instance in a video as an ensemble of spatio-temporal compositions: a number of discrete temporal anchor frames, each of which is further decomposed into a layout of deformable parts. In this way, our model can identify a Spatio-Temporal And-Or Graph (STAOG) that represents the latent structure of actions, e.g., triple jumping, swinging, and high jumping. The STAOG model comprises four layers: (i) a batch of leaf-nodes at the bottom for detecting various action parts within video patches; (ii) the or-nodes above the leaf-nodes, i.e., switch variables that activate their child leaf-nodes to account for structural variability; (iii) the and-nodes within an anchor frame for verifying spatial composition; and (iv) the root-node at the top for aggregating scores over temporal anchor frames. Moreover, contextual interactions are defined between leaf-nodes in both the spatial and temporal domains. For model training, we develop a novel weakly supervised learning algorithm that iteratively determines the structural configuration (e.g., the production of leaf-nodes associated with the or-nodes) while optimizing the multi-layer parameters. By fully exploiting spatio-temporal compositions and interactions, our approach handles large intra-class action variance well (e.g., different views, individual appearances, and spatio-temporal structures). Experimental results on challenging databases show that our approach outperforms other methods.
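The four-layer structure described in the abstract can be illustrated with a minimal scoring sketch. This is not the authors' implementation: the function names, the nested-list layout of `example_scores`, the use of a plain maximum at the or-nodes, and the plain sums at the and-nodes and root-node are all assumptions made for illustration. Deformation costs, the contextual interactions between leaf-nodes, and the weakly supervised learning step are omitted.

```python
from typing import List

# Hypothetical layout: leaf_scores[t][p][k] is the detector response of the
# k-th candidate leaf-node for part p in anchor frame t.
LeafScores = List[List[List[float]]]


def or_node_score(children: List[float]) -> float:
    """Or-node: a switch variable, modeled here as picking the best child leaf."""
    return max(children)


def and_node_score(parts: List[List[float]]) -> float:
    """And-node: composes the parts of one anchor frame (assumed to be a sum)."""
    return sum(or_node_score(children) for children in parts)


def root_score(leaf_scores: LeafScores) -> float:
    """Root-node: aggregates and-node scores over the temporal anchor frames."""
    return sum(and_node_score(frame) for frame in leaf_scores)


def selected_leaves(leaf_scores: LeafScores) -> List[List[int]]:
    """Index of the leaf-node activated by each or-node, per anchor frame."""
    return [
        [max(range(len(children)), key=children.__getitem__) for children in frame]
        for frame in leaf_scores
    ]


# Made-up responses: 2 anchor frames, 2 parts each, 2 candidate leaves per part.
example_scores: LeafScores = [
    [[0.25, 1.0], [0.5, 0.125]],
    [[0.25, 0.5], [0.75, 0.5]],
]

print(root_score(example_scores))       # 2.75
print(selected_leaves(example_scores))  # [[1, 0], [1, 0]]
```

In this toy setting the or-nodes account for structural variability by switching between alternative leaf-nodes, while the and-nodes and root-node only accumulate scores. A fuller model would add pairwise terms for spatial layout within a frame and temporal consistency across frames.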