Mining Layered Grammar Rules for Action Recognition

  • Authors:
  • Liang Wang; Yizhou Wang; Wen Gao

  • Affiliations:
  • Liang Wang: School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; Nat'l Engineering Lab for Video Technology, Peking University, Beijing, China
  • Yizhou Wang: Nat'l Engineering Lab for Video Technology and Key Lab. of Machine Perception (MoE), School of Electronics Engineering and Computer Science, Peking University, Beijing, China
  • Wen Gao: Nat'l Engineering Lab for Video Technology and Key Lab. of Machine Perception (MoE), School of Electronics Engineering and Computer Science, Peking University, Beijing, China

  • Venue:
  • International Journal of Computer Vision
  • Year:
  • 2011

Abstract

We propose a layered-grammar model to represent actions. Under this model, an action is represented by a set of grammar rules. The bottom layer of an action instance's parse tree contains action primitives such as spatiotemporal (ST) interest points. At each layer above, we iteratively mine grammar rules and "super rules" that account for high-order compositional feature structures. The grammar rules are categorized into three classes according to the ST-relations of their action components: strong, weak, and stochastic relations. These ST-relations characterize different action styles (degrees of stiffness), and they are pursued in the form of grammar rules for action recognition. By adopting the Emerging Pattern (EP) mining algorithm for relation pursuit, the learned production rules are statistically significant and discriminative. Using the learned rules, the parse tree of an action video is constructed by combining a bottom-up rule detection step with a top-down ambiguous-rule pruning step. An action instance is then recognized from the discriminative configurations generated by the production rules of its parse tree. Experiments confirm that, by incorporating high-order feature statistics, the proposed method substantially improves recognition performance over bag-of-words models.
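For readers unfamiliar with Emerging Pattern mining, the sketch below illustrates the generic EP criterion the abstract refers to: keep candidate feature configurations whose support grows sharply from a background class to the target class (growth-rate thresholding). This is a minimal, hypothetical Python illustration only; the data layout (each video as a set of codeword IDs), the function names, and the thresholds are assumptions for exposition and do not reproduce the paper's relation-pursuit procedure.

    # Minimal sketch of Emerging Pattern (EP) selection by growth rate.
    # Assumption: each "video" is summarized as the set of codeword IDs it contains.
    from itertools import combinations

    def support(pattern, transactions):
        """Fraction of transactions (videos) that contain every item in `pattern`."""
        hits = sum(1 for t in transactions if pattern <= t)
        return hits / max(len(transactions), 1)

    def mine_emerging_patterns(target, background, max_size=2, min_supp=0.2, min_growth=3.0):
        """Return patterns frequent in `target` whose support grows >= min_growth over `background`."""
        items = set().union(*target) if target else set()
        patterns = []
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(items), k):
                p = frozenset(combo)
                s_t = support(p, target)
                if s_t < min_supp:
                    continue
                s_b = support(p, background)
                growth = s_t / s_b if s_b > 0 else float("inf")
                if growth >= min_growth:
                    patterns.append((p, s_t, growth))
        # Strongest (including "jumping") emerging patterns first.
        return sorted(patterns, key=lambda x: -x[2])

    # Toy usage with made-up codeword IDs for two action classes.
    waving = [{1, 2, 5}, {1, 2}, {1, 2, 7}]
    walking = [{3, 4}, {1, 3, 4}, {4, 5}]
    for pat, supp_t, growth in mine_emerging_patterns(waving, walking):
        print(sorted(pat), f"support={supp_t:.2f}", f"growth={growth:.1f}")

In the paper's setting, the "items" would be relations among action components rather than raw codewords, and the retained patterns serve as discriminative production rules; the growth-rate filter above only conveys why EP-selected rules are both frequent in the target class and rare elsewhere.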