Mining Layered Grammar Rules for Action Recognition

  • Authors:
  • Liang Wang; Yizhou Wang; Wen Gao

  • Affiliations:
  • Liang Wang: School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; Nat'l Engineering Lab for Video Technology, Peking University, Beijing, China
  • Yizhou Wang: Nat'l Engineering Lab for Video Technology and Key Lab. of Machine Perception (MoE), School of Electronics Engineering and Computer Science, Peking University, Beijing, China
  • Wen Gao: Nat'l Engineering Lab for Video Technology and Key Lab. of Machine Perception (MoE), School of Electronics Engineering and Computer Science, Peking University, Beijing, China

  • Venue:
  • International Journal of Computer Vision
  • Year:
  • 2011

Abstract

We propose a layered-grammar model to represent actions. Under this model, an action is represented by a set of grammar rules. The bottom layer of an action instance's parse tree contains action primitives such as spatiotemporal (ST) interest points. At each layer above, we iteratively mine grammar rules and "super rules" that account for high-order compositional feature structures. The grammar rules are categorized into three classes according to the ST-relations of their action components: strong, weak, and stochastic relations. These ST-relations characterize different action styles (degrees of stiffness), and they are pursued in the form of grammar rules for action recognition. By adopting the Emerging Pattern (EP) mining algorithm for relation pursuit, the learned production rules are statistically significant and discriminative. Using the learned rules, the parse tree of an action video is constructed by combining a bottom-up rule detection step with a top-down ambiguous-rule pruning step. An action instance is then recognized from the discriminative configurations generated by the production rules of its parse tree. Experiments confirm that, by incorporating high-order feature statistics, the proposed method substantially improves recognition performance over bag-of-words models.
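For readers unfamiliar with Emerging Pattern mining, the sketch below illustrates the generic EP criterion the abstract refers to: keep candidate feature configurations whose support grows sharply from a background class to the target class (growth-rate thresholding). This is a minimal, hypothetical Python illustration only; the data layout (each video as a set of codeword IDs), the function names, and the thresholds are assumptions for exposition and do not reproduce the paper's relation-pursuit procedure.

    # Minimal sketch of Emerging Pattern (EP) selection by growth rate.
    # Assumption: each "video" is summarized as the set of codeword IDs it contains.
    from itertools import combinations

    def support(pattern, transactions):
        """Fraction of transactions (videos) that contain every item in `pattern`."""
        hits = sum(1 for t in transactions if pattern <= t)
        return hits / max(len(transactions), 1)

    def mine_emerging_patterns(target, background, max_size=2, min_supp=0.2, min_growth=3.0):
        """Return patterns frequent in `target` whose support grows >= min_growth over `background`."""
        items = set().union(*target) if target else set()
        patterns = []
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(items), k):
                p = frozenset(combo)
                s_t = support(p, target)
                if s_t < min_supp:
                    continue
                s_b = support(p, background)
                growth = s_t / s_b if s_b > 0 else float("inf")
                if growth >= min_growth:
                    patterns.append((p, s_t, growth))
        # Strongest (including "jumping") emerging patterns first.
        return sorted(patterns, key=lambda x: -x[2])

    # Toy usage with made-up codeword IDs for two action classes.
    waving = [{1, 2, 5}, {1, 2}, {1, 2, 7}]
    walking = [{3, 4}, {1, 3, 4}, {4, 5}]
    for pat, supp_t, growth in mine_emerging_patterns(waving, walking):
        print(sorted(pat), f"support={supp_t:.2f}", f"growth={growth:.1f}")

In the paper's setting, the "items" would be relations among action components rather than raw codewords, and the retained patterns serve as discriminative production rules; the growth-rate filter above only conveys why EP-selected rules are both frequent in the target class and rare elsewhere.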