Learning discriminative features for fast frame-based action recognition

  • Authors:
  • Liang Wang, Yizhou Wang, Tingting Jiang, Debin Zhao, Wen Gao

  • Affiliations:
  • Liang Wang: National Engineering Lab for Video Technology & Key Laboratory of Machine Perception (MoE), School of EECS, Peking University, Beijing, China and School of Computer Science and Technology, Harbin ...
  • Yizhou Wang: National Engineering Lab for Video Technology & Key Laboratory of Machine Perception (MoE), School of EECS, Peking University, Beijing, China
  • Tingting Jiang: National Engineering Lab for Video Technology & Key Laboratory of Machine Perception (MoE), School of EECS, Peking University, Beijing, China
  • Debin Zhao: School of Computer Science and Technology, Harbin Institute of Technology, Heilongjiang Province, China
  • Wen Gao: National Engineering Lab for Video Technology & Key Laboratory of Machine Perception (MoE), School of EECS, Peking University, Beijing, China

  • Venue:
  • Pattern Recognition
  • Year:
  • 2013

Abstract

In this paper we present an instant action recognition method that recognizes an action in real time from only two consecutive video frames. For the sake of instantaneity, we employ two types of computationally efficient but perceptually important features - optical flow and edges - to capture the motion and shape characteristics of actions. These two types of features can be unreliable or ambiguous due to noise and degraded video quality, so to endow them with strong discriminative power we pursue combined features whose joint distributions differ between action classes. Because low-level visual features are usually densely distributed in video frames, we reduce computational expense and induce a compact structural representation by first grouping the learned discriminative joint features according to their correlation, and then adapting an efficient boosting method, which takes the grouped features as input, as the action recognition engine. Experimental results show that combining the two types of features differentiates actions better than using either type alone. The whole model is computationally efficient, and its action recognition accuracy is comparable to that of state-of-the-art approaches.
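
To make the two-frame pipeline concrete, the sketch below is an illustrative reconstruction, not the authors' implementation: it pairs dense Farnebäck optical flow with Canny edges computed from a pair of consecutive frames, summarizes them in a joint orientation/edge histogram as a rough stand-in for the learned discriminative joint features, groups correlated feature dimensions, and trains an off-the-shelf AdaBoost classifier in place of the paper's boosting engine. All estimators, function names, and parameter values (the flow settings, Canny thresholds, correlation threshold, and number of boosting rounds) are assumptions chosen for illustration.

```python
import cv2
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def frame_pair_descriptor(prev_gray, curr_gray, bins=8):
    # Dense optical flow (Farneback) captures motion; Canny edges capture
    # shape. Both are cheap to compute, matching the instantaneity goal.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    edges = cv2.Canny(curr_gray, 100, 200)  # uint8 map, 0 or 255

    # Joint histogram over (flow orientation, edge on/off), weighted by
    # flow magnitude: a crude stand-in for joint feature distributions
    # that differ between action classes.
    hist, _, _ = np.histogram2d(ang.ravel(), edges.ravel(),
                                bins=[bins, 2],
                                range=[[0.0, 2 * np.pi], [0.0, 256.0]],
                                weights=mag.ravel())
    hist = hist.ravel()
    return hist / (hist.sum() + 1e-8)

def group_by_correlation(X, threshold=0.9):
    # Greedy grouping: feature dimensions whose pairwise correlation
    # exceeds the threshold share a group (compact representation).
    corr = np.abs(np.corrcoef(X, rowvar=False))
    groups, assigned = [], set()
    for i in range(corr.shape[1]):
        if i in assigned:
            continue
        members = [j for j in range(corr.shape[1])
                   if j not in assigned and corr[i, j] >= threshold]
        if not members:  # constant column yields NaN correlations
            members = [i]
        assigned.update(members)
        groups.append(members)
    return groups

def train(X, y, threshold=0.9):
    # X: one descriptor per consecutive frame pair; y: action label per
    # pair. Each group is pooled to a single value, then boosting over
    # shallow trees serves as the recognition engine.
    groups = group_by_correlation(X, threshold)
    Xg = np.stack([X[:, g].mean(axis=1) for g in groups], axis=1)
    clf = AdaBoostClassifier(n_estimators=200).fit(Xg, y)
    return clf, groups
```

In the paper itself, the joint features are learned so that their class-conditional distributions separate, and the boosting engine operates on the learned groups; the fixed histogram, greedy grouping rule, and mean pooling here are placeholders for those learned components.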