Graphical framework for action recognition using temporally dense STIPs

  • Authors:
  • Pradeep Natarajan; Prithviraj Banerjee; Furqan M. Khan; Ramakant Nevatia

  • Affiliations:
  • BBN Technologies, Cambridge, MA; Institute for Robotics and Intelligent Systems, University of Southern California, Los Angeles, CA; Institute for Robotics and Intelligent Systems, University of Southern California, Los Angeles, CA; Institute for Robotics and Intelligent Systems, University of Southern California, Los Angeles, CA

  • Venue:
  • WMVC '09: Proceedings of the 2009 International Conference on Motion and Video Computing
  • Year:
  • 2009

Abstract

Graphical models have been shown to provide a natural framework for modelling high-level action transition constraints and for simultaneously segmenting and recognizing a sequence of actions. More recently, Spatio-temporal Interest Points (STIPs) have been proposed as suitable features for action detection. These interest points are typically mapped to a set of codewords, and actions are detected by accumulating the codeword weights or by learning suitable classifiers. Existing interest point detectors provide only a sparse representation of actions and require a costly exhaustive search over the entire spatio-temporal volume for action classification. Our contribution is twofold: first, we combine interest point models of actions with pedestrian detection and tracking using a Conditional Random Field (CRF); second, we extend existing interest point detectors to provide a dense action representation while minimizing spurious detections. The larger number of interest points and the high-level reasoning provided by the CRF allow us to automatically recognize action sequences from an unsegmented stream in real time. We demonstrate our approach with results comparable to the state of the art for action classification on the standard KTH action dataset, as well as on more challenging cluttered videos.
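The paper's code is not reproduced here, but the codeword pipeline the abstract refers to can be sketched. The following minimal Python example clusters STIP descriptors into a codebook, represents each video as a histogram of codeword assignments, and trains a classifier on those histograms. The use of scikit-learn's KMeans and SVC, the descriptor shapes, and all function names are illustrative assumptions, not the authors' implementation.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    # Build a codebook by clustering STIP descriptors pooled from training videos.
    # train_descriptors: (N, D) array of spatio-temporal descriptors.
    def build_codebook(train_descriptors, n_codewords=500):
        return KMeans(n_clusters=n_codewords, n_init=10).fit(train_descriptors)

    # Represent one video as an L1-normalized histogram of codeword assignments;
    # a denser interest point detector yields more descriptors per video and
    # hence a more reliable histogram.
    def video_histogram(codebook, descriptors):
        words = codebook.predict(descriptors)  # nearest codeword per STIP
        hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
        return hist / max(hist.sum(), 1.0)

    # Train an action classifier over the histograms (one label per video).
    def train_classifier(histograms, labels):
        return SVC(kernel="rbf", C=10.0).fit(histograms, labels)

Likewise, the "simultaneously segment and recognize" step that a linear-chain CRF performs amounts to Viterbi decoding over per-frame action scores combined with transition potentials. The sketch below is a generic decoder under that assumption, not the paper's exact model, which additionally folds in pedestrian detection and tracking.

    import numpy as np

    # Viterbi decoding: given per-frame log-potentials for K actions and a
    # (K, K) matrix of action-transition log-potentials, recover the most
    # likely label sequence, which segments and labels the stream jointly.
    def decode(frame_scores, transition):
        T, K = frame_scores.shape
        best = frame_scores[0].copy()          # best score ending in each state
        back = np.zeros((T, K), dtype=int)     # backpointers
        for t in range(1, T):
            cand = best[:, None] + transition  # cand[i, j]: prev i -> cur j
            back[t] = cand.argmax(axis=0)
            best = cand.max(axis=0) + frame_scores[t]
        labels = np.empty(T, dtype=int)
        labels[-1] = best.argmax()
        for t in range(T - 1, 0, -1):          # backtrace
            labels[t - 1] = back[t, labels[t]]
        return labels                          # one action label per frame

Because the transition matrix penalizes implausible action switches, contiguous runs of the decoded labels double as the segmentation of the unsegmented stream.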