Editors Choice Article: Structured learning of local features for human action classification and localization

  • Authors:
  • Tuan Hue Thi;Li Cheng;Jian Zhang;Li Wang;Shinichi Satoh

  • Affiliations:
  • National ICT of Australia, Australia and School of Computer Science, University of New South Wales, Australia;Bioinformatics Institute, A*STAR, Singapore;National ICT of Australia, Australia and Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia;Nanjing Forest University, China;National Institute of Informatics, Japan

  • Venue:
  • Image and Vision Computing
  • Year:
  • 2012

Abstract

Human action recognition is a promising yet non-trivial computer vision field with many potential applications. Recent advances in bag-of-features approaches have brought significant insights into recognizing human actions in complex contexts. It is, however, common practice in the literature to treat an action as merely an orderless set of local salient features. This representation has been shown to be oversimplified, which inherently limits the robust deployment of traditional approaches in real-life scenarios. In this work, we propose and show that, by taking into account the global configuration of local features, we can greatly improve recognition performance. We first introduce a novel feature selection process called Sparse Hierarchical Bayes Filter, which selects only the most contributive features of each action type based on neighboring structure constraints. We then present the application of structured learning to human action analysis. That is, by representing a human action as a complex set of local features, we can incorporate different spatial and temporal feature constraints into the learning tasks of human action classification and localization. In particular, we tackle the problem of action localization in video using structured learning with two alternatives: one is the Dynamic Conditional Random Field, from a probabilistic perspective; the other is the Structural Support Vector Machine, from a max-margin point of view. We evaluate our modular classification-localization framework on various testbeds, on which it is shown to be highly effective and robust compared with bag-of-features methods.
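
The abstract's central claim, that an orderless bag of local features discards the configuration of those features, can be illustrated with a small sketch. It is not the authors' method: the function names, the toy clips, the codeword indices and the neighborhood radius are all hypothetical, and the pair-counting descriptor is only a stand-in for the spatial and temporal constraints the paper models with structured learning.

```python
from collections import Counter


def bag_of_features(codewords, vocab_size):
    """Orderless histogram: how often each codeword index occurs."""
    counts = Counter(codewords)
    return [counts.get(k, 0) for k in range(vocab_size)]


def neighbor_pair_counts(features, radius):
    """Count unordered codeword pairs whose (x, y, t) positions
    lie within `radius` of each other (a toy structural descriptor)."""
    pairs = Counter()
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            (word_i, pos_i), (word_j, pos_j) = features[i], features[j]
            dist_sq = sum((a - b) ** 2 for a, b in zip(pos_i, pos_j))
            if dist_sq <= radius ** 2:
                pairs[tuple(sorted((word_i, word_j)))] += 1
    return dict(pairs)


# Two hypothetical clips: each entry is (codeword, (x, y, t)).
# Both contain the same codewords, but in different layouts.
clip_a = [(0, (0, 0, 0)), (1, (1, 0, 0)), (2, (10, 10, 5))]
clip_b = [(0, (0, 0, 0)), (1, (10, 10, 5)), (2, (1, 0, 0))]

hist_a = bag_of_features([w for w, _ in clip_a], vocab_size=3)
hist_b = bag_of_features([w for w, _ in clip_b], vocab_size=3)
pairs_a = neighbor_pair_counts(clip_a, radius=2.0)
pairs_b = neighbor_pair_counts(clip_b, radius=2.0)

print(hist_a == hist_b)    # True: the histograms cannot tell the clips apart
print(pairs_a == pairs_b)  # False: the neighborhood structure differs
```

The two clips produce identical histograms, so any classifier that sees only the histogram must treat them the same, while even this crude neighborhood descriptor separates them.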