Human Action Recognition and Localization in Video Using Structured Learning of Local Space-Time Features

  • Authors:
  • Tuan Hue Thi; Jian Zhang; Li Cheng; Li Wang; Shinichi Satoh

  • Venue:
  • AVSS '10 Proceedings of the 2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance
  • Year:
  • 2010

Abstract

This paper presents a unified framework for human action classification and localization in video using structured learning of local space-time features. Each human action class is represented by its own compact set of local patches. In our approach, we first use a discriminative hierarchical Bayesian classifier to select the space-time interest points that are constructive for each particular action. These concise local features are then passed to a Support Vector Machine with Principal Component Analysis projection for the classification task. Meanwhile, action localization is performed using Dynamic Conditional Random Fields developed to incorporate the spatial and temporal structure constraints of superpixels extracted around those features. Each superpixel in the video is defined by the shape and motion information of its corresponding feature region. Compelling results obtained from experiments on the KTH [22], Weizmann [1], HOHA [13] and TRECVid [23] datasets demonstrate the efficiency and robustness of our framework for the task of human action recognition and localization in video.
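
Illustrative sketch (not taken from the paper): the classification stage summarized above, projecting pooled local-feature descriptors with Principal Component Analysis and classifying them with a Support Vector Machine, could look roughly like the following, here using scikit-learn. The descriptor dimensionality, number of PCA components, kernel settings, and all data are placeholder assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical data: one pooled descriptor per video clip, built from the
# selected local space-time features (all shapes and labels are assumptions).
rng = np.random.default_rng(0)
X_train = rng.random((120, 500))    # 120 clips, 500-dim pooled descriptors
y_train = rng.integers(0, 6, 120)   # 6 action classes (e.g., as in KTH)
X_test = rng.random((30, 500))

# PCA projection followed by an SVM classifier, mirroring the classification
# stage of the abstract; the 64 components and RBF kernel are assumptions.
clf = make_pipeline(StandardScaler(), PCA(n_components=64),
                    SVC(kernel="rbf", C=10.0))
clf.fit(X_train, y_train)
predicted_actions = clf.predict(X_test)
```

In a real pipeline, the placeholder arrays would be replaced by descriptors pooled from the space-time interest points retained by the Bayesian selection step, with each training clip's action label as the target.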