ISVC'11 Proceedings of the 7th International Conference on Advances in Visual Computing - Volume Part II
Integrating local action elements for action analysis
This paper presents a unified framework for human action classification and localization in video using structured learning of local space-time features. Each human action class is represented by its own compact set of local patches. In our approach, we first use a discriminative hierarchical Bayesian classifier to select the space-time interest points that are informative for each particular action. These concise local features are then passed, after a Principal Component Analysis projection, to a Support Vector Machine for the classification task. Meanwhile, action localization is performed using Dynamic Conditional Random Fields developed to incorporate the spatial and temporal structure constraints of superpixels extracted around those features. Each superpixel in the video is described by the shape and motion information of its corresponding feature region. Compelling results from experiments on the KTH [22], Weizmann [1], HOHA [13], and TRECVid [23] datasets demonstrate the efficiency and robustness of our framework for the task of human action recognition and localization in video.
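To illustrate the classification stage described above, the sketch below shows a Principal Component Analysis projection of local feature descriptors via the singular value decomposition. It is a minimal stand-in, not the paper's implementation: the descriptor dimensionality, the number of interest points, and the number of retained components are all illustrative assumptions, and the random data merely stands in for real space-time interest point descriptors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for local space-time feature descriptors:
# 200 interest points, each a 72-dimensional descriptor (sizes are illustrative).
descriptors = rng.normal(size=(200, 72))

# PCA projection: center the data, then keep the top-k principal directions.
k = 16
mean = descriptors.mean(axis=0)
centered = descriptors - mean

# SVD of the centered data; the rows of vt are the principal directions,
# ordered by decreasing singular value (i.e., decreasing explained variance).
_, _, vt = np.linalg.svd(centered, full_matrices=False)
projected = centered @ vt[:k].T  # each descriptor reduced to k dimensions

# The reduced descriptors would then be fed to an SVM (or any classifier).
print(projected.shape)
```

In this scheme, the PCA step compresses the selected local features before the SVM sees them; components are kept in order of explained variance, so the first coordinate of `projected` carries the most variance.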