A tree-based approach to integrated action localization, recognition and segmentation

Authors:
Zhuolin Jiang;Zhe Lin;Larry S. Davis
Affiliations:
University of Maryland, College Park, MD;Adobe Systems Incorporated, San Jose, CA;University of Maryland, College Park, MD
Venue:
ECCV'10 Proceedings of the 11th European conference on Trends and Topics in Computer Vision - Volume Part I
Year:
2010

Citing 13
Cited 1

The Recognition of Human Movement Using Temporal Templates

IEEE Transactions on Pattern Analysis and Machine Intelligence
Recognizing Action at a Distance

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Recognizing Human Actions: A Local SVM Approach

ICPR '04 Proceedings of the Pattern Recognition, 17th International Conference on (ICPR'04) Volume 3 - Volume 03
Histograms of Oriented Gradients for Human Detection

CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 1 - Volume 01
Exploring the Space of a Human Action

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1 - Volume 01
Efficient Visual Event Detection Using Volumetric Features

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1 - Volume 01
Actions as Space-Time Shapes

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
The Function Space of an Activity

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 1
Scalable Recognition with a Vocabulary Tree

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Sharing Visual Features for Multiclass and Multiview Object Detection

IEEE Transactions on Pattern Analysis and Machine Intelligence
Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words

International Journal of Computer Vision
Learning dynamics for exemplar-based gesture recognition

CVPR'03 Proceedings of the 2003 IEEE computer society conference on Computer vision and pattern recognition
Expandable Data-Driven Graphical Modeling of Human Actions Based on Salient Postures

IEEE Transactions on Circuits and Systems for Video Technology

Selective spatio-temporal interest points

Computer Vision and Image Understanding

Quantified Score

Hi-index	0.00

Visualization

Abstract

A tree-based approach to integrated action segmentation, localization and recognition is proposed. An action is represented as a sequence of joint hog-flow descriptors extracted independently from each frame. During training, a set of action prototypes is first learned based on a k-means clustering, and then a binary tree model is constructed from the set of action prototypes based on hierarchical k-means clustering. Each tree node is characterized by a shape-motion descriptor and a rejection threshold, and an action segmentation mask is defined for leaf nodes (corresponding to a prototype). During testing, an action is localized by mapping each test frame to a nearest neighbor prototype using a fast matching method to search the learned tree, followed by global filtering refinement. An action is recognized by maximizing the sum of the joint probabilities of the action category and action prototype over test frames. Our approach does not explicitly rely on human tracking and background subtraction, and enables action localization and recognition in realistic and challenging conditions (such as crowded backgrounds). Experimental results show that our approach can achieve recognition rates of 100% on the CMU action dataset and 100% on the Weizmann dataset.