Human action recognition by learning bases of action attributes and parts

  • Authors:
  • Bangpeng Yao;Xiaoye Jiang;Aditya Khosla;Andy Lai Lin;Leonidas Guibas;Li Fei-Fei

  • Affiliations:
  • Computer Science Department, Stanford University, CA, USA;Institute for Computational & Mathematical Engineering, Stanford University, CA, USA;Computer Science Department, Stanford University, CA, USA;Electrical Engineering Department, Stanford University, CA, USA;Computer Science Department, Stanford University, CA, USA;Computer Science Department, Stanford University, CA, USA

  • Venue:
  • ICCV '11 Proceedings of the 2011 International Conference on Computer Vision
  • Year:
  • 2011

Abstract

In this work, we propose to use attributes and parts for recognizing human actions in still images. We define action attributes as the verbs that describe the properties of human actions, while the parts of actions are objects and poselets that are closely related to the actions. We jointly model the attributes and parts by learning a set of sparse bases that are shown to carry rich semantic meaning. The attributes and parts of an action image can then be reconstructed from sparse coefficients with respect to the learned bases. This dual sparsity provides a theoretical guarantee for our basis learning and feature reconstruction approach. On the PASCAL action dataset and a new "Stanford 40 Actions" dataset, we show that our method extracts meaningful high-order interactions between attributes and parts in human actions while achieving state-of-the-art classification performance.
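
The dual-sparsity idea in the abstract (images represented by sparse coefficients over a learned set of sparse attribute/part bases) can be approximated with off-the-shelf dictionary learning. Below is a minimal sketch, not the authors' implementation: the feature layout (attribute and part classifier confidences per image), the dimensions, the hyperparameters, and the use of scikit-learn's DictionaryLearning are all illustrative assumptions.

    # Sketch: learn sparse "action bases" over attribute/part features and
    # reconstruct images from their sparse coefficients. Not the paper's code.
    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    rng = np.random.RandomState(0)

    # Hypothetical data: each row is one action image, columns are classifier
    # confidences for attributes (verbs), objects, and poselets.
    n_images, n_attributes, n_parts = 200, 14, 36
    X = rng.rand(n_images, n_attributes + n_parts)

    # Learn a dictionary of bases; each image is encoded by few coefficients.
    learner = DictionaryLearning(n_components=40, alpha=1.0,
                                 transform_algorithm="lasso_lars",
                                 random_state=0)
    codes = learner.fit_transform(X)   # sparse coefficients per image
    bases = learner.components_        # learned attribute/part bases

    # Reconstruct the attribute/part vector of each image from its codes;
    # the codes (rather than raw features) would feed an action classifier.
    reconstruction = codes @ bases
    print(codes.shape, bases.shape, float(np.mean((X - reconstruction) ** 2)))

In the paper, sparsity is imposed both on the bases and on the per-image coefficients; the sketch above only regularizes the coefficients and is meant solely to convey the encoding/reconstruction workflow.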