Action recognition using context and appearance distribution features

Authors:
Xinxiao Wu; Dong Xu; Lixin Duan; Jiebo Luo
Affiliations:
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore; Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore; Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore; Kodak Res. Labs., Eastman Kodak Co., Rochester, NY, USA
Venue:
CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
Year:
2011

Citing 0
Cited 10

Human action recognition and retrieval using sole depth information
Proceedings of the 20th ACM international conference on Multimedia

Transfer discriminant-analysis of canonical correlations for view-transfer action recognition
PCM'12 Proceedings of the 13th Pacific-Rim conference on Advances in Multimedia Information Processing

Human action recognition based on boosted feature selection and naive Bayes nearest-neighbor classification
Signal Processing

Latent semantic learning with structured sparse representation for human action recognition
Pattern Recognition

Boosted key-frame selection and correlated pyramidal motion-feature representation for human action recognition
Pattern Recognition

Combining appearance and structural features for human action recognition
Neurocomputing

Kernel analysis on Grassmann manifolds for action recognition
Pattern Recognition Letters

Silhouette-based human action recognition using sequences of key poses
Pattern Recognition Letters

Matching mixtures of curves for human action recognition
Computer Vision and Image Understanding

Selection of negative samples and two-stage combination of multiple features for action detection in thousands of videos
Machine Vision and Applications

Abstract

We first propose a new spatio-temporal context distribution feature of interest points for human action recognition. Each action video is expressed as a set of relative XYT coordinates between pairwise interest points in a local region. We learn a global GMM (referred to as the Universal Background Model, UBM) using the relative coordinate features from all the training videos, and then represent each video as the normalized parameters of a video-specific GMM adapted from the global GMM. To capture the spatio-temporal relationships at different levels, multiple GMMs are used to describe the context distributions of interest points over multi-scale local regions. To describe the appearance information of an action video, we also propose to use a GMM to characterize the distribution of local appearance features from the cuboids centered around the interest points. Accordingly, an action video can be represented by two types of distribution features: 1) multiple GMM distributions of spatio-temporal context; 2) a GMM distribution of local video appearance. To effectively fuse these two types of heterogeneous and complementary distribution features, we additionally propose a new learning algorithm, called Multiple Kernel Learning with Augmented Features (AFMKL), to learn an adapted classifier based on multiple kernels and the pre-learned classifiers of other action classes. Extensive experiments on the KTH, multi-view IXMAS, and complex UCF sports datasets demonstrate that our method generally achieves higher recognition accuracy than other state-of-the-art methods.
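
The context feature and the UBM adaptation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and several details are assumptions that the abstract does not state: the local region is taken to be a sphere of a given radius in XYT space, the GMM uses diagonal covariances, and the video-specific GMM is obtained by mean-only MAP adaptation with a relevance factor (a common GMM-UBM recipe). The function names `relative_xyt` and `map_adapt_means` and the parameter `relevance` are hypothetical. Learning the UBM itself and normalizing the adapted parameters are omitted.

```python
import math

def relative_xyt(points, radius):
    """Relative (dx, dy, dt) offsets for every ordered pair of interest points
    closer than `radius` in XYT space (spherical local region: an assumption)."""
    feats = []
    for i, (xi, yi, ti) in enumerate(points):
        for j, (xj, yj, tj) in enumerate(points):
            if i == j:
                continue
            d = (xj - xi, yj - yi, tj - ti)
            if math.sqrt(sum(c * c for c in d)) <= radius:
                feats.append(d)
    return feats

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (a - m) ** 2 / v
                      for a, m, v in zip(x, mean, var))

def map_adapt_means(feats, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance UBM to one video.
    Weights and variances stay fixed (an assumption, not stated in the source)."""
    K, D = len(means), len(means[0])
    n = [0.0] * K                      # soft counts per component
    s = [[0.0] * D for _ in range(K)]  # responsibility-weighted feature sums
    for x in feats:
        logp = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logp)                # subtract the max for numerical stability
        p = [math.exp(l - top) for l in logp]
        z = sum(p)
        for k in range(K):
            g = p[k] / z               # posterior of component k given x
            n[k] += g
            for d in range(D):
                s[k][d] += g * x[d]
    # Interpolate between the video statistics and the UBM means.
    return [[(s[k][d] + relevance * means[k][d]) / (n[k] + relevance)
             for d in range(D)] for k in range(K)]
```

To follow the multi-scale description in the abstract, `relative_xyt` would be called with several radii and each scale adapted from its own UBM. The same adaptation step could be applied to appearance descriptors from the cuboids in place of the relative coordinates.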