Discovering discriminative action parts from mid-level video representations

Authors:
Michalis Raptis
Affiliations:
UCLA
Venue:
CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Year:
2012

Citing 0
Cited 6

Learning latent spatio-temporal compositional model for human action recognition

Proceedings of the 21st ACM international conference on Multimedia
Human action recognition with salient trajectories

Signal Processing
Matching mixtures of curves for human action recognition

Computer Vision and Image Understanding
Detecting People Looking at Each Other in Videos

International Journal of Computer Vision
Weighted feature trajectories and concatenated bag-of-features for action recognition

Neurocomputing
Activity representation with motion hierarchies

International Journal of Computer Vision

Quantified Score

Hi-index	0.00

Visualization

Abstract

We describe a mid-level approach for action recognition. From an input video, we extract salient spatio-temporal structures by forming clusters of trajectories that serve as candidates for the parts of an action. The assembly of these clusters into an action class is governed by a graphical model that incorporates appearance and motion constraints for the individual parts and pairwise constraints for the spatio-temporal dependencies among them. During training, we estimate the model parameters discriminatively. During classification, we efficiently match the model to a video using discrete optimization. We validate the model's classification ability in standard benchmark datasets and illustrate its potential to support a fine-grained analysis that not only gives a label to a video, but also identifies and localizes its constituent parts.