Discovering discriminative action parts from mid-level video representations

  • Authors:
  • Michalis Raptis

  • Affiliations:
  • UCLA

  • Venue:
  • CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

We describe a mid-level approach for action recognition. From an input video, we extract salient spatio-temporal structures by forming clusters of trajectories that serve as candidates for the parts of an action. The assembly of these clusters into an action class is governed by a graphical model that incorporates appearance and motion constraints for the individual parts and pairwise constraints for the spatio-temporal dependencies among them. During training, we estimate the model parameters discriminatively. During classification, we efficiently match the model to a video using discrete optimization. We validate the model's classification ability in standard benchmark datasets and illustrate its potential to support a fine-grained analysis that not only gives a label to a video, but also identifies and localizes its constituent parts.