Combining per-frame and per-track cues for multi-person action recognition

  • Authors: Sameh Khamis; Vlad I. Morariu; Larry S. Davis
  • Affiliations: University of Maryland, College Park (all authors)
  • Venue: ECCV'12: Proceedings of the 12th European Conference on Computer Vision, Volume Part I
  • Year: 2012

Abstract

We propose a model to combine per-frame and per-track cues for action recognition. With multiple targets in a scene, our model simultaneously captures the natural harmony of an individual's action in a scene and the flow of actions of an individual in a video sequence, inferring valid tracks in the process. Our motivation is based on the unlikely discordance of an action in a structured scene, both at the track level and the frame level (e.g., a person dancing in a crowd of joggers). While we can utilize sampling approaches for inference in our model, we instead devise a global inference algorithm by decomposing the problem and solving the subproblems exactly and efficiently, recovering a globally optimal joint solution in several cases. Finally, we improve on the state-of-the-art action recognition results for two publicly available datasets.