We describe a mid-level approach for action recognition. From an input video, we extract salient spatio-temporal structures by forming clusters of trajectories that serve as candidates for the parts of an action. The assembly of these clusters into an action class is governed by a graphical model that incorporates appearance and motion constraints for the individual parts and pairwise constraints for the spatio-temporal dependencies among them. During training, we estimate the model parameters discriminatively. During classification, we efficiently match the model to a video using discrete optimization. We validate the model's classification ability on standard benchmark datasets and illustrate its potential to support a fine-grained analysis that not only labels a video but also identifies and localizes its constituent parts.
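To make the matching step concrete, here is a minimal sketch of how a part-based model with per-part and pairwise scores could be matched to candidate clusters. It is an illustration under stated assumptions, not the method of the paper: the abstract does not specify the graph structure or the optimizer, so the sketch assumes a chain of parts and uses max-sum dynamic programming. The function name `match_chain_model` and the `unary` and `pairwise` score tables are hypothetical.

```python
from typing import List, Sequence, Tuple


def match_chain_model(
    unary: Sequence[Sequence[float]],
    pairwise: Sequence[Sequence[Sequence[float]]],
) -> Tuple[float, List[int]]:
    """Max-sum dynamic programming over a chain of parts (assumed structure).

    unary[i][c]       : score for assigning candidate cluster c to part i
                        (stands in for appearance and motion constraints)
    pairwise[i][a][b] : score for part i taking candidate a while part i+1
                        takes candidate b (stands in for spatio-temporal
                        dependencies between neighbouring parts)
    Returns the best total score and the chosen candidate index per part.
    """
    n_parts = len(unary)
    n_cands = len(unary[0])

    # best[c] = best score of any assignment of parts 0..i ending with part i -> c
    best = list(unary[0])
    back: List[List[int]] = []  # back[i-1][b] = best predecessor of part i -> b

    for i in range(1, n_parts):
        new_best = []
        pointers = []
        for b in range(n_cands):
            scores = [best[a] + pairwise[i - 1][a][b] for a in range(n_cands)]
            a_star = max(range(n_cands), key=scores.__getitem__)
            new_best.append(scores[a_star] + unary[i][b])
            pointers.append(a_star)
        best = new_best
        back.append(pointers)

    # Trace the best assignment back from the last part to the first.
    last = max(range(n_cands), key=best.__getitem__)
    assignment = [last]
    for pointers in reversed(back):
        assignment.append(pointers[assignment[-1]])
    assignment.reverse()
    return best[last], assignment


# Toy example: 3 parts, 2 candidate clusters each (all scores are made up).
toy_unary = [[1.0, 0.0], [0.0, 0.5], [0.2, 0.0]]
toy_pairwise = [
    [[0.0, 2.0], [0.0, 0.0]],  # between part 0 and part 1
    [[0.0, 0.0], [1.0, 0.0]],  # between part 1 and part 2
]
score, parts = match_chain_model(toy_unary, toy_pairwise)
print(score, parts)
```

The returned assignment says which candidate cluster plays each part, which is what lets such a model localize the parts of an action as well as score the video. A classifier built this way would compare the best score across the models of the different action classes. For a chain or tree the dynamic program is exact; a graph with loops would need a different discrete optimizer.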