We present a novel approach to modeling human pose, together with interacting objects, based on compositional models of local visual interactions and their relations. Skeleton models, while flexible enough to capture large articulations, fail to accurately model self-occlusions and interactions. Poselets and Visual Phrases address this limitation, but at the expense of requiring a large set of templates. We combine all three approaches in a compositional model that is flexible enough to model detailed articulations yet still captures occlusions and object interactions. Unlike much previous work on action classification, we do not assume that test images are labeled with a person, and instead present results for "action detection" in an unlabeled image. Notably, for each detection, our model reports a detailed description that includes an action label, articulated human pose, object poses, and occlusion flags. We demonstrate that modeling occlusion is crucial for recognizing human-object interactions. We present results on the PASCAL Action Classification challenge that show our unified model advances the state of the art for detection, action classification, and articulated pose estimation.