Detecting actions, poses, and objects with relational phraselets

Authors:
Chaitanya Desai; Deva Ramanan
Affiliations:
University of California at Irvine, Irvine, CA; University of California at Irvine, Irvine, CA
Venue:
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part IV
Year:
2012



Abstract

We present a novel approach to modeling human pose, together with interacting objects, based on compositional models of local visual interactions and their relations. Skeleton models, while flexible enough to capture large articulations, fail to accurately model self-occlusions and interactions. Poselets and Visual Phrases address this limitation, but do so at the expense of requiring a large set of templates. We combine all three approaches with a compositional model that is flexible enough to model detailed articulations but still captures occlusions and object interactions. Unlike much previous work on action classification, we do not assume that test images are annotated with the location of a person; instead, we present results for "action detection" in an unlabeled image. Notably, for each detection, our model reports a detailed description including an action label, articulated human pose, object poses, and occlusion flags. We demonstrate that modeling occlusion is crucial for recognizing human-object interactions. We present results on the PASCAL Action Classification challenge that show our unified model advances the state of the art in detection, action classification, and articulated pose estimation.
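The abstract does not describe inference. The following is only a hedged toy illustration of how a tree-structured compositional model can let relations between parts decide occlusion. It assumes a pictorial-structures-style tree in which each part (a body part or an object) takes one discrete state, possibly including an "occluded" state. The best joint labeling is then found by exact max-sum dynamic programming. Everything in the sketch is hypothetical and is not the authors' implementation: the function `best_labeling`, the `unary`/`pairwise` score tables, and the parts and states. Spatial locations, image features, and training are omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model: each part picks one discrete state (an appearance
# type, or an "occluded" state). Spatial locations are omitted for brevity.
Scores = Dict[str, Dict[str, float]]
PairScores = Dict[Tuple[str, str], Dict[Tuple[str, str], float]]


def best_labeling(children: Dict[str, List[str]], root: str,
                  unary: Scores, pairwise: PairScores) -> Tuple[float, Dict[str, str]]:
    """Exact max-sum dynamic programming over a tree of parts.

    unary[part][state] is a local score; pairwise[(parent, child)] maps
    (parent_state, child_state) to a relational score (missing pairs score 0).
    """
    best_sub: Scores = {}                 # best subtree score for each state of a node
    argbest: Dict[str, Dict[str, str]] = {}  # best child state given the parent state

    def solve(node: str) -> None:
        scores = dict(unary[node])
        for child in children.get(node, []):
            solve(child)
            table = pairwise.get((node, child), {})
            choice: Dict[str, str] = {}
            for s in list(scores):
                # Pick the child state maximizing subtree score plus relation score.
                c_state, c_score = max(
                    ((c, best_sub[child][c] + table.get((s, c), 0.0))
                     for c in best_sub[child]),
                    key=lambda item: item[1],
                )
                scores[s] += c_score
                choice[s] = c_state
            argbest[child] = choice
        best_sub[node] = scores

    solve(root)
    # Choose the best root state, then backtrack the stored argmaxes downward.
    root_state = max(best_sub[root], key=best_sub[root].get)
    labeling = {root: root_state}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            labeling[child] = argbest[child][labeling[node]]
            stack.append(child)
    return best_sub[root][root_state], labeling


# Hypothetical example: a hand holding a cup tends to be occluded by it.
children = {"torso": ["hand"], "hand": ["cup"]}
unary = {
    "torso": {"upright": 1.0},
    "hand": {"visible": 0.5, "occluded": -0.25},
    "cup": {"in_hand": 0.75, "none": 0.0},
}
pairwise = {
    ("hand", "cup"): {("occluded", "in_hand"): 1.0, ("visible", "in_hand"): -0.5},
}

score, labels = best_labeling(children, "torso", unary, pairwise)
print(score, labels)  # 2.5 {'torso': 'upright', 'hand': 'occluded', 'cup': 'in_hand'}
```

In this toy example the hand's local score alone prefers "visible". The relational term with the held cup flips the jointly optimal label to "occluded". This shows how scoring parts together with their relations, rather than independently, can produce occlusion flags alongside poses. Because the model is a tree, the dynamic program is exact, with cost linear in the number of parts and quadratic in the number of states per part.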