Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition

Authors:
Abhinav Gupta;Aniruddha Kembhavi;Larry S. Davis
Affiliations:
University of Maryland, College Park;University of Maryland, College Park;University of Maryland, College Park
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2009

Cited by: 25

A survey on vision-based human action recognition

Image and Vision Computing
Visual object-action recognition: Inferring object affordances from human demonstration

Computer Vision and Image Understanding
Why did the person cross the road (there)? scene understanding using probabilistic logic models and common sense reasoning

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part II
Every picture tells a story: generating sentences from images

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part IV
Robust sequence alignment for actor-object interaction recognition: Discovering actor-object states

Computer Vision and Image Understanding
Abstraction and generalization of 3D structure for recognition in large intra-class variation

ACCV'10 Proceedings of the 10th Asian conference on Computer vision - Volume Part III
On importance of interactions and context in human action recognition

IbPRIA'11 Proceedings of the 5th Iberian conference on Pattern recognition and image analysis
Actions in still web images: visualization, detection and retrieval

WAIM'11 Proceedings of the 12th international conference on Web-age information management
Learning from mistakes: object movement classification by the boosted features

ACCV'10 Proceedings of the 2010 international conference on Computer vision - Volume Part I
FollowMe: enhancing mobile applications with open infrastructure sensing

Proceedings of the 12th Workshop on Mobile Computing Systems and Applications
Synergistic methods for using language in robotics

Proceedings of the Workshop on Performance Metrics for Intelligent Systems
Modeling complex temporal composition of actionlets for activity prediction

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part I
Learning human interaction by interactive phrases

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part I
Learning to recognize daily actions using gaze

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part I
Scene semantics from long-term observation of people

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VI
Detecting actions, poses, and objects with relational phraselets

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part IV
Action recognition with exemplar based 2.5D graph matching

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part IV
Collective activity localization with contextual spatial pyramid

ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part III
On recognizing actions in still images via multiple features

ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part III
Exploiting language models to recognize unseen actions

Proceedings of the 3rd ACM conference on International conference on multimedia retrieval
Non-parametric hand pose estimation with object context

Image and Vision Computing
Large-scale web video shot ranking based on visual features and tag co-occurrence

Proceedings of the 21st ACM international conference on Multimedia
Discriminative hierarchical part-based models for human parsing and action recognition

The Journal of Machine Learning Research
Multi levels semantic architecture for multimodal interaction

Applied Intelligence
A non-command interface for automatic document provision during meetings

Proceedings of the companion publication of the 19th international conference on Intelligent User Interfaces

Quantified Score

Hi-index	0.14

Abstract

Interpreting images and videos of humans interacting with objects is a daunting task. It involves understanding the scene or event, analyzing human movements, recognizing manipulable objects, and observing the effect of the human movement on those objects. While each of these perceptual tasks can be conducted independently, the recognition rate improves when the interactions between them are considered. Motivated by psychological studies of human perception, we present a Bayesian approach that integrates the various perceptual tasks involved in understanding human-object interactions. Previous approaches to object and action recognition rely on static shape/appearance feature matching and motion analysis, respectively. Our approach goes beyond these traditional approaches and applies spatial and functional constraints on each of the perceptual elements for a coherent semantic interpretation. Such constraints allow us to recognize objects and actions when their appearances are not discriminative enough. We also demonstrate the use of such constraints in recognizing actions from static images without using any motion information.
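
The abstract's central claim, that considering object and action jointly can correct a decision that each task would get wrong on its own, can be illustrated with a toy sketch. This is not the paper's model: the two object classes, the two action classes, every number, and the function names are hypothetical, and the "functional compatibility" table is an assumed stand-in for the constraints the abstract mentions. The sketch only shows how a product of per-task evidence and a compatibility factor can change the most probable interpretation.

```python
from itertools import product

# Hypothetical appearance-based evidence for the object (made-up numbers).
object_evidence = {"cup": 0.45, "spray_bottle": 0.55}

# Hypothetical motion-based evidence for the action (made-up numbers).
action_evidence = {"drinking": 0.60, "spraying": 0.40}

# Assumed functional compatibility between an object and an action.
compatibility = {
    ("cup", "drinking"): 0.9,
    ("cup", "spraying"): 0.1,
    ("spray_bottle", "drinking"): 0.1,
    ("spray_bottle", "spraying"): 0.9,
}


def independent_map(obj_ev, act_ev):
    """Pick the best object and the best action separately."""
    return max(obj_ev, key=obj_ev.get), max(act_ev, key=act_ev.get)


def joint_posterior(obj_ev, act_ev, compat):
    """Normalize the product of both evidence terms and the compatibility factor."""
    scores = {
        (o, a): obj_ev[o] * act_ev[a] * compat[(o, a)]
        for o, a in product(obj_ev, act_ev)
    }
    total = sum(scores.values())
    return {pair: score / total for pair, score in scores.items()}


def joint_map(obj_ev, act_ev, compat):
    """Pick the (object, action) pair with the highest joint score."""
    posterior = joint_posterior(obj_ev, act_ev, compat)
    return max(posterior, key=posterior.get)


if __name__ == "__main__":
    print("independent:", independent_map(object_evidence, action_evidence))
    print("joint:      ", joint_map(object_evidence, action_evidence, compatibility))
```

With these made-up numbers, the independent decision pairs a spray bottle with drinking, an implausible combination, while the joint decision settles on a cup and drinking because the compatibility factor penalizes mismatched pairs.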