Why did the person cross the road (there)? Scene understanding using probabilistic logic models and common sense reasoning

Authors: Aniruddha Kembhavi, Tom Yeh, Larry S. Davis
Affiliations: University of Maryland, College Park (all three authors)
Venue: ECCV '10: Proceedings of the 11th European Conference on Computer Vision, Part II
Year: 2010

References (19)

Learning Patterns of Activity Using Real-Time Tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. International Journal of Computer Vision.
Histograms of Oriented Gradients for Human Detection. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), Volume 1.
Markov logic networks. Machine Learning.
Putting Objects in Perspective. Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), Volume 2.
A System for Learning Statistical Motion Patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Camera Calibration from Video of a Walking Human. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Recovering Surface Layout from an Image. International Journal of Computer Vision.
Scene Classification Using a Hybrid Generative/Discriminative Approach. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Robust Object Detection with Interleaved Categorization and Segmentation. International Journal of Computer Vision.
Event Modeling and Recognition Using Markov Logic Networks. Proceedings of the 10th European Conference on Computer Vision (ECCV '08), Part II.
Robust Object Tracking by Hierarchical Association of Detection Responses. Proceedings of the 10th European Conference on Computer Vision (ECCV '08), Part II.
Probabilistic Modeling of Scene Dynamics for Applications in Visual Surveillance. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Efficient Subwindow Search: A Branch and Bound Framework for Object Localization. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Functional scene element recognition for video scene analysis. Proceedings of the 2009 International Conference on Motion and Video Computing (WMVC '09).
Scene classification via pLSA. Proceedings of the 9th European Conference on Computer Vision (ECCV '06), Part IV.
Figure/Ground assignment in natural images. Proceedings of the 9th European Conference on Computer Vision (ECCV '06), Part II.
Learning semantic scene models from observing activity in visual surveillance. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics.

Cited by (4)

Can computers learn from humans to see better?: inferring scene semantics from viewers' eye movements. Proceedings of the 19th ACM International Conference on Multimedia (MM '11).
Probabilistic event calculus based on Markov logic networks. Proceedings of the 5th International Conference on Rule-Based Modeling and Computing on the Semantic Web (RuleML '11).
Event processing under uncertainty. Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems.
A Markov logic framework for recognizing complex events from multimodal data. Proceedings of the 15th ACM International Conference on Multimodal Interaction.

Abstract

We develop a video understanding system for scene elements, such as bus stops, crosswalks, and intersections, that are characterized more by qualitative activities and geometry than by intrinsic appearance. The domain models for these scene elements are not learned from a video corpus; instead, they are elicited naturally from humans and represented as probabilistic logic rules within a Markov Logic Network framework. Human-elicited models, however, describe object interactions as they occur in the 3D world rather than their appearance as projected onto a specific 2D image plane. We bridge this gap by recovering qualitative scene geometry to analyze object interactions in the 3D world, and then reasoning about scene geometry, occlusions, and common-sense domain knowledge using a set of meta-rules. We demonstrate the effectiveness of this approach on a set of videos of public spaces.
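For readers unfamiliar with Markov Logic Networks (MLNs): an MLN attaches a real-valued weight w_i to each first-order formula F_i, and the probability of a possible world x is P(X = x) = (1/Z) exp(sum_i w_i n_i(x)), where n_i(x) is the number of true groundings of F_i in x and Z normalizes over all worlds. The Python sketch that follows illustrates only that semantics on a toy bus-stop example. Everything in it is a hypothetical assumption, not the authors' model: the predicates Waits, VehicleStops and BusStop, the two people and two zones, the weights 2.0 and -1.0, and the use of exact enumeration (practical MLN systems rely on approximate inference, because enumeration is exponential in the number of query atoms). Evidence atoms are held fixed, so the code computes P(BusStop(z) | evidence).

```python
import itertools
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy MLN; predicates, constants, and weights are illustrative only.
Atom = Tuple[str, ...]          # a ground atom, e.g. ("Waits", "p1", "z1")
World = Dict[Atom, bool]        # truth assignment to ground atoms

PEOPLE = ["p1", "p2"]
ZONES = ["z1", "z2"]

# Hypothetical evidence: p1 waits in z1 and a vehicle stops in z1; nothing else is true.
EVIDENCE: World = {}
for p in PEOPLE:
    for z in ZONES:
        EVIDENCE[("Waits", p, z)] = (p, z) == ("p1", "z1")
for z in ZONES:
    EVIDENCE[("VehicleStops", z)] = z == "z1"

# Unobserved (query) atoms whose truth values we enumerate.
QUERY_ATOMS: List[Atom] = [("BusStop", z) for z in ZONES]


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


def n_waiting_rule(w: World) -> int:
    """True groundings of Waits(p, z) AND VehicleStops(z) => BusStop(z)."""
    return sum(
        implies(w[("Waits", p, z)] and w[("VehicleStops", z)], w[("BusStop", z)])
        for p in PEOPLE
        for z in ZONES
    )


def n_prior(w: World) -> int:
    """True groundings of the unit clause BusStop(z); a negative weight makes it a prior."""
    return sum(w[("BusStop", z)] for z in ZONES)


# (weight, grounding counter) pairs; the weights are assumed, not learned.
FORMULAS: List[Tuple[float, Callable[[World], int]]] = [
    (2.0, n_waiting_rule),
    (-1.0, n_prior),
]


def marginals(evidence: World) -> Dict[Atom, float]:
    """Exact P(atom = True | evidence) by enumerating every query-atom assignment."""
    z_total = 0.0
    true_mass = {a: 0.0 for a in QUERY_ATOMS}
    for values in itertools.product([False, True], repeat=len(QUERY_ATOMS)):
        world = dict(evidence)
        world.update(zip(QUERY_ATOMS, values))
        # Unnormalized weight exp(sum_i w_i * n_i(world)).
        weight = math.exp(sum(wt * count(world) for wt, count in FORMULAS))
        z_total += weight
        for atom, val in zip(QUERY_ATOMS, values):
            if val:
                true_mass[atom] += weight
    return {a: m / z_total for a, m in true_mass.items()}


if __name__ == "__main__":
    for atom, prob in marginals(EVIDENCE).items():
        print(atom, round(prob, 3))
```

Running it prints 0.731 for ('BusStop', 'z1') and 0.269 for ('BusStop', 'z2'). In z1, setting BusStop true satisfies the one grounding whose antecedent holds (gain 2.0) at the cost of the prior (-1.0), giving 1 / (1 + e^-1); z2 has no supporting observation, so only the prior applies, giving 1 / (1 + e^1). Soft weights are what let such rules be violated occasionally, for example when an observation is missing, instead of forcing a hard contradiction.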