From image sequences towards conceptual descriptions. Image and Vision Computing.
The computational perception of scene dynamics. Computer Vision and Image Understanding, special issue on physics-based modeling and reasoning in computer vision.
A Maximum-Likelihood Approach to Visual Event Classification. ECCV '96: Proceedings of the 4th European Conference on Computer Vision, Volume II.
Coupled hidden Markov models for complex action recognition. CVPR '97: Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition.
Understanding manipulation in video. FG '96: Proceedings of the 2nd International Conference on Automatic Face and Gesture Recognition.
Learning temporal, relational, force-dynamic event definitions from video. Eighteenth National Conference on Artificial Intelligence.
Reconstructing force-dynamic models from video sequences. Artificial Intelligence.
Sequential inference with reliable observations: learning to construct force-dynamic models. Artificial Intelligence.
Learning, detection and representation of multi-agent events in videos. Artificial Intelligence.
Grounding the lexical semantics of verbs in visual perception using force dynamics and event logic. Journal of Artificial Intelligence Research.
Understanding observations of interacting objects requires one to reason about qualitative scene dynamics. For example, on observing a hand lifting a can, we may infer that an 'active' hand is applying an upwards force (by grasping) to lift a 'passive' can. In previous work [6] we presented a system that infers qualitative scene dynamics from the instantaneous motion of objects. However, since that analysis only considered single frames in isolation, there were often multiple interpretations for each frame. In this work we show how the dynamic information inferred at each frame can be integrated over time to reduce ambiguity. Our approach to integrating information is to extend our representation to describe objects by a set of properties or capabilities that are assumed to persist over time. Given this extended representation we find interpretations that require the smallest set(s) of properties over the whole image sequence.
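The selection criterion described above — choosing, for each frame, among the candidate interpretations so that the union of object properties required across the whole sequence is as small as possible — can be sketched as a simple search. This is an illustrative reconstruction, not the paper's actual algorithm; the property names (e.g. `hand:active`) and the exhaustive enumeration are assumptions made for clarity.

```python
from itertools import product

def minimal_property_interpretations(frame_interpretations):
    """Pick per-frame interpretations minimizing the set of persistent properties.

    frame_interpretations: a list (one entry per frame) of lists of candidate
    interpretations, each candidate being a frozenset of object properties
    (e.g. frozenset({'hand:active', 'can:passive'})).

    Returns (best_choices, size): all per-frame choice tuples whose union of
    properties over the sequence has minimal size, and that size.
    Note: exhaustive enumeration is exponential in the number of frames;
    it is used here only to make the selection criterion concrete.
    """
    best_size = None
    best_choices = []
    for choice in product(*frame_interpretations):
        # Properties are assumed to persist, so the cost of a choice is the
        # size of the union of properties it requires over all frames.
        props = set().union(*choice)
        if best_size is None or len(props) < best_size:
            best_size = len(props)
            best_choices = [choice]
        elif len(props) == best_size:
            best_choices.append(choice)
    return best_choices, best_size
```

For instance, if frame 1 admits either {hand:active, can:passive} or {can:active}, while frame 2 only admits {hand:active, can:passive}, the first reading of frame 1 is preferred: it reuses the same two properties across both frames, whereas the second reading would require three properties in total. This is how temporal integration resolves the single-frame ambiguity.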