The "Inverse Hollywood problem": from video to scripts and storyboards via causal analysis

Authors:
Matthew Brand
Affiliations:
The Media Lab, MIT, Cambridge, MA
Venue:
AAAI'97/IAAI'97 Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Conference on Innovative Applications of Artificial Intelligence
Year:
1997

Abstract

We address the problem of visually detecting causal events and fitting them together into a coherent story of the action witnessed by the camera. We show that this can be done by reasoning about the motions and collisions of surfaces, using high-level causal constraints derived from psychological studies of infant visual behavior. These constraints are naive forms of basic physical laws governing substantiality, contiguity, momentum, and acceleration. We describe two implementations. One system parses instructional videos, extracting plans of action and key frames suitable for storyboarding. Since learning will play a role in making such systems robust, we introduce a new framework for higher-order hidden Markov models and demonstrate its use in a second system that segments stereo video into actions in near real-time. Rather than attempt accurate low-level vision, both systems use high-level causal analysis to integrate fast but sloppy pixel-based representations over time. The output is suitable for summary, indexing, and automated editing.