In this paper, we present a framework for parsing video events with a stochastic Temporal And-Or Graph (T-AOG) and for learning the T-AOG from video without supervision. The T-AOG represents a stochastic event grammar. Its alphabet consists of a set of grounded spatial relations, including the poses of agents and their interactions with objects in the scene. The terminal nodes of the T-AOG are atomic actions, each specified by a number of grounded relations over image frames. An And-node represents a sequence of actions, and an Or-node represents a set of alternative ways of forming such concatenations. Together, the And-Or nodes generate a set of valid temporal configurations of atomic actions, which can be equivalently represented as the language of a stochastic context-free grammar (SCFG). For each And-node, we also model the temporal relations among its child nodes, which lets us distinguish events with similar structures but different temporal patterns and interpolate missing portions of events. This makes the T-AOG grammar context-sensitive. We propose an unsupervised learning algorithm that learns the atomic actions, the temporal relations and the And-Or nodes under the information projection principle in a coherent probabilistic framework. We also propose an event parsing algorithm based on the T-AOG that can understand events, infer the goals of agents, and predict their plausible intended actions. In comparison with existing methods, our paper makes the following contributions. (i) We represent events by a T-AOG with hierarchical compositions of events and the temporal relations between sub-events. (ii) We learn the grammar, including atomic actions and temporal relations, automatically from video data without manual supervision.
(iii) Our algorithm infers the goals of agents and predicts their intents by a top-down process, handles event insertion and multi-agent events, keeps all possible interpretations of the video to preserve ambiguities, and achieves the globally optimal parsing solution in a Bayesian framework. (iv) The algorithm uses event context to improve the detection of atomic actions and to segment and recognize objects in the scene. Extensive experiments on indoor and outdoor scenes, covering both single-agent and multi-agent events, validate the effectiveness of the proposed approach.
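The representation described above works as follows: And-nodes concatenate, Or-nodes choose, and the resulting configurations form the language of an SCFG. A partially observed event therefore constrains which goals remain plausible. The Python sketch below illustrates that idea on a toy, non-recursive graph. The node classes, action names, probabilities and helper functions (`language`, `infer_goal`, `predict_next`) are all hypothetical. The sketch enumerates every sequence by brute force where a real system would use a chart parser (such as Earley's algorithm), and it is not the paper's learning or parsing method.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical node types for a toy, non-recursive T-AOG (illustrative names only).
@dataclass
class Terminal:
    name: str  # an atomic action

@dataclass
class And:
    name: str
    children: List[Node]  # children occur in this temporal order

@dataclass
class Or:
    name: str
    branches: List[Tuple[float, Node]]  # (branch probability, child)

Node = Union[Terminal, And, Or]
Parse = Tuple[Tuple[str, ...], float]  # (action sequence, probability)

def language(node: Node) -> List[Parse]:
    """Enumerate every action sequence the node can generate, with its probability."""
    if isinstance(node, Terminal):
        return [((node.name,), 1.0)]
    if isinstance(node, Or):
        # Or-node: choose one branch and weight its sequences by the branch probability.
        return [(seq, p * q) for p, child in node.branches for seq, q in language(child)]
    # And-node: concatenate one expansion of each child, in order.
    out: List[Parse] = [((), 1.0)]
    for child in node.children:
        out = [(s + t, p * q) for s, p in out for t, q in language(child)]
    return out

def infer_goal(root: Or, prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Posterior over top-level events given an observed prefix of atomic actions."""
    scores = {}
    for p, event in root.branches:
        mass = sum(q for seq, q in language(event) if seq[: len(prefix)] == prefix)
        scores[event.name] = p * mass
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total > 0 else scores

def predict_next(root: Node, prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Distribution over the next atomic action among sequences consistent with the prefix."""
    nxt: Dict[str, float] = {}
    for seq, q in language(root):
        if len(seq) > len(prefix) and seq[: len(prefix)] == prefix:
            nxt[seq[len(prefix)]] = nxt.get(seq[len(prefix)], 0.0) + q
    total = sum(nxt.values())
    return {a: v / total for a, v in nxt.items()} if total > 0 else nxt

# Toy grammar: two events that share their first atomic action.
get_water = And("get_water", [
    Terminal("enter_room"),
    Terminal("approach_dispenser"),
    Or("prepare_cup", [(0.6, Terminal("fill_cup")),
                       (0.4, And("wash_then_fill", [Terminal("wash_cup"), Terminal("fill_cup")]))]),
    Terminal("leave"),
])
use_computer = And("use_computer", [Terminal("enter_room"), Terminal("approach_desk"),
                                    Terminal("sit_down"), Terminal("type")])
root = Or("event", [(0.7, get_water), (0.3, use_computer)])

print(infer_goal(root, ("enter_room",)))                         # both goals remain plausible
print(predict_next(root, ("enter_room", "approach_dispenser")))  # fill_cup vs. wash_cup
```

Keeping every consistent sequence, rather than only the best one, mirrors the abstract's point about preserving ambiguity. After "enter_room" both goals survive, and one more observed action resolves them.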
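The abstract also states that each And-node models the temporal relations of its children. This means two events with the same child sequence can still be told apart by their timing, and missing portions can be interpolated. The minimal sketch below assumes, hypothetically, that this relation is a Gaussian over the gap between consecutive child intervals. The class `TemporalAndNode`, its gap parameters and the example timings are illustrative assumptions, not the paper's model.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of an atomic action, in seconds

@dataclass
class TemporalAndNode:
    """Hypothetical And-node with a Gaussian model of the gap between consecutive children."""
    name: str
    children: List[str]    # atomic actions in temporal order
    gap_mean: List[float]  # assumed mean gap between child i and child i+1
    gap_std: List[float]   # assumed standard deviation of that gap

    def log_likelihood(self, intervals: List[Interval]) -> float:
        """Sum of Gaussian log-densities of the observed gaps between children."""
        if len(intervals) != len(self.children):
            raise ValueError("need one interval per child")
        ll = 0.0
        for i in range(len(intervals) - 1):
            gap = intervals[i + 1][0] - intervals[i][1]
            mu, sd = self.gap_mean[i], self.gap_std[i]
            ll += -0.5 * ((gap - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))
        return ll

    def interpolate_start(self, i: int, prev_end: float) -> float:
        """Expected start time of child i+1 when it is unobserved (mean-gap interpolation)."""
        return prev_end + self.gap_mean[i]

# Same child sequence, different temporal patterns.
get_water = TemporalAndNode("get_water", ["approach_dispenser", "leave"], [20.0], [5.0])
pass_by = TemporalAndNode("pass_by", ["approach_dispenser", "leave"], [1.0], [1.0])

def classify(intervals: List[Interval]) -> str:
    """Pick the And-node whose temporal model best explains the observed intervals."""
    return max([get_water, pass_by], key=lambda n: n.log_likelihood(intervals)).name

print(classify([(0.0, 3.0), (22.0, 25.0)]))  # → get_water (long dwell at the dispenser)
print(classify([(0.0, 3.0), (4.0, 7.0)]))    # → pass_by (short gap)
```

Both nodes expand to the same action sequence, so a purely sequential grammar could not separate them. Only the timing attached to each And-node does, which is the kind of distinction the abstract attributes to modeling temporal relations.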