Semantic event representation and recognition using syntactic attribute graph grammar

Authors:
Liang Lin;Haifeng Gong;Li Li;Liang Wang
Affiliations:
School of Information Science and Technology, Beijing Institute of Technology, Beijing 100081, China and Lotus Hill Research Institute for Computer Vision and Information Science, Ezhou 436000, Ch ...;Lotus Hill Research Institute for Computer Vision and Information Science, Ezhou 436000, China;Ocean University of China, Qingdao 266071, China;Institute of Computing Technology of CAS, Beijing 100190, China
Venue:
Pattern Recognition Letters
Year:
2009

Abstract

The representation and recognition of complex semantic events (e.g. illegal parking, stealing objects) is a challenging task for high-level understanding of video sequences. To address this problem, this paper studies an attribute graph grammar for event modeling. The grammar models the variability of semantic events by a set of meaningful "event components" with spatio-temporal constraints. The event components are defined manually according to their semantic meaning and are further decomposed into atomic event primitives. These event primitives are learned on an object-trajectory table that describes mobile object attributes (location, velocity, and visibility) in a video sequence. A dictionary of temporal and spatial relations is defined to constrain the event primitives. With this representation, one observed event can be parsed into an "event parse graph", and all possible variations of one event can be modeled syntactically in an "event And-Or graph". The probability model of an event And-Or graph can be learned from a set of annotated event instances, and, given a learned event And-Or graph, a Gibbs sampling scheme is used for inference on a test video. In the experiments, we test the event recognition performance of the proposed method on both real indoor and outdoor videos and report quantitative recognition rates on the public LHI dataset.
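
To make the relationship between an event And-Or graph and its event parse graphs more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the `Node` class, the `parses` function, the primitive names, and the branch probabilities are all hypothetical. An And-node composes its children in temporal order, an Or-node picks one alternative with a branch probability, and a leaf stands for an atomic event primitive. The sketch enumerates every parse exhaustively, which is only practical for tiny graphs; the paper instead uses Gibbs sampling for inference, and the spatial and temporal relations between primitives are omitted here.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Tuple


@dataclass
class Node:
    """One node of a toy event And-Or graph (hypothetical structure)."""
    name: str
    kind: str  # "and", "or", or "leaf"
    children: List["Node"] = field(default_factory=list)
    probs: List[float] = field(default_factory=list)  # Or-node branch probabilities


def parses(node: Node) -> List[Tuple[Tuple[str, ...], float]]:
    """Return (primitive sequence, probability) for every parse under node."""
    if node.kind == "leaf":
        return [((node.name,), 1.0)]
    if node.kind == "or":
        # Choose exactly one child; weight its parses by the branch probability.
        out = []
        for child, p in zip(node.children, node.probs):
            for seq, q in parses(child):
                out.append((seq, p * q))
        return out
    # And-node: take one parse from each child and concatenate them in order.
    out = []
    for combo in product(*(parses(c) for c in node.children)):
        seq = tuple(name for part, _ in combo for name in part)
        prob = 1.0
        for _, q in combo:
            prob *= q
        out.append((seq, prob))
    return out


def leaf(name: str) -> Node:
    return Node(name, "leaf")


# Hypothetical "illegal parking" event; names and probabilities are made up.
illegal_parking = Node("illegal_parking", "and", [
    leaf("car_enters_zone"),
    leaf("car_stops"),
    Node("driver_action", "or",
         [leaf("driver_leaves"), leaf("driver_stays")],
         [0.75, 0.25]),
])

result = parses(illegal_parking)
for sequence, probability in result:
    print(probability, " -> ".join(sequence))
```

Each tuple in `result` corresponds to the leaves of one parse graph, so the Or-node is what lets a single And-Or graph cover several variants of the same event. In a learned model, the branch probabilities would be estimated from annotated event instances rather than set by hand.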