Unsupervised Activity Perception in Crowded and Complicated Scenes Using Hierarchical Bayesian Models

Authors:
Xiaogang Wang; Xiaoxu Ma; W. E. L. Grimson
Affiliations:
MIT, Cambridge; MIT, Cambridge; MIT, Cambridge
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2009

Citing 0
Cited 45

Scene modelling and classification using learned spatial relations

COSIT'09 Proceedings of the 9th international conference on Spatial information theory
Detecting contextual anomalies of crowd motion in surveillance video

ICIP'09 Proceedings of the 16th IEEE international conference on Image processing
Time-Delayed Correlation Analysis for Multi-Camera Activity Understanding

International Journal of Computer Vision
A proposal for local and global human activities identification

AMDO'10 Proceedings of the 6th international conference on Articulated motion and deformable objects
Video topic modelling with behavioural segmentation

Proceedings of the 1st ACM international workshop on Multimodal pervasive video analysis
Abnormality detection using low-level co-occurring events

Pattern Recognition Letters
Unsupervised discovery of activity correlations using latent topic models

Proceedings of the Seventh Indian Conference on Computer Vision, Graphics and Image Processing
Exploiting multiple cameras for environmental pathlets

ISVC'10 Proceedings of the 6th international conference on Advances in visual computing - Volume Part III
Relational Graph Mining for Learning Events from Video

Proceedings of the 2010 conference on STAIRS 2010: Proceedings of the Fifth Starting AI Researchers' Symposium
Motion-based unusual event detection in human crowds

Journal of Visual Communication and Image Representation
Anomalous video event detection using spatiotemporal context

Computer Vision and Image Understanding
Stream-based active unusual event detection

ACCV'10 Proceedings of the 10th Asian conference on Computer vision - Volume Part I
Learning rare behaviours

ACCV'10 Proceedings of the 10th Asian conference on Computer vision - Volume Part II
Motion pattern extraction and event detection for automatic visual surveillance

Journal on Image and Video Processing - Special issue on advanced video-based surveillance
Trajectory Analysis and Semantic Region Modeling Using Nonparametric Hierarchical Bayesian Models

International Journal of Computer Vision
Human action recognition by extracting features from negative space

ICIAP'11 Proceedings of the 16th international conference on Image analysis and processing - Volume Part II
Exploiting petri-net structure for activity classification and user instruction within an industrial setting

ICMI '11 Proceedings of the 13th international conference on multimodal interfaces
Dynamic texture reconstruction from sparse codes for unusual event detection in crowded scenes

J-MRE '11 Proceedings of the 2011 joint ACM workshop on Modeling and representing events
Building semantic scene models from unconstrained video

Computer Vision and Image Understanding
Abnormal crowd behavior detection by social force optimization

HBU'11 Proceedings of the Second international conference on Human Behavior Understanding
Workflow activity monitoring using dynamics of pair-wise qualitative spatial relations

MMM'12 Proceedings of the 18th international conference on Advances in Multimedia Modeling
Learning Behavioural Context

International Journal of Computer Vision
Discovering activity interactions in a single pass over a video stream

Proceedings of the 27th Annual ACM Symposium on Applied Computing
Video Behaviour Mining Using a Dynamic Topic Model

International Journal of Computer Vision
One-scan rule extraction to explain significant vehicle interactions with guaranteed error value

ACM SIGAPP Applied Computing Review
Intrinsic Bayesian model for high-dimensional unsupervised reduction

Neurocomputing
Is that scene dangerous?: transferring knowledge over a video stream

Proceedings of the 5th Ph.D. workshop on Information and knowledge
Exploratory search of long surveillance videos

Proceedings of the 20th ACM international conference on Multimedia
Intelligent multi-camera video surveillance: A review

Pattern Recognition Letters
Human behavior analysis in video surveillance: A Social Signal Processing perspective

Neurocomputing
Learning common behaviors from large sets of unlabeled temporal series

Image and Vision Computing
Coherent filtering: detecting coherent motions from crowd clutters

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part II
Spatio-Temporal phrases for activity recognition

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part III
Exploiting sparse representations for robust analysis of noisy complex video scenes

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VI
Visual code-sentences: a new video representation based on image descriptor sequences

ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part I
Unsupervised mining of long time series based on latent topic model

Neurocomputing
Large-scale statistical modeling of motion patterns: a Bayesian nonparametric approach

Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing
Retrieving actions in group contexts

ECCV'10 Proceedings of the 11th European conference on Trends and Topics in Computer Vision - Volume Part I
Auto learning temporal atomic actions for activity classification

Pattern Recognition
Abnormal event detection in crowded scenes using sparse representation

Pattern Recognition
Human action recognition employing negative space features

Journal of Visual Communication and Image Representation
Robust abandoned object detection integrating wide area visual surveillance and social context

Pattern Recognition Letters
Activity clustering for anomaly detection

International Journal of Intelligent Information and Database Systems
Self-help: Seeking out perplexing images for ever improving topological mapping

International Journal of Robotics Research
Summarizing high-level scene behavior

Machine Vision and Applications


Abstract

We propose a novel unsupervised learning framework to model activities and interactions in crowded and complicated scenes. Hierarchical Bayesian models are used to connect three elements in visual surveillance: low-level visual features, simple "atomic" activities, and interactions. Atomic activities are modeled as distributions over low-level visual features, and multi-agent interactions are modeled as distributions over atomic activities. These models are learned in an unsupervised way. Given a long video sequence, moving pixels are clustered into different atomic activities and short video clips are clustered into different interactions. In this paper, we propose three hierarchical Bayesian models: the Latent Dirichlet Allocation (LDA) mixture model, the Hierarchical Dirichlet Process (HDP) mixture model, and the Dual Hierarchical Dirichlet Processes (Dual-HDP) model. They extend existing language models, such as LDA [1] and HDP [2]. Our data sets are challenging video sequences from crowded traffic scenes and train station scenes with many kinds of activities co-occurring. Without tracking or human labeling effort, our framework completes many challenging visual surveillance tasks of broad interest, such as (1) discovering typical atomic activities and interactions; (2) segmenting long video sequences into different interactions; (3) segmenting motions into different activities; (4) detecting abnormality; and (5) supporting high-level queries on activities and interactions.
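
The lowest layer described in the abstract (atomic activities as distributions over low-level visual features) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain LDA with collapsed Gibbs sampling, not the LDA mixture, HDP mixture, or Dual-HDP models, and it leaves out the layer that clusters clips into interactions. The feature encoding (a grid cell plus one of four motion directions), the function names `visual_word` and `lda_gibbs`, the hyperparameters, and the toy data are all assumptions made for illustration.

```python
import random

def visual_word(x, y, dx, dy, cell=10, grid_w=72):
    """Map a moving pixel to a word id (assumed encoding: grid cell + 4 directions)."""
    if abs(dx) >= abs(dy):
        direction = 0 if dx >= 0 else 1      # 0 = right, 1 = left
    else:
        direction = 2 if dy >= 0 else 3      # 2 = down, 3 = up (image coordinates)
    cell_id = (y // cell) * grid_w + (x // cell)
    return cell_id * 4 + direction

def lda_gibbs(docs, n_topics, n_words, alpha=0.5, beta=0.1, iters=50, seed=0):
    """Plain LDA via collapsed Gibbs sampling. Topics play the role of atomic activities."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]            # counts per clip and topic
    topic_word = [[0] * n_words for _ in range(n_topics)] # counts per topic and word
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment, then resample it.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed estimates: phi[k] is a distribution over words for topic k,
    # theta[d] is a distribution over topics for clip d.
    phi = [[(topic_word[k][w] + beta) / (topic_total[k] + n_words * beta)
            for w in range(n_words)] for k in range(n_topics)]
    theta = [[(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
              for k in range(n_topics)] for d, doc in enumerate(docs)]
    return phi, theta

# Toy data (hypothetical): each clip is a list of (x, y, dx, dy) moving pixels
# on a 30x30 image, i.e. a 3x3 grid of 10-pixel cells.
GRID_W = 3
N_WORDS = GRID_W * GRID_W * 4
clips_pixels = [
    [(5, 5, 1, 0)] * 20 + [(15, 5, 1, 0)] * 20,   # rightward flow along the top row
    [(5, 15, 0, 1)] * 20 + [(5, 25, 0, 1)] * 20,  # downward flow along the left column
    [(5, 5, 1, 0)] * 10 + [(5, 15, 0, 1)] * 10,   # both motions co-occurring
]
docs = [[visual_word(*p, cell=10, grid_w=GRID_W) for p in clip] for clip in clips_pixels]
phi, theta = lda_gibbs(docs, n_topics=2, n_words=N_WORDS)
```

Here each short clip is treated as a document and each quantized moving pixel as a word, so `phi` gives one word distribution per candidate atomic activity and `theta` gives each clip's mixture over those activities. Grouping clips into interactions would require an additional clustering layer over these mixtures, which this sketch does not attempt.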