Building semantic scene models from unconstrained video

Authors:
Hannah M. Dee;Anthony G. Cohn;David C. Hogg
Affiliations:
Department of Computer Science, Aberystwyth University, Penglais, Aberystwyth SY23 3DB, United Kingdom;School of Computing, University of Leeds, Leeds LS2 9JT, United Kingdom;School of Computing, University of Leeds, Leeds LS2 9JT, United Kingdom
Venue:
Computer Vision and Image Understanding
Year:
2012

Abstract

This paper describes a method for building semantic scene models from video data using observed motion. We do this through unsupervised clustering of simple yet novel motion descriptors, which provide a quantised representation of gross motion within scene regions. Using these descriptors we characterise the dominant patterns of motion, and then group spatial regions by both proximity and local motion similarity to define regions with particular motion characteristics. We are able to process scenes in which objects are difficult to detect and track because of variable frame rate, video quality or occlusion, and to identify regions that differ in usage but not in appearance (such as frequently used paths across open space). We demonstrate our method on 50 videos from very different scene types: indoor scenarios with unpredictable, unconstrained motion, junction scenes, road and path scenes, and open squares or plazas. We show that these scenes can be clustered using our representation, and that incorporating learned spatial relations into the representation enables us to cluster more effectively. This method enables us to make meaningful statements about video scenes as a whole (such as "this video is like that video") and about regions within these scenes (such as "this part of this scene is similar to that part of that scene").
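The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of quantising motion per region and then grouping neighbouring regions with similar motion. It is not the authors' descriptor or clustering procedure. The function names, the fixed grid of square cells, the eight direction bins plus one "static" bin, the L1 histogram distance, and the threshold values are all assumptions made for illustration; the input is assumed to be a dense per-pixel flow field computed elsewhere.

```python
import numpy as np


def cell_direction_histograms(flow, cell=8, n_bins=8, min_mag=0.5):
    """Quantise a dense flow field into per-cell direction histograms.

    flow: (H, W, 2) array of per-pixel (dx, dy) displacements (assumed input).
    Returns an array of shape (H // cell, W // cell, n_bins + 1); the last
    bin counts pixels whose motion magnitude is below min_mag.
    """
    h, w, _ = flow.shape
    gh, gw = h // cell, w // cell
    flow = flow[: gh * cell, : gw * cell]  # drop any partial cells at the edges

    mag = np.hypot(flow[..., 0], flow[..., 1])
    ang = np.arctan2(flow[..., 1], flow[..., 0])  # angle in [-pi, pi]
    bins = np.floor((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    bins[mag < min_mag] = n_bins  # extra bin for (near-)static pixels

    # Grid-cell row and column index of every pixel.
    rows = np.broadcast_to(np.repeat(np.arange(gh), cell)[:, None], bins.shape)
    cols = np.broadcast_to(np.repeat(np.arange(gw), cell)[None, :], bins.shape)

    hist = np.zeros((gh, gw, n_bins + 1))
    np.add.at(hist, (rows, cols, bins), 1)  # unbuffered accumulation of counts
    return hist / hist.sum(axis=2, keepdims=True)


def group_cells(hist, max_dist=0.5):
    """Flood-fill 4-connected cells whose histograms differ by at most max_dist.

    Uses the L1 distance between normalised histograms (range 0 to 2).
    Returns an integer label map of shape (grid_rows, grid_cols).
    """
    gh, gw, _ = hist.shape
    labels = -np.ones((gh, gw), dtype=int)
    next_label = 0
    for r in range(gh):
        for c in range(gw):
            if labels[r, c] >= 0:
                continue  # already assigned to a region
            labels[r, c] = next_label
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < gh and 0 <= nx < gw and labels[ny, nx] < 0:
                        if np.abs(hist[y, x] - hist[ny, nx]).sum() <= max_dist:
                            labels[ny, nx] = next_label
                            stack.append((ny, nx))
            next_label += 1
    return labels
```

On a synthetic field whose left half moves right and whose right half moves up, `group_cells(cell_direction_histograms(flow))` separates the two halves into two regions even though nothing about their appearance is used. Histograms accumulated over many frames, rather than a single flow field, would be closer in spirit to characterising dominant motion, but that accumulation step is left out here.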