Efficient extraction and representation of spatial information from video data

Authors:
Hajar Sadeghi Sokeh;Stephen Gould;Jochen Renz
Affiliations:
Research School of Computer Science, The Australian National University, Canberra, ACT;Research School of Computer Science, The Australian National University, Canberra, ACT;Research School of Computer Science, The Australian National University, Canberra, ACT
Venue:
IJCAI'13 Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence
Year:
2013

References:
Support-Vector Networks. Machine Learning.
Latent Dirichlet Allocation. The Journal of Machine Learning Research.
Robust Real-Time Face Detection. International Journal of Computer Vision.
A Bayesian Hierarchical Model for Learning Natural Scene Categories. CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Volume 2.
Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words. International Journal of Computer Vision.
From video to RCC8: exploiting a distance based semantics to stabilise the interpretation of mereotopological relations. COSIT'11 Proceedings of the 10th international conference on Spatial information theory.
Multi-agent event recognition in structured scenarios. CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition.
Branch-and-price global optimization for multi-view multi-target tracking. CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Towards unsupervised semantic segmentation of street scenes from motion cues. Proceedings of the 27th Conference on Image and Vision Computing New Zealand.

Abstract

Vast amounts of video data are available on the web, and more is generated daily by surveillance cameras and other sources. Analysing and processing this data efficiently is essential for many applications. We want to detect activities in these videos efficiently, and to extract and store the essential information they contain for later search and access. Cohn et al. (2012) proposed a comprehensive representation of spatial features that can be efficiently extracted from video and used for these purposes. In this paper, we present a modified version of this approach that is equally efficient and extracts spatial information with much higher accuracy than previously possible. We present efficient algorithms both for extracting and storing spatial information from video and for processing this information to obtain useful spatial features. We evaluate our approach and demonstrate that the extracted spatial information is considerably more accurate than that obtained from existing approaches.