Classification of multiscale spatiotemporal energy features for video segmentation and dynamic objects prioritisation

  • Authors:
  • Anna Belardinelli; Andrea Carbone; Werner X. Schneider

  • Affiliations:
  • Computer Science Department, University of Tübingen, Germany; ISIR - Institut des Systèmes Intelligents et de Robotique, UPMC, Paris, France; CITEC - Cognitive Interaction Technology Excellence Center, Bielefeld University, Germany, and Neurocognitive Psychology, Department of Psychology, Bielefeld University, Germany

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2013

Abstract

High-level visual cognitive abilities such as scene understanding and behavioural analysis are modulated by attentive selective processes. These in turn rely on pre-attentive operations that deliver a perceptual organisation of the visual input and enable the extraction of meaningful "chunks" of information. Specifically, the extraction and prioritisation of moving objects is a crucial step in the processing of dynamic scenes. Motion is, of course, a powerful cue for grouping regions and segregating objects, but not all kinds of motion are equally meaningful or deserve equal attention. On a coarse level, the most interesting moving objects are associated with coherent motion, reflecting our sensitivity to biological motion. Attention, on the other hand, operates at a higher level, prioritising what moves differently with respect to both its surroundings and the global scene. In this paper, we propose a qualitative segmentation of multiscale spatiotemporal energy features according to their frequency-spectrum distribution, which can be used to pre-attentively extract regions of interest. We also show that the discrimination boundaries between classes in the segmentation phase can be learned automatically and efficiently by a Support Vector Machine classifier in a multi-class implementation. Motion-related features are shown to best predict human fixations on an extensive dataset. The model generalises well to datasets other than the one used for training, provided scale is taken into account in the feature extraction step. Regions labelled as coherently moving are clustered into moving object files, each described by the magnitude and phase of the pooled motion energy. The method succeeds in extracting meaningful moving objects from the background and in identifying other, less interesting motion patterns. A saliency function is finally computed on an object basis, rather than on a pixel basis as in most current approaches. The same features are thus used for segmentation and selective attention, and can further be used for recognition and scene interpretation.
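
As a concrete illustration of the feature-extraction step, the sketch below computes phase-invariant spatiotemporal energy from quadrature pairs of 3-D Gabor filters at several spatial scales, in the spirit of classical motion-energy models. The filter sizes, frequencies, Gaussian envelope width and axis conventions are illustrative assumptions, not the paper's exact filter bank.

    # Minimal sketch, assuming an Adelson-Bergen-style energy model:
    # squared responses of a quadrature pair of 3-D Gabor filters give
    # phase-invariant spatiotemporal energy. All parameters are illustrative.
    import numpy as np
    from scipy.ndimage import convolve

    def gabor3d(size, f_spatial, f_temporal, theta, phase):
        """Oriented spatiotemporal Gabor kernel; axes are (y, x, t)."""
        half = size // 2
        r = np.arange(-half, half + 1)
        y, x, t = np.meshgrid(r, r, r, indexing="ij")
        u = x * np.cos(theta) + y * np.sin(theta)   # spatial axis along theta
        carrier = np.cos(2 * np.pi * (f_spatial * u + f_temporal * t) + phase)
        envelope = np.exp(-(x**2 + y**2 + t**2) / (2 * (size / 4.0) ** 2))
        return carrier * envelope

    def motion_energy(video, size=9, f_spatial=0.25, f_temporal=0.25, theta=0.0):
        """Energy = even^2 + odd^2 over a quadrature pair; video is (y, x, t)."""
        even = convolve(video, gabor3d(size, f_spatial, f_temporal, theta, 0.0))
        odd = convolve(video, gabor3d(size, f_spatial, f_temporal, theta, np.pi / 2))
        return even ** 2 + odd ** 2

    # Multiscale: repeat over a few spatial frequencies (coarse to fine).
    video = np.random.rand(64, 64, 32)
    energies = [motion_energy(video, f_spatial=f) for f in (0.125, 0.25, 0.5)]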
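
The classification stage, where discrimination boundaries between motion types are learned by a multi-class SVM, could be prototyped as follows. scikit-learn's SVC (multi-class by construction, via one-vs-one voting) stands in for the paper's implementation; the feature vectors, labels and three-class set are placeholder assumptions.

    # Hedged sketch of the learning step: a multi-class SVM separating
    # per-region spectral-energy descriptors into assumed motion classes
    # (e.g. static / flicker / coherent motion). Data here are synthetic.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.random((300, 12))                 # toy energy-feature vectors
    y = rng.integers(0, 3, size=300)          # toy labels for three classes

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(X, y)
    print(clf.predict(rng.random((5, 12))))   # predicted motion class per region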
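
Finally, the grouping of coherently moving regions into "object files" and the object-based (rather than pixel-based) saliency could look roughly like the sketch below. The connected-components grouping, the circular mean as the pooled phase, and the choice of pooled magnitude as the salience score are assumptions made for illustration, not the paper's exact procedure.

    # Illustrative sketch: cluster the coherent-motion mask into connected
    # components ("object files"), describe each by the pooled magnitude and
    # circular-mean phase of its motion energy, and assign one saliency
    # value per object instead of per pixel.
    import numpy as np
    from scipy.ndimage import label

    def object_files(coherent_mask, energy_mag, energy_phase):
        objects, n = label(coherent_mask)     # one integer label per region
        files = {}
        for k in range(1, n + 1):
            m = objects == k
            mag = float(energy_mag[m].sum())  # pooled motion-energy magnitude
            phase = float(np.angle(np.exp(1j * energy_phase[m]).mean()))
            files[k] = {"magnitude": mag, "phase": phase, "saliency": mag}
        return objects, files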