Classification of multiscale spatiotemporal energy features for video segmentation and dynamic objects prioritisation

  • Authors:
  • Anna Belardinelli; Andrea Carbone; Werner X. Schneider

  • Affiliations:
  • Computer Science Department, University of Tübingen, Germany; ISIR - Institut des Systèmes Intelligents et de Robotique, UPMC, Paris, France; CITEC - Cognitive Interaction Technology Excellence Center, Bielefeld University, Germany, and Neurocognitive Psychology, Department of Psychology, Bielefeld University, Germany

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2013

Abstract

High-level visual cognitive abilities such as scene understanding and behavioural analysis are modulated by attentive selective processes. These in turn rely on pre-attentive operations that deliver a perceptual organisation of the visual input and enable the extraction of meaningful "chunks" of information. Specifically, the extraction and prioritisation of moving objects is a crucial step in the processing of dynamic scenes. Motion is, of course, a powerful cue for grouping regions and segregating objects, but not all kinds of motion are equally meaningful or deserve equal attention. On a coarse level, the most interesting moving objects are associated with coherent motion, reflecting our sensitivity to biological motion. Attention, on the other hand, operates at a higher level, prioritising what moves differently with respect to both its surroundings and the global scene. In this paper, we propose a qualitative segmentation of multiscale spatiotemporal energy features according to their frequency-spectrum distribution, which can be used to pre-attentively extract regions of interest. We also show that the discrimination boundaries between classes in the segmentation phase can be learned automatically and efficiently by a Support Vector Machine classifier in a multi-class implementation. Motion-related features are shown to best predict human fixations on an extensive dataset. The model generalises well to datasets other than the one used for training, provided scale is taken into account in the feature extraction step. Regions labelled as coherently moving are clustered into moving object files, each described by the magnitude and phase of the pooled motion energy. The method succeeds in extracting meaningful moving objects from the background and in identifying other, less interesting motion patterns. A saliency function is finally computed on an object basis, rather than on a pixel basis as in most current approaches. The same features are thus used for segmentation and selective attention, and can further be used for recognition and scene interpretation.
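
As a concrete illustration of the feature-extraction step, the sketch below computes phase-invariant spatiotemporal energy from quadrature pairs of 3-D Gabor filters at several spatial scales, in the spirit of classical motion-energy models. The filter sizes, frequencies, Gaussian envelope width and axis conventions are illustrative assumptions, not the paper's exact filter bank.

    # Minimal sketch, assuming an Adelson-Bergen-style energy model:
    # squared responses of a quadrature pair of 3-D Gabor filters give
    # phase-invariant spatiotemporal energy. All parameters are illustrative.
    import numpy as np
    from scipy.ndimage import convolve

    def gabor3d(size, f_spatial, f_temporal, theta, phase):
        """Oriented spatiotemporal Gabor kernel; axes are (y, x, t)."""
        half = size // 2
        r = np.arange(-half, half + 1)
        y, x, t = np.meshgrid(r, r, r, indexing="ij")
        u = x * np.cos(theta) + y * np.sin(theta)   # spatial axis along theta
        carrier = np.cos(2 * np.pi * (f_spatial * u + f_temporal * t) + phase)
        envelope = np.exp(-(x**2 + y**2 + t**2) / (2 * (size / 4.0) ** 2))
        return carrier * envelope

    def motion_energy(video, size=9, f_spatial=0.25, f_temporal=0.25, theta=0.0):
        """Energy = even^2 + odd^2 over a quadrature pair; video is (y, x, t)."""
        even = convolve(video, gabor3d(size, f_spatial, f_temporal, theta, 0.0))
        odd = convolve(video, gabor3d(size, f_spatial, f_temporal, theta, np.pi / 2))
        return even ** 2 + odd ** 2

    # Multiscale: repeat over a few spatial frequencies (coarse to fine).
    video = np.random.rand(64, 64, 32)
    energies = [motion_energy(video, f_spatial=f) for f in (0.125, 0.25, 0.5)]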
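
The classification stage, where discrimination boundaries between motion types are learned by a multi-class SVM, could be prototyped as follows. scikit-learn's SVC (multi-class by construction, via one-vs-one voting) stands in for the paper's implementation; the feature vectors, labels and three-class set are placeholder assumptions.

    # Hedged sketch of the learning step: a multi-class SVM separating
    # per-region spectral-energy descriptors into assumed motion classes
    # (e.g. static / flicker / coherent motion). Data here are synthetic.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.random((300, 12))                 # toy energy-feature vectors
    y = rng.integers(0, 3, size=300)          # toy labels for three classes

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(X, y)
    print(clf.predict(rng.random((5, 12))))   # predicted motion class per region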
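
Finally, the grouping of coherently moving regions into "object files" and the object-based (rather than pixel-based) saliency could look roughly like the sketch below. The connected-components grouping, the circular mean as the pooled phase, and the choice of pooled magnitude as the salience score are assumptions made for illustration, not the paper's exact procedure.

    # Illustrative sketch: cluster the coherent-motion mask into connected
    # components ("object files"), describe each by the pooled magnitude and
    # circular-mean phase of its motion energy, and assign one saliency
    # value per object instead of per pixel.
    import numpy as np
    from scipy.ndimage import label

    def object_files(coherent_mask, energy_mag, energy_phase):
        objects, n = label(coherent_mask)     # one integer label per region
        files = {}
        for k in range(1, n + 1):
            m = objects == k
            mag = float(energy_mag[m].sum())  # pooled motion-energy magnitude
            phase = float(np.angle(np.exp(1j * energy_phase[m]).mean()))
            files[k] = {"magnitude": mag, "phase": phase, "saliency": mag}
        return objects, files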