Adaptive scene dependent filters for segmentation and online learning of visual objects

  • Authors:
  • J. J. Steil, M. Götting, H. Wersing, E. Körner, H. Ritter

  • Affiliations:
  • Neuroinformatics Group, Faculty of Technology, Bielefeld University, P.O. Box 100131, D-33501 Bielefeld, Germany (J. J. Steil, M. Götting, H. Ritter)
  • Honda Research Institute GmbH, Carl-Legien-Str. 30, 63073 Offenbach, Germany (J. J. Steil, H. Wersing, E. Körner)

  • Venue:
  • Neurocomputing
  • Year:
  • 2007

Abstract

We propose the adaptive scene dependent filter (ASDF) hierarchy for unsupervised learning of image segmentation, which integrates several processing pathways into a flexible, highly dynamic, and real-time capable vision architecture. It is based on forming a combined feature space from basic feature maps such as color, disparity, and pixel position. To guarantee real-time performance, we apply an enhanced vector quantization method to partition this feature space. The learned codebook defines corresponding best-match segments for each prototype and yields an over-segmentation of the object and its surround. The segments are recombined into a final object segmentation mask based on a relevance map, which encodes a coarse bottom-up hypothesis of where the object is located in the image. We apply the ASDF hierarchy for preprocessing input images in a feature-based, biologically motivated object recognition learning architecture and present experiments with this real-time vision system running at 6 Hz, including online learning of the segmentation. Because interaction with the user is not perfect, the real-world system acquires useful views effectively only at about 1.5 Hz, but we show that for training a new object, one hundred views, taking only about one minute of interaction time, are sufficient.
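The pipeline described in the abstract — a combined color/position feature space, vector quantization into best-match segments, then recombination via a relevance map — can be sketched as follows. This is a hypothetical simplification, not the authors' implementation: it uses plain k-means in place of their enhanced vector quantization, omits the disparity channel, and uses an illustrative overlap threshold (`overlap_thresh`) for the relevance-map recombination step.

```python
import numpy as np

def asdf_segment(image, relevance, n_protos=8, iters=10,
                 overlap_thresh=0.5, seed=0):
    """Sketch of an ASDF-style segmentation (simplified):
    1) build a combined feature space (color + pixel position),
    2) partition it by vector quantization (plain k-means here),
       yielding an over-segmentation into best-match segments,
    3) recombine segments that overlap the relevance map into a mask.
    """
    h, w, c = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Combined feature space: color channels plus normalized pixel coordinates.
    feats = np.concatenate(
        [image.reshape(-1, c),
         xs.reshape(-1, 1) / w,
         ys.reshape(-1, 1) / h], axis=1)

    # Vector quantization: learn a small codebook of prototypes.
    rng = np.random.default_rng(seed)
    protos = feats[rng.choice(len(feats), n_protos, replace=False)].copy()
    for _ in range(iters):
        # Assign every pixel to its best-match prototype.
        dists = ((feats[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # Update each prototype to the mean of its segment.
        for k in range(n_protos):
            members = labels == k
            if members.any():
                protos[k] = feats[members].mean(0)
    labels = labels.reshape(h, w)

    # Relevance-map recombination: keep segments whose mean relevance
    # exceeds the threshold, and union them into the final object mask.
    mask = np.zeros((h, w), dtype=bool)
    for k in range(n_protos):
        seg = labels == k
        if seg.any() and relevance[seg].mean() > overlap_thresh:
            mask |= seg
    return mask, labels
```

By construction, every segment admitted to the mask has a mean relevance above the threshold, so the final mask stays concentrated on the hypothesized object region even though the vector quantization itself is entirely unsupervised.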