Selective visual attention enables learning and recognition of multiple objects in cluttered scenes

  • Authors:
  • Dirk Walther; Ueli Rutishauser; Christof Koch; Pietro Perona

  • Affiliations:
  • Computation and Neural Systems Program, 139-74, California Institute of Technology, Pasadena, CA 91125, USA (all authors); additionally Division of Biology, California Institute of Technology, Pasadena, CA 91125, USA (Christof Koch) and Department of Electrical Engineering, 136-93, California Institute of Technology, Pasadena, CA 91125, USA (Pietro Perona)

  • Venue:
  • Computer Vision and Image Understanding, special issue: Attention and performance in computer vision
  • Year:
  • 2005

Abstract

A key problem in learning representations of multiple objects from unlabeled images is that it is a priori impossible to tell which part of the image corresponds to each individual object and which part is irrelevant clutter. Distinguishing individual objects in a scene would allow unsupervised learning of multiple objects from unlabeled images. There is psychophysical and neurophysiological evidence that the brain employs visual attention to select relevant parts of the image and to serialize the perception of individual objects. We propose a method, based on bottom-up visual attention, for selecting salient regions that are likely to contain objects. By comparing the performance of David Lowe's SIFT-based object recognition algorithm with and without attention, our experiments demonstrate that the proposed approach can enable one-shot learning of multiple objects from complex scenes, and that it can strongly improve learning and recognition performance in the presence of large amounts of clutter.
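The abstract does not spell out the attention model, so the following is only a minimal, hypothetical sketch of the general idea of bottom-up saliency-based region selection, loosely in the spirit of the Itti and Koch saliency model. It is not the authors' implementation. Assumptions: intensity is the only feature (no color or orientation channels), center-surround contrast is approximated with box blurs instead of Gaussian pyramids, and each selected "region" is a fixed square window rather than a segmented object region. All function names (`box_blur`, `saliency_map`, `attend`, `extract_region`) are invented for illustration.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale intensities, row-major


def box_blur(img: Image, r: int) -> Image:
    """Mean filter over a (2r+1)x(2r+1) window, clipped at the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - r), min(h, y + r + 1)):
                for xx in range(max(0, x - r), min(w, x + r + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def saliency_map(img: Image, scales=((1, 3), (1, 5), (2, 5))) -> Image:
    """Sum of |center - surround| intensity contrast over several (center, surround) radii."""
    h, w = len(img), len(img[0])
    sal = [[0.0] * w for _ in range(h)]
    for c, s in scales:
        center, surround = box_blur(img, c), box_blur(img, s)
        for y in range(h):
            for x in range(w):
                sal[y][x] += abs(center[y][x] - surround[y][x])
    return sal


def attend(img: Image, n_fixations: int, ior_radius: int) -> List[Tuple[int, int]]:
    """Serialize attention: winner-take-all on the saliency map, then inhibition of return."""
    sal = saliency_map(img)
    h, w = len(sal), len(sal[0])
    fixations = []
    for _ in range(n_fixations):
        # Winner-take-all: most salient remaining location (ties go to the first in row-major order).
        y, x = max(((yy, xx) for yy in range(h) for xx in range(w)),
                   key=lambda p: sal[p[0]][p[1]])
        fixations.append((y, x))
        # Inhibition of return: zero out a disk around the winner so attention moves on.
        for yy in range(max(0, y - ior_radius), min(h, y + ior_radius + 1)):
            for xx in range(max(0, x - ior_radius), min(w, x + ior_radius + 1)):
                if (yy - y) ** 2 + (xx - x) ** 2 <= ior_radius ** 2:
                    sal[yy][xx] = 0.0
    return fixations


def extract_region(img: Image, center: Tuple[int, int], half: int) -> Image:
    """Square window of side up to 2*half+1 around a fixation, clipped to the image."""
    y, x = center
    h, w = len(img), len(img[0])
    return [row[max(0, x - half):min(w, x + half + 1)]
            for row in img[max(0, y - half):min(h, y + half + 1)]]


# Toy scene: two identical bright 3x3 "objects" on a dark 40x40 background.
scene = [[0.0] * 40 for _ in range(40)]
for cy, cx in ((8, 8), (30, 28)):
    for y in range(cy - 1, cy + 2):
        for x in range(cx - 1, cx + 2):
            scene[y][x] = 1.0

fixations = attend(scene, n_fixations=2, ior_radius=6)
# Each cropped region would then be passed to a recognizer for learning or matching (not shown).
regions = [extract_region(scene, f, half=5) for f in fixations]
```

The point of the design is that the recognizer no longer sees the whole cluttered image: each fixation yields one candidate region, so objects are learned or matched one at a time, which is the serialization the abstract attributes to attention.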