On the Optimality of Spatial Attention for Object Detection

Authors:
Jonathan Harel;Christof Koch
Affiliations:
California Institute of Technology, Pasadena, 91125;California Institute of Technology, Pasadena, 91125
Venue:
Attention in Cognitive Systems
Year:
2009

References:

Modeling visual attention via selective tuning. Artificial Intelligence - Special volume on computer vision
A computational model for visual selection. Neural Computation
Where to look next in 3D object search. ISCV '95 Proceedings of the International Symposium on Computer Vision
Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision
Combining attention and recognition for rapid scene analysis. CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops - Volume 03
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
2006 Special Issue: Modeling attention to salient proto-objects. Neural Networks
Is bottom-up attention useful for object recognition? CVPR'04 Proceedings of the 2004 IEEE computer society conference on Computer vision and pattern recognition
Saliency from hierarchical adaptation through decorrelation and variance normalization. Image and Vision Computing

Abstract

Studies on visual attention traditionally focus on its physiological and psychophysical nature [16,18,19], or its algorithmic applications [1,9,21]. Here we develop a simple, formal mathematical model of the advantage of spatial attention for object detection, in which spatial attention is defined as processing a subset of the visual input, and detection is an abstraction with certain failure characteristics. We demonstrate that it is suboptimal to process the entire visual input given prior information about target locations, which in practice is almost always available in a video setting due to tracking, motion, or saliency. This argues for an attentional strategy independent of computational savings: no matter how much computational power is available, it is in principle better to dedicate it preferentially to selected portions of the scene. This suggests, anecdotally, a form of environmental pressure for the evolution of foveated photoreceptor densities in the retina. It also offers a general justification for the use of spatial attention in machine vision.
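
The abstract does not give the model's equations, so the following is only a toy sketch of the kind of argument it describes, not the authors' formulation. It assumes a single target at one of N locations with a known prior, a detector that fires on the target with probability `hit_rate` and on each processed non-target location with probability `false_alarm_rate`, and it counts a trial as a success only when the detector fires on the target and nowhere else. All names, parameters, and the success criterion are illustrative assumptions.

```python
def success_probability(prior, k, hit_rate, false_alarm_rate):
    """Probability of a clean detection when only the k most probable
    locations are processed (toy model, assumed for illustration).

    Success requires that the target lies in the processed subset, that
    the detector fires on it, and that none of the other k - 1 processed
    locations produces a false alarm.
    """
    top_k_mass = sum(sorted(prior, reverse=True)[:k])
    no_false_alarm = (1.0 - false_alarm_rate) ** (k - 1)
    return hit_rate * no_false_alarm * top_k_mass


def best_subset_size(prior, hit_rate, false_alarm_rate):
    """Subset size k (1..N) that maximizes the success probability."""
    sizes = range(1, len(prior) + 1)
    return max(
        sizes,
        key=lambda k: success_probability(prior, k, hit_rate, false_alarm_rate),
    )


if __name__ == "__main__":
    # Hypothetical peaked prior over 20 locations, e.g. as tracking might supply.
    peaked = [0.5, 0.2, 0.1] + [0.2 / 17] * 17
    k_star = best_subset_size(peaked, hit_rate=0.9, false_alarm_rate=0.05)
    print("best subset size:", k_star)
    print("attend to best subset:", success_probability(peaked, k_star, 0.9, 0.05))
    print("process everything:  ", success_probability(peaked, len(peaked), 0.9, 0.05))
```

Under these assumptions, each additional processed location adds a little prior mass but also another chance of a false alarm, so with a peaked prior the best subset is much smaller than the whole input. With a uniform prior and a low false-alarm rate the same toy model favors processing everything, which is consistent with the abstract's point that the advantage comes from prior information about target locations.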