Pre-Attentive and Attentive Detection of Humans in Wide-Field Scenes

  • Authors:
  • J. H. Elder;S. J. Prince;Y. Hou;M. Sizintsev;E. Olevskiy

  • Affiliations:
  • Centre for Vision Research, York University, Toronto M3J 1P3;Centre for Vision Research, York University, Toronto M3J 1P3;Centre for Vision Research, York University, Toronto M3J 1P3;Centre for Vision Research, York University, Toronto M3J 1P3;Centre for Vision Research, York University, Toronto M3J 1P3

  • Venue:
  • International Journal of Computer Vision
  • Year:
  • 2007

Abstract

We address the problem of localizing and obtaining high-resolution footage of the people present in a scene. We propose a biologically-inspired solution combining pre-attentive, low-resolution sensing for detection with shiftable, high-resolution, attentive sensing for confirmation and further analysis. The detection problem is made difficult by the unconstrained nature of realistic environments and human behaviour, and the low resolution of pre-attentive sensing. Analysis of human peripheral vision suggests a solution based on integration of relatively simple but complementary cues. We develop a Bayesian approach involving layered probabilistic modeling and spatial integration using a flexible norm that maximizes the statistical power of both dense and sparse cues. We compare the statistical power of several cues and demonstrate the advantage of cue integration. We evaluate the Bayesian cue integration method for human detection on a labelled surveillance database and find that it outperforms several competing methods based on conjunctive combinations of classifiers (e.g., AdaBoost). We have developed a real-time version of our pre-attentive human activity sensor that generates saccadic targets for an attentive foveated vision system. Output from high-resolution attentive detection algorithms and gaze state parameters are fed back as statistical priors and combined with pre-attentive cues to determine saccadic behaviour. The result is a closed-loop system that fixates faces over a 130 deg field of view, allowing high-resolution capture of facial video over a large dynamic scene.
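
The abstract does not give the form of the cue integration or of the flexible norm, so the following is only an illustrative sketch and not the authors' implementation. It assumes (1) that cues are conditionally independent, so their log-likelihood ratios add to the log prior odds, and (2) that the flexible norm behaves like a normalized Minkowski (L_p) norm over a spatial window, where a small exponent averages dense responses and a large exponent approaches the maximum and so favours sparse responses. The function names `minkowski_pool` and `posterior_from_cues` and all numeric values are hypothetical.

```python
import numpy as np

def minkowski_pool(values, p):
    """Pool cue responses in a window with a normalized L_p norm (assumed form).

    p = 1 gives the mean absolute response; large p approaches the maximum.
    """
    v = np.abs(np.asarray(values, dtype=float))
    return float(np.mean(v ** p) ** (1.0 / p))

def posterior_from_cues(log_likelihood_ratios, prior):
    """Posterior probability of 'human present' from per-cue log-likelihood ratios.

    Assumes conditionally independent cues, so the ratios sum in log-odds space.
    """
    log_odds = np.log(prior / (1.0 - prior)) + float(np.sum(log_likelihood_ratios))
    return float(1.0 / (1.0 + np.exp(-log_odds)))

# Hypothetical responses over a 16-pixel window.
dense = np.full(16, 0.5)   # weak response at every pixel
sparse = np.zeros(16)      # strong response at a single pixel
sparse[3] = 4.0

for p in (1, 8):
    print(p, minkowski_pool(dense, p), minkowski_pool(sparse, p))

# Two hypothetical cues, each mildly favouring 'human present'.
print(posterior_from_cues([1.0, 1.0], prior=0.5))
```

With `p = 1` the dense pattern pools to a larger value than the sparse one, while with `p = 8` the ordering reverses. This is the sense in which a tunable exponent can be matched to how dense or sparse a cue is. A prior fed back from attentive detection or gaze state, as the abstract describes, would enter this sketch through the `prior` argument.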