Person Surveillance Using Visual and Infrared Imagery

  • Authors:
  • S. J. Krotosky; M. M. Trivedi

  • Affiliations:
  • Adv. Multimedia & Signal Process. Div., Sci. Applic. Int. Corp. (SAIC), San Diego, CA

  • Venue:
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Year:
  • 2008

Abstract

This paper presents a methodology for analyzing multimodal and multiperspective systems for person surveillance. Using an experimental testbed consisting of two color and two infrared cameras, we can accurately register the color and infrared imagery for any general scene configuration. This expands the scope of multispectral analysis beyond the specialized long-range surveillance experiments of previous approaches to the more general scene configurations common to unimodal approaches. We design an algorithmic framework for detecting people in a scene that can be generalized to include color, infrared, and/or disparity features. Using a combination of a histogram of oriented gradients (HOG) feature-based support vector machine and size/depth-based constraints, we create a probabilistic score for evaluating the presence of people. Using this framework, we train person detectors on color stereo and infrared stereo features, as well as tetravision-based detectors that combine the outputs of separately trained color stereo and infrared stereo detectors. Additionally, we incorporate the trifocal tensor to combine the color and infrared features in a unified detection framework, and we use these trained detectors for an experimental evaluation of video sequences captured with our testbed. Our evaluation demonstrates the performance gains achievable when the trifocal framework is used to combine color and infrared features. Both of the trifocal setups outperform their unimodal equivalents, as well as the tetravision-based analysis. Our experiments also demonstrate that the trained detector generalizes well to different scenes and can provide robust input to an additional tracking framework.
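
The abstract describes a probabilistic person score built from an SVM response and size/depth-based constraints, but gives no formulas. The following is a minimal illustrative sketch of one way such a score could be formed, not the authors' method. It assumes a Platt-style sigmoid to turn an SVM decision value into a probability, a pinhole-camera relation between depth and expected image height, and a Gaussian-shaped penalty on the mismatch. All function names and parameter values (focal length, person height, sigmoid coefficients, tolerance) are hypothetical.

```python
import math


def svm_probability(decision_value, a=-1.0, b=0.0):
    """Map a raw SVM decision value to (0, 1) with a Platt-style sigmoid.

    The coefficients a and b are hypothetical placeholders; in practice
    they would be fitted on held-out data.
    """
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


def expected_pixel_height(depth_m, focal_px=800.0, person_height_m=1.7):
    """Pinhole model: image height = focal length * real height / depth.

    focal_px and person_height_m are assumed values for illustration.
    """
    return focal_px * person_height_m / depth_m


def size_depth_likelihood(box_height_px, depth_m, rel_sigma=0.2):
    """Score in (0, 1] that peaks at 1 when the box height matches the
    height expected for a person at the measured depth.

    rel_sigma is an assumed tolerance, relative to the expected height.
    """
    expected = expected_pixel_height(depth_m)
    z = (box_height_px - expected) / (rel_sigma * expected)
    return math.exp(-0.5 * z * z)


def person_score(decision_value, box_height_px, depth_m):
    """Combine appearance and geometry by multiplying the two scores
    (an assumed fusion rule, chosen here only for simplicity)."""
    return svm_probability(decision_value) * size_depth_likelihood(
        box_height_px, depth_m
    )


if __name__ == "__main__":
    # A candidate window at 4 m whose height agrees with the pinhole model,
    # and one at the same depth that is far too small to be a person.
    plausible = person_score(2.0, expected_pixel_height(4.0), 4.0)
    implausible = person_score(2.0, 100.0, 4.0)
    print(plausible, implausible)
```

With these assumptions, a window whose height disagrees with its stereo depth is suppressed even when the appearance classifier responds strongly, which is the intuition behind pairing a HOG-based SVM with size/depth constraints.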