Spatiotemporal fusion framework for multi-camera face orientation analysis

  • Authors:
  • Chung-Ching Chang; Hamid Aghajan

  • Affiliations:
  • Wireless Sensor Networks Lab, Stanford University, Stanford, CA (both authors)

  • Venue:
  • ACIVS '07: Proceedings of the 9th International Conference on Advanced Concepts for Intelligent Vision Systems
  • Year:
  • 2007

Abstract

In this paper, we propose a collaborative technique for face orientation estimation in smart camera networks. The proposed spatiotemporal feature fusion analysis is based on active collaboration between the cameras in data fusion and decision making, using features extracted by each camera. First, a head strip mapping method based on a Markov model and a Viterbi-like algorithm is proposed to estimate the relative angular differences between the cameras with respect to the face. Then, given synchronized face sequences from several camera nodes, the proposed technique determines the orientation and the angular motion of the face using two features, namely the hair-face ratio and the head optical flow. These features yield estimates of the face orientation and the angular velocity through simple analyses, namely the Discrete Fourier Transform (DFT) and Least Squares (LS), respectively. Spatiotemporal feature fusion is implemented via key frame detection in each camera, a forward-backward probabilistic model, and a spatiotemporal validation scheme. Key frames are obtained when a camera node detects a frontal face view and are exchanged between the cameras so that local face orientation estimates can be adjusted to maintain a high confidence level. The forward-backward probabilistic model mitigates error propagation over time. Finally, the spatiotemporal validation scheme is applied for spatial outlier removal and temporal smoothing. A face view is interpolated from the mapped head strips, from which snapshots at the desired view angles can be generated. The proposed technique does not require camera locations to be known a priori, and is hence applicable to vision networks deployed casually without localization.
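
The two per-camera cues mentioned in the abstract lend themselves to a compact numerical illustration. The sketch below (Python/NumPy) is not the authors' implementation: it assumes the hair-face ratio is sampled at uniformly spaced azimuths around the head, so the phase of the first DFT bin locates the hair-dominant (back-of-head) direction, and it models the horizontal optical flow on the head region as proportional to an effective rotation radius, so a one-parameter least-squares fit recovers the angular velocity. The function names and the cylinder-like flow model are illustrative assumptions.

```python
import numpy as np

def estimate_orientation_dft(hair_face_ratio):
    """Estimate face azimuth from a hair-face ratio profile sampled at N
    uniformly spaced angles around the head (sample 0 taken as azimuth 0).
    Assumption: the profile is roughly sinusoidal, peaking at the back of the
    head; the phase of the first DFT bin locates that peak, and the face is
    taken to point in the opposite direction."""
    x = np.asarray(hair_face_ratio, dtype=float)
    X = np.fft.fft(x)
    theta_back = -np.angle(X[1])                 # azimuth where hair ratio peaks
    return (theta_back + np.pi) % (2 * np.pi)    # frontal direction (rad)

def estimate_angular_velocity_ls(flow_u, radius_px, frame_dt):
    """Least-squares fit of a single angular velocity omega (rad/s) to
    horizontal optical-flow samples on the head region, under the simple
    model u_i ~= omega * dt * r_i (r_i: effective rotation radius in pixels)."""
    u = np.asarray(flow_u, dtype=float)
    r = np.asarray(radius_px, dtype=float)
    omega_dt = r.dot(u) / r.dot(r)               # closed-form 1-D least squares
    return omega_dt / frame_dt

# Toy usage with synthetic data
angles = np.linspace(0.0, 2 * np.pi, 36, endpoint=False)
profile = 0.6 + 0.3 * np.cos(angles - np.deg2rad(200))    # hair peaks at 200 deg
print(np.rad2deg(estimate_orientation_dft(profile)))       # ~20 deg (frontal view)

r = np.linspace(5.0, 40.0, 50)
u = 0.1 * r + 0.01 * np.random.randn(50)                   # flow for omega*dt = 0.1
print(estimate_angular_velocity_ls(u, r, frame_dt=1 / 30)) # ~3 rad/s
```

Under these assumptions, the DFT-based estimate is robust to the constant offset in the ratio profile (it falls entirely into bin 0), and the closed-form least-squares fit needs no iterative solver, which is consistent with the abstract's emphasis on simple per-node analysis.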