Modeling inter-camera space-time and appearance relationships for tracking across non-overlapping views

  • Authors:
  • Omar Javed; Khurram Shafique; Zeeshan Rasheed; Mubarak Shah

  • Affiliations:
  • ObjectVideo, 11600 Sunrise Valley Dr., Reston, VA 20171, USA (O. Javed, Z. Rasheed); University of Central Florida, Orlando, FL 32816, USA (K. Shafique, M. Shah)

  • Venue:
  • Computer Vision and Image Understanding
  • Year:
  • 2008

Abstract

Tracking across cameras with non-overlapping views is a challenging problem. First, the observations of an object are often widely separated in time and space when viewed from non-overlapping cameras. Second, the appearance of an object in one camera view may be very different from its appearance in another due to differences in illumination, pose, and camera properties. To deal with the first problem, we observe that people and vehicles tend to follow the same paths in most cases, e.g., roads, walkways, and corridors. The proposed algorithm uses this conformity in the traversed paths to establish correspondence. The algorithm learns this conformity, and hence the inter-camera relationships, in the form of a multivariate probability density of space-time variables (entry and exit locations, velocities, and transition times) using kernel density estimation. To handle the appearance change of an object as it moves from one camera to another, we show that all brightness transfer functions from a given camera to another camera lie in a low-dimensional subspace. This subspace is learned using probabilistic principal component analysis and is used for appearance matching. The proposed approach does not require explicit inter-camera calibration; rather, the system learns the camera topology and the subspace of inter-camera brightness transfer functions during a training phase. Once training is complete, correspondences are assigned within a maximum likelihood (ML) estimation framework using both location and appearance cues. Experiments with real-world videos that validate the proposed approach are reported.
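The two learned models named in the abstract, a kernel density estimate over space-time variables and a low-dimensional subspace of brightness transfer functions, can be sketched roughly as below. This is a minimal NumPy illustration, not the paper's formulation: the function names are invented, the Gaussian product kernel with a single fixed bandwidth is a simplification of the paper's multivariate KDE, and a plain SVD-based principal subspace stands in for probabilistic PCA.

```python
import numpy as np

def kde_log_likelihood(samples, query, bandwidth=1.0):
    """Log-likelihood of a space-time feature vector (e.g., exit location,
    velocity, transition time) under a Gaussian-kernel density estimate
    built from training correspondences. Single fixed bandwidth assumed."""
    n, d = samples.shape
    diffs = (query - samples) / bandwidth
    log_k = (-0.5 * np.sum(diffs ** 2, axis=1)
             - 0.5 * d * np.log(2.0 * np.pi)
             - d * np.log(bandwidth))
    m = log_k.max()  # log-sum-exp shift for numerical stability
    return m + np.log(np.mean(np.exp(log_k - m)))

def learn_btf_subspace(btfs, n_components=3):
    """Fit a low-dimensional linear subspace to training brightness
    transfer functions (rows of `btfs`) via SVD of the centered data.
    Stands in for the probabilistic PCA used in the paper."""
    mean = btfs.mean(axis=0)
    _, _, vt = np.linalg.svd(btfs - mean, full_matrices=False)
    return mean, vt[:n_components]

def btf_match_error(btf, mean, basis):
    """Reconstruction error of a candidate BTF against the learned
    subspace; small error indicates a plausible inter-camera match."""
    coeffs = (btf - mean) @ basis.T
    recon = mean + coeffs @ basis
    return float(np.linalg.norm(btf - recon))
```

At decision time, the two cues would be combined, e.g., by adding the space-time log-likelihood to an appearance log-likelihood derived from the subspace reconstruction error, and the correspondence with the highest combined score accepted, in the spirit of the ML framework the abstract describes.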