Robust Visual Tracking by Integrating Multiple Cues Based on Co-Inference Learning

  • Authors:
  • Ying Wu; Thomas S. Huang

  • Affiliations:
  • Department of Electrical & Computer Engineering, Northwestern University, 2145 Sheridan Road, Evanston, IL 60208, USA (yingwu@ece.northwestern.edu); Beckman Institute, University of Illinois at Urbana-Champaign, 405 N. Mathews, Urbana, IL 61801, USA (huang@ifp.uiuc.edu)

  • Venue:
  • International Journal of Computer Vision - Special Issue on Computer Vision Research at the Beckman Institute of Advanced Science and Technology
  • Year:
  • 2004

Abstract

Visual tracking can be treated as a parameter estimation problem that infers target states from image observations in video sequences. A richer target representation can improve the chances of successful tracking in cluttered and dynamic environments, and thus enhance robustness. Richer representations can be constructed either by specifying a detailed model of a single cue or by combining a set of rough models of multiple cues. Both approaches increase the dimensionality of the state space, which dramatically increases the computational cost. To investigate the integration of rough models from multiple cues and to explore computationally efficient algorithms, this paper formulates multiple-cue integration and tracking in a probabilistic framework based on a factorized graphical model. Structured variational analysis of this graphical model factorizes the different modalities and suggests a co-inference process among them. Based on importance sampling, a sequential Monte Carlo algorithm is proposed to efficiently simulate and approximate the co-inference of multiple cues. The algorithm runs in real time at around 30 Hz. Extensive experiments show that the proposed algorithm performs robustly in a wide variety of tracking scenarios. The approach also has the potential to address other problems, including sensor fusion.
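To make the co-inference idea concrete, here is a minimal toy sketch; it is not the authors' algorithm. It assumes two cues, each with a one-dimensional state, Gaussian dynamics and Gaussian observation noise. In each frame, the samples for one cue are drawn from a proposal centered on the other cue's current estimate and are then reweighted by that cue's own prior and likelihood, which is ordinary importance sampling. All names and constants (`co_inference_track`, `importance_step`, `COUPLE_STD`, `DYN_STD`, `OBS_STD`, the number of rounds) are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical toy constants, not values from the paper.
COUPLE_STD = 1.0  # spread of the proposal built from the other cue's estimate
DYN_STD = 1.0     # spread of the dynamic prior around the previous estimate
OBS_STD = 0.5     # observation noise of each cue


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_step(rng, n, proposal_mean, prior_mean, obs):
    """Draw n samples from the proposal and weight them by
    prior * likelihood / proposal (standard importance weights)."""
    samples = [rng.gauss(proposal_mean, COUPLE_STD) for _ in range(n)]
    weights = []
    for s in samples:
        target = gaussian_pdf(s, prior_mean, DYN_STD) * gaussian_pdf(obs, s, OBS_STD)
        weights.append(target / gaussian_pdf(s, proposal_mean, COUPLE_STD))
    total = sum(weights)
    if total == 0.0:
        # Degenerate case: fall back to uniform weights.
        return samples, [1.0 / n] * n
    return samples, [w / total for w in weights]


def weighted_mean(samples, weights):
    return sum(s * w for s, w in zip(samples, weights))


def co_inference_track(obs_a, obs_b, n=500, rounds=2, seed=0):
    """Track two coupled 1-D cue states over a sequence of observations.
    Returns one (estimate_a, estimate_b) pair per frame."""
    rng = random.Random(seed)
    est_a, est_b = obs_a[0], obs_b[0]
    track = []
    for z_a, z_b in zip(obs_a, obs_b):
        prior_a, prior_b = est_a, est_b  # previous-frame estimates act as priors
        for _ in range(rounds):
            # Cue A: proposal comes from cue B's current estimate.
            s_a, w_a = importance_step(rng, n, est_b, prior_a, z_a)
            est_a = weighted_mean(s_a, w_a)
            # Cue B: proposal comes from cue A's freshly updated estimate.
            s_b, w_b = importance_step(rng, n, est_a, prior_b, z_b)
            est_b = weighted_mean(s_b, w_b)
        track.append((est_a, est_b))
    return track
```

Calling `co_inference_track(obs_a, obs_b)` with two equally long observation lists returns a per-frame pair of estimates. The point of the sketch is that each cue is sampled in its own low-dimensional space while borrowing the proposal from the other cue, rather than sampling the joint state directly. A real tracker would replace the scalar states and Gaussian likelihoods with image-based models.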