Compositional object recognition, segmentation, and tracking in video

  • Authors:
  • Björn Ommer; Joachim M. Buhmann

  • Affiliations:
  • Institute of Computational Science, ETH Zurich, Zurich, Switzerland; Institute of Computational Science, ETH Zurich, Zurich, Switzerland

  • Venue:
  • EMMCVPR'07: Proceedings of the 6th International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition
  • Year:
  • 2007

Abstract

The complexity of visual representations is substantially limited by the compositional nature of our visual world, which therefore renders learning of structured object models feasible. During recognition, however, such structured models might be disadvantageous, especially under the high computational demands of video. This contribution presents a compositional approach to video analysis that demonstrates the value of compositionality both for learning structured object models and for recognition in near real time. We unite category-level, multi-class object recognition, segmentation, and tracking in the same probabilistic graphical model. A model selection strategy is pursued to facilitate recognition and tracking of multiple objects that appear simultaneously in a video. Object models are learned from videos with heavy clutter and camera motion, where only an overall category label is provided for each training video and no hand-segmentation or localization of objects is required. For evaluation purposes, a video categorization database is assembled, and experiments demonstrate the suitability of the approach.