Learning structured prediction models for image labeling

  • Authors:
  • Xuming He

  • Affiliations:
  • University of Toronto (Canada)

  • Venue:
  • PhD thesis, University of Toronto
  • Year:
  • 2008


Abstract

Many fundamental tasks in computational vision can be formulated as predicting unknown properties of a scene from a static image. When the scene property is described by a set of discrete values in each image, the corresponding vision task is an image labeling problem. A key issue in image labeling is how to exploit context information in images, as local evidence alone is often insufficient to determine the label value. This thesis takes a statistical learning approach to the labeling problem, focusing on two main issues in incorporating context into the labeling process: (1) what are efficient representations of context for labeling, and (2) how can those context representations be learned from data? We discuss two learning situations based on different degrees of data availability. In the first case, sufficient fully-labeled data are available for learning, and we develop a discriminative labeling framework based on a Conditional Random Field (CRF) in which multiscale feature functions capture the image/label contexts at several spatial scales. These feature functions influence the labeling at levels ranging from local to global: some aspects of the context concern the co-occurrence of objects in the image, while others concern the geometric relationships between objects. To extend the range of object classes and the image database size that the system can handle, we also propose a modular CRF model that integrates bottom-up image cues with top-down categorical information. The second case places a less strict requirement on the training data, which include not only a small number of fully-labeled images but also a large number of coarsely-labeled ones. Here we present a hybrid unsupervised-supervised approach that combines a generative topic model with discriminative label classifiers. The topic model captures co-occurring image features to represent image context, and it is extended so that the topics apply not only to image features but also to labels. We evaluate our models on several real-world image databases and compare our systems against baseline methods.
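
To make the first framework concrete: a CRF defines the conditional distribution of a label field y given an image x. The multiscale decomposition below is an illustrative sketch consistent with the abstract's description, not the thesis's exact energy function:

    P(\mathbf{y} \mid \mathbf{x}) \;=\; \frac{1}{Z(\mathbf{x})}
      \exp\!\left( \sum_{s=1}^{S} \sum_{c \in \mathcal{C}_{s}}
      \boldsymbol{\lambda}_{s}^{\top}\, \mathbf{f}_{s}(\mathbf{y}_{c}, \mathbf{x}) \right)

Here s indexes spatial scales (e.g. local, regional, global), C_s is the set of label cliques at scale s, f_s are the feature functions coupling image evidence and label context at that scale, and Z(x) is the partition function. With fully-labeled data, the weights lambda_s can be fit by (approximate) maximum conditional likelihood.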
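
For the second, hybrid setting, the following is a minimal sketch assuming scikit-learn and synthetic bag-of-visual-words counts. The pipeline (topic proportions from an unsupervised topic model fed into a discriminative classifier) illustrates the general idea only; the thesis's actual model additionally extends the topics to cover labels.

    # Hedged sketch of a hybrid unsupervised-supervised pipeline, in the
    # spirit of the abstract: a topic model learns co-occurring image
    # features as context, then a discriminative classifier predicts labels.
    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Synthetic stand-in data: bag-of-visual-words counts per image region.
    n_regions, vocab_size, n_classes = 500, 50, 4
    counts = rng.poisson(lam=1.0, size=(n_regions, vocab_size))
    labels = rng.integers(0, n_classes, size=n_regions)

    # Unsupervised stage: topics model co-occurring image features,
    # giving each region a low-dimensional "context" representation.
    # This stage needs no labels, so coarsely-labeled or unlabeled
    # images could be used here.
    lda = LatentDirichletAllocation(n_components=8, random_state=0)
    context = lda.fit_transform(counts)   # (n_regions, 8) topic mixtures

    # Supervised stage: a discriminative classifier maps local evidence
    # plus topic context to a label. Only this stage needs full labels.
    features = np.hstack([counts, context])
    clf = LogisticRegression(max_iter=1000).fit(features[:400], labels[:400])
    print("held-out accuracy:", clf.score(features[400:], labels[400:]))

The design point being illustrated: the generative stage exploits plentiful weakly-supervised data to learn context, while the scarce fully-labeled data are reserved for the discriminative stage.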