Visual dictionary learning for joint object categorization and segmentation

  • Authors:
  • Aastha Jain, Luca Zappella, Patrick McClure, René Vidal

  • Affiliations:
  • Center for Imaging Science, Johns Hopkins University (all authors)

  • Venue:
  • ECCV'12 Proceedings of the 12th European Conference on Computer Vision - Volume Part V
  • Year:
  • 2012

Abstract

Representing objects using elements from a visual dictionary is widely used in object detection and categorization. Prior work on dictionary learning has shown that learning discriminative dictionaries improves the accuracy of object detection and categorization. However, none of these dictionaries is learnt for joint object categorization and segmentation. Moreover, dictionary learning is often done separately from classifier training, which reduces the discriminative power of the model. In this paper, we formulate the semantic segmentation problem as a joint categorization, segmentation and dictionary learning problem. To that end, we propose a latent conditional random field (CRF) model in which the observed variables are pixel category labels and the latent variables are visual word assignments. The CRF energy consists of a bottom-up segmentation cost, a top-down bag of (latent) words categorization cost, and a dictionary learning cost. Together, these costs capture relationships between image features and visual words, relationships between visual words and object categories, and spatial relationships among visual words. The segmentation, categorization, and dictionary learning parameters are learnt jointly using latent structural SVMs, and the segmentation and visual word assignments are inferred jointly using energy minimization techniques. Experiments on the Graz02 and CamVid datasets demonstrate the performance of our approach.
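
The three-part energy described in the abstract can be illustrated with a toy sketch. The abstract does not give the functional form of any term, so everything below is an assumption made for illustration: the names `toy_energy` and `brute_force_inference`, the weights `lam` and `mu`, the Potts smoothness penalty, the squared-distance dictionary cost, and the linear score on the word histogram. Exhaustive search stands in for the energy minimization techniques mentioned in the abstract, and the latent structural SVM learning step is not shown.

```python
from collections import Counter
from itertools import product


def toy_energy(features, words, labels, dictionary, edges, unary,
               class_weights, lam=1.0, mu=1.0):
    """Toy energy of a joint (labels, words) assignment; all terms are assumed forms."""
    # Dictionary cost (assumed): squared distance between each feature
    # and the visual word it is assigned to.
    dict_cost = sum(
        sum((f - d) ** 2 for f, d in zip(features[i], dictionary[words[i]]))
        for i in range(len(features))
    )
    # Bottom-up segmentation cost (assumed): per-pixel unary cost plus a
    # Potts penalty for neighbouring pixels with different category labels.
    seg_cost = sum(unary[i][labels[i]] for i in range(len(labels)))
    seg_cost += lam * sum(1 for i, j in edges if labels[i] != labels[j])
    # Top-down categorization cost (assumed): negative linear score on the
    # per-category histogram of latent visual words.
    hist = Counter(zip(labels, words))
    cat_cost = -sum(class_weights[c][w] * n for (c, w), n in hist.items())
    return seg_cost + cat_cost + mu * dict_cost


def brute_force_inference(features, dictionary, edges, unary,
                          class_weights, lam=1.0, mu=1.0):
    """Jointly pick labels and word assignments by exhaustive search (tiny inputs only)."""
    n = len(features)
    best = None
    for labels in product(range(len(class_weights)), repeat=n):
        for words in product(range(len(dictionary)), repeat=n):
            e = toy_energy(features, words, labels, dictionary, edges,
                           unary, class_weights, lam, mu)
            if best is None or e < best[0]:
                best = (e, labels, words)
    return best


# Hypothetical data: four pixels in a chain, 1-D features, two words, two categories.
features = [(0.0,), (0.1,), (1.0,), (0.9,)]
dictionary = [(0.0,), (1.0,)]
edges = [(0, 1), (1, 2), (2, 3)]
unary = [[0.2, 0.8], [0.3, 0.7], [0.7, 0.3], [0.6, 0.4]]  # cost per pixel, per category
class_weights = [[1.0, 0.0], [0.0, 1.0]]  # category c favours visual word c

energy, labels, words = brute_force_inference(
    features, dictionary, edges, unary, class_weights, lam=0.5, mu=1.0
)
print(labels, words, round(energy, 2))
```

In this toy setting the category labels and the visual word assignments are chosen together, so a change in either one affects all three costs. That coupling is the point of the joint formulation; the exhaustive search is exponential in the number of pixels and is only viable for a handful of variables.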