Max Margin Learning of Hierarchical Configural Deformable Templates (HCDTs) for Efficient Object Parsing and Pose Estimation

  • Authors:
  • Long (Leo) Zhu, Yuanhao Chen, Chenxi Lin, Alan Yuille

  • Affiliations:
  • Department of Statistics, University of California at Los Angeles, Los Angeles, USA 90095; University of Science and Technology of China, Hefei, P.R. China 230026; Alibaba Group R&D, Hangzhou, P.R. China; Department of Statistics, Psychology and Computer Science, University of California at Los Angeles, Los Angeles, USA 90095

  • Venue:
  • International Journal of Computer Vision
  • Year:
  • 2011

Abstract

In this paper we formulate a hierarchical configural deformable template (HCDT) to model articulated visual objects--such as horses and baseball players--for tasks such as parsing, segmentation, and pose estimation. HCDTs represent an object by an AND/OR graph where the OR nodes act as switches, which enables the graph topology to vary adaptively. This hierarchical representation is compositional, and the node variables represent positions and properties of subparts of the object. The graph and the node variables are required to obey the summarization principle, which enables an efficient compositional inference algorithm to rapidly estimate the state of the HCDT. We specify the structure of the AND/OR graph of the HCDT by hand and learn the model parameters discriminatively by extending max-margin learning to AND/OR graphs. We illustrate the three main aspects of HCDTs--representation, inference, and learning--on the tasks of segmenting, parsing, and pose (configuration) estimation for horses and humans. We demonstrate that the inference algorithm is fast and that max-margin learning is effective. We show that HCDTs give state-of-the-art results for segmentation and pose estimation when compared to other methods on benchmark datasets.
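
The sketch below is an illustrative, hypothetical rendering of the AND/OR graph idea described in the abstract: AND nodes compose the scores of their subparts, while OR nodes act as switches that select the best alternative child, letting the topology adapt to the input. The class name Node, the toy leaf score, and the simple sum/max scoring rules are assumptions for illustration, not the paper's implementation.

    from dataclasses import dataclass, field
    from typing import List, Tuple


    @dataclass
    class Node:
        kind: str                      # "AND", "OR", or "LEAF"
        children: List["Node"] = field(default_factory=list)
        weight: float = 1.0            # placeholder for a discriminatively learned parameter


    def score_leaf(node: Node, position: Tuple[int, int]) -> float:
        """Toy local evidence for a leaf part at a given image position."""
        x, y = position
        return -0.01 * ((x - 5) ** 2 + (y - 5) ** 2)


    def parse(node: Node, position: Tuple[int, int]) -> float:
        """Bottom-up compositional inference over the AND/OR graph.

        AND nodes sum (compose) their children's scores; OR nodes switch to
        the highest-scoring child, which is how the topology varies adaptively.
        """
        if node.kind == "LEAF":
            return node.weight * score_leaf(node, position)
        child_scores = [parse(c, position) for c in node.children]
        if node.kind == "AND":
            return node.weight * sum(child_scores)
        return node.weight * max(child_scores)   # OR node: pick the best alternative


    if __name__ == "__main__":
        # Tiny horse-like template: AND(head, legs), where "legs" is an OR
        # node over two alternative leg configurations.
        legs = Node("OR", [Node("LEAF"), Node("LEAF")])
        horse = Node("AND", [Node("LEAF"), legs])
        print(parse(horse, (4, 6)))

In the paper, the weights on such scoring terms are what max-margin learning estimates discriminatively; the sketch only fixes them to constants to keep the example self-contained.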