Learning the Compositional Nature of Visual Object Categories for Recognition

Authors:
Bjorn Ommer;Joachim Buhmann
Affiliations:
University of California at Berkeley, Berkeley;ETH Zurich, Zurich
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2010

Citing 0
Cited 11

A compositional technique for hand posture recognition: new results

WSEAS TRANSACTIONS on COMMUNICATIONS
Discovering multipart appearance models from captioned images

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part V
Voting by grouping dependent parts

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part V
A Bayesian approach for scene interpretation with integrated hierarchical structure

DAGM'11 Proceedings of the 33rd international conference on Pattern recognition
A probabilistic model for component-based shape synthesis

ACM Transactions on Graphics (TOG) - SIGGRAPH 2012 Conference Proceedings
Discovering hierarchical object models from captioned images

Computer Vision and Image Understanding
Learning a generative model of images by factoring appearance and shape

Neural Computation
On Taxonomies for Multi-class Image Categorization

International Journal of Computer Vision
Learning the compositional structure of man-made objects for 3D shape retrieval

EG 3DOR'10 Proceedings of the 3rd Eurographics conference on 3D Object Retrieval
From meaningful contours to discriminative object shape

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part I
Biological models for active vision: towards a unified architecture

ICVS'13 Proceedings of the 9th international conference on Computer Vision Systems

Quantified Score

Hi-index	0.14

Visualization

Abstract

Real-world scene understanding requires recognizing object categories in novel visual scenes. This paper describes a composition system that automatically learns structured, hierarchical object representations in an unsupervised manner without requiring manual segmentation or manual object localization. A central concept for learning object models in the challenging, general case of unconstrained scenes, large intraclass variations, large numbers of categories, and lacking supervision information is to exploit the compositional nature of our (visual) world. The compositional nature of visual objects significantly limits their representation complexity and renders learning of structured object models statistically and computationally tractable. We propose a robust descriptor for local image parts and show how characteristic compositions of parts can be learned that are based on an unspecific part vocabulary shared between all categories. Moreover, a Bayesian network is presented that comprises all the compositional constituents together with scene context and object shape. Object recognition is then formulated as a statistical inference problem in this probabilistic model.