Learning semantic object parts for object categorization

Authors:
Bastian Leibe;Alan Ettlin;Bernt Schiele
Affiliations:
Computer Vision Laboratory, ETH, Zurich, Switzerland;Department of Computer Science, TU, Darmstadt, Germany;Department of Computer Science, TU, Darmstadt, Germany
Venue:
Image and Vision Computing
Year:
2008

Abstract

Appearance-based approaches to object recognition mostly rely on measuring the visual similarity of objects using global or local descriptors. They have shown great success in object identification but often do not generalize to the more challenging case of object categorization, where category membership is often decided not only at the level of appearance but also at a semantic level. It has been argued that model-based approaches are better suited to this problem, since they allow high-level knowledge to be injected, for example about the constituent object parts and their possible configurations. Postulating a set of object parts is problematic, though, since there is no guarantee that those parts can be reliably extracted from real-world images. There is a need for a middle layer that forms an interface between the visual information readily available from the image and the higher-level semantic information that reasoning processes can use. In this work, we investigate how such an interface can be learned. As the appearance of object parts may vary considerably, this cannot be achieved by relying on visual similarity alone. Rather, this paper proposes to also use co-location and co-activation, together with weak top-down constraints such as alignment, as guiding principles for learning the appearance of local object parts. The learned structures generalize beyond the appearance of single objects and often correspond to semantically plausible object parts, such as wheels, trunks, or windshields of cars. In a later stage, a Bayesian network of those extracted structures is used to verify object hypotheses successfully in difficult scenes.
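
The grouping idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the abstract gives no formulas, so the Jaccard overlap used for co-activation, the mean distance used for co-location, the thresholds `min_overlap` and `max_dist`, and all function and entry names are assumptions made for illustration. Each local appearance cluster is represented by the aligned training images in which it fires and the position at which it fires; two clusters are merged into one candidate part when they tend to fire in the same images and at nearby positions, even if they look different.

```python
from itertools import combinations
from math import hypot


def co_activation(a, b):
    """Jaccard overlap of the image sets in which two entries fire (assumed measure)."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


def co_location(a, b):
    """Mean distance between the two entries' positions over shared images (assumed measure)."""
    shared = set(a) & set(b)
    if not shared:
        return float("inf")
    total = sum(hypot(a[i][0] - b[i][0], a[i][1] - b[i][1]) for i in shared)
    return total / len(shared)


def group_entries(entries, min_overlap=0.5, max_dist=10.0):
    """Merge entries that satisfy both criteria, using a simple union-find.

    entries maps a hypothetical entry name to {image_id: (x, y)}, with
    positions given in object-aligned coordinates.
    """
    parent = {name: name for name in entries}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for p, q in combinations(entries, 2):
        if (co_activation(entries[p], entries[q]) >= min_overlap
                and co_location(entries[p], entries[q]) <= max_dist):
            parent[find(p)] = find(q)

    groups = {}
    for name in entries:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical data: two visually different patches near a wheel, one on a windshield.
entries = {
    "spoke_rim": {0: (20, 80), 1: (22, 78), 2: (19, 81)},
    "dark_tyre": {0: (23, 82), 1: (20, 80), 2: (21, 79)},
    "glass": {0: (60, 30), 1: (62, 28)},
}
print(group_entries(entries))
```

In this toy data the two wheel patches fire in the same images at nearby positions and are merged, while the windshield patch fires in overlapping images but far away and stays separate. Visual similarity plays no role in the merge, which is the point the abstract makes about parts whose appearance varies.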