Discriminative Models for Multi-Class Object Layout

Authors:
Chaitanya Desai;Deva Ramanan;Charless C. Fowlkes
Affiliations:
Department of Computer Science, UC Irvine, Irvine, USA;Department of Computer Science, UC Irvine, Irvine, USA;Department of Computer Science, UC Irvine, Irvine, USA
Venue:
International Journal of Computer Vision
Year:
2011

References (19):

Neural Network-Based Face Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Robust Real-Time Face Detection. International Journal of Computer Vision.
Support vector machine learning for interdependent and structured output spaces. ICML '04 Proceedings of the twenty-first international conference on Machine learning.
Near-regular texture analysis and manipulation. ACM SIGGRAPH 2004 Papers.
Histograms of Oriented Gradients for Human Detection. CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 1.
Discriminative Learning of Markov Random Fields for Segmentation of 3D Scan Data. CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2.
A Hierarchical Field Framework for Unified Context-Based Classification. ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2.
Learning Hierarchical Models of Scenes, Objects, and Parts. ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2.
Convergent Tree-Reweighted Message Passing for Energy Minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Linear Programming Relaxations and Belief Propagation -- An Empirical Study. The Journal of Machine Learning Research.
A scalable modular convex solver for regularized risk minimization. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining.
Training structural SVMs when exact inference is intractable. Proceedings of the 25th international conference on Machine learning.
Putting Objects in Perspective. International Journal of Computer Vision.
Learning to Localize Objects with Structured Output Regression. ECCV '08 Proceedings of the 10th European Conference on Computer Vision: Part I.
Cutting-plane training of structural SVMs. Machine Learning.
Multiresolution models for object detection. ECCV'10 Proceedings of the 11th European conference on Computer vision: Part IV.
Multiscale conditional random fields for image labeling. CVPR'04 Proceedings of the 2004 IEEE computer society conference on Computer vision and pattern recognition.
TextonBoost: joint appearance, shape and context modeling for multi-class object recognition and segmentation. ECCV'06 Proceedings of the 9th European conference on Computer Vision - Volume Part I.
MAP estimation via agreement on trees: message-passing and linear programming. IEEE Transactions on Information Theory.

Cited by (3):

Hough regions for joining instance localization and segmentation. ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part III.
Learning a context aware dictionary for sparse representation. ACCV'12 Proceedings of the 11th Asian conference on Computer Vision - Volume Part II.
Visual object detection with deformable part models. Communications of the ACM.


Abstract

Many state-of-the-art approaches for object recognition reduce the problem to a 0-1 classification task. This allows one to leverage sophisticated machine learning techniques for training classifiers from labeled examples. However, these models are typically trained independently for each class using positive and negative examples cropped from images. At test time, various post-processing heuristics such as non-maxima suppression (NMS) are required to reconcile multiple detections within and between different classes for each image. Though crucial to good performance on benchmarks, this post-processing is usually defined heuristically.

We introduce a unified model for multi-class object recognition that casts the problem as a structured prediction task. Rather than predicting a binary label for each image window independently, our model simultaneously predicts a structured labeling of the entire image (Fig. 1). Our model learns statistics that capture the spatial arrangements of various object classes in real images, both in terms of which arrangements to suppress through NMS and which arrangements to favor through spatial co-occurrence statistics.

We formulate parameter estimation in our model as a max-margin learning problem. Given training images with ground-truth object locations, we show how to formulate learning as a convex optimization problem. We employ the cutting plane algorithm of Joachims et al. (Mach. Learn. 2009) to efficiently learn a model from thousands of training images. We show state-of-the-art results on the PASCAL VOC benchmark that indicate the benefits of learning a global model encapsulating the spatial layout of multiple object classes. A preliminary version of this work appeared at ICCV 2009 (Desai et al., IEEE International Conference on Computer Vision, 2009).
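
For readers unfamiliar with the post-processing heuristic the abstract refers to, the following is a minimal sketch of conventional greedy non-maxima suppression. It is not the authors' model; it illustrates the hand-defined baseline that the paper proposes to replace with learned statistics. The function names (`iou`, `greedy_nms`), the corner-coordinate box format, and the 0.5 overlap threshold are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # hypothetical default, not taken from the paper
) -> List[int]:
    """Return indices of kept detections, highest score first.

    A detection is kept only if it does not overlap any already-kept
    detection by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))
```

Note that the overlap threshold here is fixed by hand and applied identically to every pair of detections, which is the kind of heuristic choice the abstract contrasts with learning which arrangements to suppress and which to favor.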