Neural Network-Based Face Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence
Robust Real-Time Face Detection
International Journal of Computer Vision
Support vector machine learning for interdependent and structured output spaces
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Near-regular texture analysis and manipulation
ACM SIGGRAPH 2004 Papers
Histograms of Oriented Gradients for Human Detection
CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 1
Discriminative Learning of Markov Random Fields for Segmentation of 3D Scan Data
CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2
A Hierarchical Field Framework for Unified Context-Based Classification
ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
Learning Hierarchical Models of Scenes, Objects, and Parts
ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
Convergent Tree-Reweighted Message Passing for Energy Minimization
IEEE Transactions on Pattern Analysis and Machine Intelligence
Linear Programming Relaxations and Belief Propagation -- An Empirical Study
The Journal of Machine Learning Research
A scalable modular convex solver for regularized risk minimization
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Training structural SVMs when exact inference is intractable
Proceedings of the 25th international conference on Machine learning
Putting Objects in Perspective
International Journal of Computer Vision
Learning to Localize Objects with Structured Output Regression
ECCV '08 Proceedings of the 10th European Conference on Computer Vision: Part I
Cutting-plane training of structural SVMs
Machine Learning
Multiresolution models for object detection
ECCV'10 Proceedings of the 11th European conference on Computer vision: Part IV
Multiscale conditional random fields for image labeling
CVPR'04 Proceedings of the 2004 IEEE computer society conference on Computer vision and pattern recognition
ECCV'06 Proceedings of the 9th European conference on Computer Vision - Volume Part I
MAP estimation via agreement on trees: message-passing and linear programming
IEEE Transactions on Information Theory
Hough regions for joining instance localization and segmentation
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part III
Learning a context aware dictionary for sparse representation
ACCV'12 Proceedings of the 11th Asian conference on Computer Vision - Volume Part II
Visual object detection with deformable part models
Communications of the ACM
Many state-of-the-art approaches for object recognition reduce the problem to a 0-1 classification task. This allows one to leverage sophisticated machine learning techniques for training classifiers from labeled examples. However, these models are typically trained independently for each class using positive and negative examples cropped from images. At test time, post-processing steps such as non-maxima suppression (NMS) are required to reconcile multiple detections within and between different classes for each image. Though crucial to good performance on benchmarks, this post-processing is usually defined heuristically.

We introduce a unified model for multi-class object recognition that casts the problem as a structured prediction task. Rather than predicting a binary label for each image window independently, our model simultaneously predicts a structured labeling of the entire image (Fig. 1). Our model learns statistics that capture the spatial arrangements of various object classes in real images, both in terms of which arrangements to suppress through NMS and which arrangements to favor through spatial co-occurrence statistics.

We formulate parameter estimation in our model as a max-margin learning problem. Given training images with ground-truth object locations, we show how to formulate learning as a convex optimization problem. We employ the cutting plane algorithm of Joachims et al. (Mach. Learn. 2009) to efficiently learn a model from thousands of training images. We show state-of-the-art results on the PASCAL VOC benchmark that indicate the benefits of learning a global model encapsulating the spatial layout of multiple object classes. A preliminary version of this work appeared in ICCV 2009 (Desai et al., IEEE International Conference on Computer Vision, 2009).
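
For context, the heuristic post-processing that the abstract contrasts against is commonly implemented as greedy non-maxima suppression. The sketch below illustrates that conventional baseline only; it is not the structured model proposed in this work. The function names, the `(x1, y1, x2, y2)` box layout, and the overlap threshold of 0.5 are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union overlap between two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of detections kept by greedy NMS.

    Detections are visited from highest to lowest score; a detection is
    kept only if it does not overlap any already-kept detection by more
    than iou_threshold (a hand-set value, assumed here to be 0.5).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))
```

The threshold here is fixed by hand and applied identically to every pair of detections. That is the kind of heuristic choice the abstract proposes to replace with suppression and co-occurrence statistics learned from data.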