Learning discriminative localization from weakly labeled data

Authors:
Minh Hoai;Lorenzo Torresani;Fernando De La Torre;Carsten Rother
Venue:
Pattern Recognition
Year:
2014

Abstract

Visual categorization problems, such as object classification or action recognition, are increasingly approached using a detection strategy: a classifier function is first applied to candidate subwindows of the image or the video, and the maximum classifier score is then used for the class decision. Traditionally, the subwindow classifiers are trained on a large collection of examples manually annotated with masks or bounding boxes. The reliance on time-consuming human labeling effectively limits the application of these methods to problems involving very few categories. Furthermore, the human selection of the masks introduces arbitrary biases (e.g., in terms of window size and location) which may be suboptimal for classification. We propose a novel method for learning a discriminative subwindow classifier from examples annotated with binary labels indicating the presence of an object or action of interest, but not its location. During training, our approach simultaneously localizes the instances of the positive class and learns a subwindow SVM to recognize them. We extend our method to the classification of time series by presenting an algorithm that localizes the most discriminative set of temporal segments in the signal. We evaluate our approach on several datasets for object and action recognition and show that it achieves results similar to, and in many cases better than, those obtained with full supervision.
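The detection strategy described in the abstract (score candidate subwindows, then decide the class from the maximum score) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a linear classifier and an additive window descriptor (the sum of per-cell feature vectors, as in an unnormalized bag-of-words histogram), so that a window's score is the sum of its per-cell scores. The names `best_subwindow`, `classify`, `cell_features`, `w`, and `b` are hypothetical, and the exhaustive search over rectangles is chosen for clarity rather than speed.

```python
import numpy as np

def best_subwindow(cell_features, w, b=0.0):
    """Exhaustively score every axis-aligned subwindow of a grid of cells.

    cell_features: array of shape (H, W, D), one D-dim descriptor per cell
                   (assumed additive, so a window's descriptor is the sum
                   of the descriptors of its cells).
    w, b:          weights and bias of an assumed linear classifier.
    Returns (max_score, (top, left, bottom, right)), with bottom and right
    exclusive.
    """
    # With a linear classifier and additive descriptors, the window score
    # equals the sum of per-cell scores plus the bias.
    cell_scores = cell_features @ w                      # shape (H, W)
    H, W = cell_scores.shape

    # Integral image: any rectangle sum is then four lookups.
    integral = np.zeros((H + 1, W + 1))
    integral[1:, 1:] = cell_scores.cumsum(axis=0).cumsum(axis=1)

    best_score, best_box = -np.inf, None
    for top in range(H):
        for bottom in range(top + 1, H + 1):
            for left in range(W):
                for right in range(left + 1, W + 1):
                    s = (integral[bottom, right] - integral[top, right]
                         - integral[bottom, left] + integral[top, left]) + b
                    if s > best_score:
                        best_score, best_box = s, (top, left, bottom, right)
    return float(best_score), best_box

def classify(cell_features, w, b=0.0):
    """Class decision from the maximum subwindow score (positive if > 0)."""
    score, box = best_subwindow(cell_features, w, b)
    return bool(score > 0), score, box

# Toy example: a 4x4 grid where only the central 2x2 block looks "positive".
feats = np.zeros((4, 4, 2))
feats[..., 1] = 1.0            # background cells: descriptor [0, 1]
feats[1:3, 1:3] = [1.0, 0.0]   # object cells: descriptor [1, 0]
w = np.array([1.0, -1.0])      # hypothetical weights favouring object cells

label, score, box = classify(feats, w)
print(label, score, box)
```

In this sketch the best-scoring window doubles as the localization estimate. The training procedure in the abstract, which learns the classifier while localizing positive instances, is not shown here.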