Learning what is where from unlabeled images: joint localization and clustering of foreground objects

Authors:
Ashok Chandrashekar;Lorenzo Torresani;Richard Granger
Affiliations:
Dartmouth College, Hanover, USA;Dartmouth College, Hanover, USA;Dartmouth College, Hanover, USA
Venue:
Machine Learning
Year:
2014

References:

Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary. ECCV '02 Proceedings of the 7th European Conference on Computer Vision - Part IV
Latent Dirichlet allocation. The Journal of Machine Learning Research
Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision
Image Categorization by Learning and Reasoning with Regions. The Journal of Machine Learning Research
Combining Top-Down and Bottom-Up Segmentation. CVPRW '04 Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04) - Volume 4
Image Parsing: Unifying Segmentation, Detection, and Recognition. International Journal of Computer Vision
Histograms of Oriented Gradients for Human Detection. CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 1
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Freeman, New York, 1982
Learning Hierarchical Models of Scenes, Objects, and Parts. ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
Learning Object Categories from Google's Image Search. ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
Cosegmentation of Image Pairs by Histogram Matching - Incorporating a Global Constraint into MRFs. CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 1
Using Multiple Segmentations to Discover Objects and their Extent in Image Collections. CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
k-means++: the advantages of careful seeding. SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Foreground Focus: Unsupervised Learning from Partially Matching Images. International Journal of Computer Vision
Unsupervised Object Discovery: A Comparison. International Journal of Computer Vision
Localizing objects while learning their appearance. ECCV'10 Proceedings of the 11th European conference on Computer vision: Part IV
ClassCut for unsupervised class segmentation. ECCV'10 Proceedings of the 11th European conference on Computer vision: Part V
SIFT Flow: Dense Correspondence across Scenes and Its Applications. IEEE Transactions on Pattern Analysis and Machine Intelligence

Abstract

"What does it mean, to see? The plain man's answer would be, to know what is where by looking." This famous quote by David Marr (Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, Freeman, New York, 1982) sums up the holy grail of vision: discovering what is present in the world, and where it is, from unlabeled images. In this paper we tackle this challenging problem by proposing a generative model of object formation and describing an efficient algorithm that automatically learns the parameters of the model from a collection of unlabeled images. Our algorithm discovers the objects and their spatial extents by clustering together images containing similar foregrounds. Our approach simultaneously solves for the image clusters, the foreground appearance models, and the spatial regions containing the objects by optimizing a single likelihood function defined over the entire image collection. We describe two methods for efficient foreground localization. The first does not require any bottom-up image segmentation and discovers the foreground region as a contiguous rectangular bounding box. The second expresses the foreground as a collection of super-pixels generated through a bottom-up segmentation of the image; unlike previous methods, however, it does not assume that an object is encapsulated by a single segment. Evaluation on standard benchmarks and comparison with prior methods demonstrate that our approach achieves state-of-the-art results on the problem of unsupervised foreground localization and clustering.
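
To make the first localization idea concrete, the sketch below searches for the contiguous rectangle with the highest total score on a small grid. It is only an illustration: the abstract does not specify the paper's search procedure, so this uses a generic maximum-sum subrectangle search (2D Kadane) instead, and the function name `best_box` and the per-cell scores (imagined here as foreground-versus-background log-likelihood ratios) are assumptions introduced for the example.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def best_box(scores: List[List[float]]) -> Tuple[float, Box]:
    """Return the maximum-sum rectangle of a 2D score grid and its total.

    `scores[r][c]` is a hypothetical per-cell score, for example a
    foreground-vs-background log-likelihood ratio (an assumption, not
    taken from the paper).
    """
    rows, cols = len(scores), len(scores[0])
    best_total = float("-inf")
    best_rect: Box = (0, 0, 0, 0)
    for top in range(rows):
        col_sums = [0.0] * cols  # column sums over rows top..bottom
        for bottom in range(top, rows):
            for c in range(cols):
                col_sums[c] += scores[bottom][c]
            # 1D Kadane over col_sums picks the best column interval
            run, start = 0.0, 0
            for c in range(cols):
                if run <= 0.0:
                    run, start = col_sums[c], c  # restart the interval here
                else:
                    run += col_sums[c]
                if run > best_total:
                    best_total = run
                    best_rect = (top, start, bottom, c)
    return best_total, best_rect


# Toy grid: positive cells mark a 2x2 "foreground" patch.
scores = [
    [-1.0, -1.0, -1.0, -1.0],
    [-1.0, 2.0, 3.0, -1.0],
    [-1.0, 1.0, 2.0, -1.0],
    [-1.0, -1.0, -1.0, -1.0],
]
total, box = best_box(scores)
print(total, box)  # 8.0 (1, 1, 2, 2)
```

In a joint clustering and localization setting, a search like this would presumably be rerun for each image whenever the appearance model of its cluster changes, since the scores depend on that model.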