Our goal is to automatically annotate many images with a set of word tags and a pixel-wise map showing where each tag occurs. Most previous approaches rely on a corpus of training images in which every pixel is labeled; for large image databases, however, pixel labels are expensive to obtain and often unavailable. Furthermore, when annotating multiple images, each image is typically processed independently, which often yields inconsistent annotations across similar images. In this work, we incorporate dense image correspondence into the annotation model, allowing us to make do with significantly less labeled data and to resolve ambiguities by propagating inferred annotations from images with strong local visual evidence to images with weaker local evidence. We build a large graphical model spanning all labeled and unlabeled images and solve it to infer annotations, enforcing consistency over similar visual patterns. The model is optimized by efficient belief propagation algorithms embedded in an expectation-maximization (EM) scheme. Extensive experiments on several standard large-scale image datasets show that the proposed framework outperforms state-of-the-art methods.
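The propagation idea can be illustrated with a toy sketch. This is *not* the paper's model (which spans dense pixel correspondences across a whole database); it is a minimal max-product belief propagation on a three-node chain MRF, where each node stands for an image with a binary tag, unary log-potentials encode local visual evidence, and a hypothetical Potts smoothness term rewards agreement between corresponding images. The middle node has weak, slightly contradictory evidence and is corrected by its confident neighbors — the mechanism the abstract describes.

```python
# Illustrative sketch only (not the authors' implementation): exact MAP
# inference on a chain via forward/backward max-product messages.
LABELS = (0, 1)

def chain_map_bp(unary, smooth=1.0):
    """MAP labels for a chain MRF.

    unary[i][l] : log-potential for node i taking label l (local evidence)
    smooth      : log-reward when neighboring nodes agree (Potts term)
    """
    n = len(unary)
    pair = lambda a, b: smooth if a == b else 0.0

    # Forward pass: fwd[i][l] = best score of the prefix ending with node i = l.
    fwd = [dict(unary[0])]
    for i in range(1, n):
        fwd.append({l: unary[i][l] + max(fwd[i - 1][k] + pair(k, l) for k in LABELS)
                    for l in LABELS})

    # Backward pass: bwd[i][l] = best score of the suffix given node i = l.
    bwd = [{l: 0.0 for l in LABELS} for _ in range(n)]
    for i in range(n - 2, -1, -1):
        bwd[i] = {l: max(bwd[i + 1][k] + unary[i + 1][k] + pair(l, k) for k in LABELS)
                  for l in LABELS}

    # Decode each node from its max-marginal.
    return [max(LABELS, key=lambda l: fwd[i][l] + bwd[i][l]) for i in range(n)]

# Three "images": two with strong evidence for tag 1, one ambiguous node
# whose local evidence slightly favors tag 0.
unary = [{0: 0.0, 1: 2.0},   # confident: tag 1
         {0: 0.1, 1: 0.0},   # weak evidence, marginally favoring tag 0
         {0: 0.0, 1: 2.0}]   # confident: tag 1

print(chain_map_bp(unary, smooth=1.0))  # → [1, 1, 1]
print(chain_map_bp(unary, smooth=0.0))  # → [1, 0, 1] (no propagation)
```

With smoothing enabled, the neighbors' strong evidence overrides the middle node's weak local signal; with `smooth=0.0` each node is decoded independently and the inconsistency remains, mirroring the per-image annotation problem the paper targets.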