Our goal is to automatically annotate many images with a set of word tags and a pixel-wise map showing where each tag occurs. Most previous approaches rely on a corpus of training images in which every pixel is labeled; for large image databases, however, pixel labels are expensive to obtain and often unavailable. Furthermore, when annotating multiple images, each image is typically processed independently, which often yields inconsistent annotations across similar images. In this work, we incorporate dense image correspondence into the annotation model, allowing us to make do with significantly less labeled data and to resolve ambiguities by propagating inferred annotations from images with strong local visual evidence to images with weaker local evidence. We build a large graphical model spanning all labeled and unlabeled images and solve it to infer annotations, enforcing consistency over similar visual patterns. The model is optimized by efficient belief propagation algorithms embedded in an expectation-maximization (EM) scheme. Extensive experiments on several standard large-scale image datasets show that the proposed framework outperforms state-of-the-art methods.
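The propagation idea can be illustrated with a toy sketch. This is *not* the paper's model (which spans dense pixel correspondences across a whole database); it is a minimal max-product belief propagation on a three-node chain MRF, where each node stands for an image with a binary tag, unary log-potentials encode local visual evidence, and a hypothetical Potts smoothness term rewards agreement between corresponding images. The middle node has weak, slightly contradictory evidence and is corrected by its confident neighbors — the mechanism the abstract describes.

```python
# Illustrative sketch only (not the authors' implementation): exact MAP
# inference on a chain via forward/backward max-product messages.
LABELS = (0, 1)

def chain_map_bp(unary, smooth=1.0):
    """MAP labels for a chain MRF.

    unary[i][l] : log-potential for node i taking label l (local evidence)
    smooth      : log-reward when neighboring nodes agree (Potts term)
    """
    n = len(unary)
    pair = lambda a, b: smooth if a == b else 0.0

    # Forward pass: fwd[i][l] = best score of the prefix ending with node i = l.
    fwd = [dict(unary[0])]
    for i in range(1, n):
        fwd.append({l: unary[i][l] + max(fwd[i - 1][k] + pair(k, l) for k in LABELS)
                    for l in LABELS})

    # Backward pass: bwd[i][l] = best score of the suffix given node i = l.
    bwd = [{l: 0.0 for l in LABELS} for _ in range(n)]
    for i in range(n - 2, -1, -1):
        bwd[i] = {l: max(bwd[i + 1][k] + unary[i + 1][k] + pair(l, k) for k in LABELS)
                  for l in LABELS}

    # Decode each node from its max-marginal.
    return [max(LABELS, key=lambda l: fwd[i][l] + bwd[i][l]) for i in range(n)]

# Three "images": two with strong evidence for tag 1, one ambiguous node
# whose local evidence slightly favors tag 0.
unary = [{0: 0.0, 1: 2.0},   # confident: tag 1
         {0: 0.1, 1: 0.0},   # weak evidence, marginally favoring tag 0
         {0: 0.0, 1: 2.0}]   # confident: tag 1

print(chain_map_bp(unary, smooth=1.0))  # → [1, 1, 1]
print(chain_map_bp(unary, smooth=0.0))  # → [1, 0, 1] (no propagation)
```

With smoothing enabled, the neighbors' strong evidence overrides the middle node's weak local signal; with `smooth=0.0` each node is decoded independently and the inconsistency remains, mirroring the per-image annotation problem the paper targets.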