Combining image-level and segment-level models for automatic annotation

Authors:
Daniel Kuettel;Matthieu Guillaumin;Vittorio Ferrari
Affiliations:
Computer Vision Laboratory, ETH Zurich, Switzerland;Computer Vision Laboratory, ETH Zurich, Switzerland;Computer Vision Laboratory, ETH Zurich, Switzerland
Venue:
MMM'12 Proceedings of the 18th international conference on Advances in Multimedia Modeling
Year:
2012

Abstract

For the task of assigning labels to an image to summarize its contents, many early attempts use segment-level information and try to determine which parts of the image correspond to which labels. The best-performing methods use global image similarity and nearest-neighbor techniques to transfer labels from training images to test images. However, unlike segment-level methods, global methods cannot localize the labels in the images. They also cannot take advantage of training images that are only locally similar to a test image. We propose several ways to combine recent image-level and segment-level techniques to predict both image and segment labels jointly. We cast our experimental study in a unified framework for both image-level and segment-level annotation tasks. On three challenging datasets, our joint prediction of image and segment labels outperforms either prediction alone on both tasks. This confirms that the two levels offer complementary information.
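As a rough illustration of the image-level baseline mentioned in the abstract (transferring labels from globally similar training images), the following is a minimal sketch and not the authors' joint model. It assumes each image is already summarized by a single global feature vector, uses plain Euclidean distance, and lets the k nearest training images vote for their labels. The function name `transfer_labels`, the toy features, and the label names are all hypothetical.

```python
from collections import Counter
from math import dist


def transfer_labels(train, test_feat, k=3, n_labels=2):
    """Predict labels for one test image by nearest-neighbor label transfer.

    train: list of (global_feature_vector, set_of_labels) pairs (assumed format).
    test_feat: global feature vector of the test image.
    """
    # Keep the k training images whose global features are closest to the test image.
    neighbors = sorted(train, key=lambda item: dist(item[0], test_feat))[:k]
    # Each neighbor casts one vote for every label it carries.
    votes = Counter(label for _, labels in neighbors for label in labels)
    # Rank labels by vote count; ties are broken alphabetically for determinism.
    ranked = sorted(votes, key=lambda label: (-votes[label], label))
    return ranked[:n_labels]


# Toy training set: 2-D "global descriptors" with image-level labels.
train = [
    ((0.0, 0.0), {"sky"}),
    ((0.0, 1.0), {"sky", "tree"}),
    ((5.0, 5.0), {"car"}),
    ((6.0, 5.0), {"car", "tree"}),
]

predicted = transfer_labels(train, (0.0, 0.2), k=2)
```

Note that this sketch only returns labels for the whole image. It cannot say where in the image a label applies, and it ignores training images that resemble the test image only in one region, which are the two limitations the abstract attributes to global methods.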