Automatic image annotation aims at predicting a set of textual labels that describe the semantics of an image. These labels are usually drawn from an annotation vocabulary of a few hundred labels. Because the vocabulary is large, the number of images per label varies widely ("class imbalance"). Additionally, owing to the limitations of manual annotation, a significant fraction of the available images are not annotated with all their relevant labels ("weak labelling"). These two issues adversely affect the performance of most existing image annotation models. In this work, we propose 2PKNN, a two-step variant of the classical K-nearest-neighbour algorithm, that addresses both issues in the image annotation task. The first step of 2PKNN uses "image-to-label" similarities, while the second uses "image-to-image" similarities, thus combining the benefits of both. Since the performance of nearest-neighbour methods depends heavily on how features are compared, we also propose a metric learning framework over 2PKNN that jointly learns weights for multiple features as well as distances. This is done in a large-margin set-up by generalizing a well-known (single-label) classification metric learning algorithm to multi-label prediction. For scalability, we implement it by alternating between stochastic sub-gradient descent and projection steps. Extensive experiments demonstrate that, though conceptually simple, 2PKNN alone performs comparably to the current state of the art on three challenging image annotation datasets, and shows significant improvements after metric learning.
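The two steps above can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the function name `two_pass_knn`, the plain Euclidean distance, and the `exp(-d)` voting weight are assumptions made for the sketch. The first pass builds a per-label "semantic neighbourhood" (the K1 closest training images carrying each label), which counters class imbalance by guaranteeing every label some representatives; the second pass runs weighted nearest-neighbour label propagation over the union of those neighbourhoods.

```python
import numpy as np

def two_pass_knn(X_train, Y_train, x_test, K1=5):
    """Sketch of a two-pass KNN annotator (hypothetical helper).

    X_train : (n, d) training feature matrix
    Y_train : (n, L) binary label matrix
    x_test  : (d,) query feature vector
    K1      : per-label neighbourhood size for the first pass
    """
    # Image-to-image distances from the query to every training image.
    dists = np.linalg.norm(X_train - x_test, axis=1)

    # Pass 1 ("image-to-label"): for each label, keep the K1 nearest
    # training images annotated with it. Rare labels are not drowned
    # out, since every label contributes up to K1 candidates.
    n_labels = Y_train.shape[1]
    candidates = set()
    for l in range(n_labels):
        idx = np.flatnonzero(Y_train[:, l])          # images carrying label l
        if idx.size == 0:
            continue
        nearest = idx[np.argsort(dists[idx])[:K1]]   # K1 closest among them
        candidates.update(nearest.tolist())

    # Pass 2 ("image-to-image"): weighted KNN voting over the candidate
    # set; each candidate votes for all its labels with weight exp(-d).
    scores = np.zeros(n_labels)
    for i in candidates:
        scores += np.exp(-dists[i]) * Y_train[i]
    return scores  # higher score => label judged more relevant

# Toy usage: label 1 is rare, yet still enters the candidate set.
X = np.array([[0.0], [0.1], [0.2], [5.0]])
Y = np.array([[1, 0], [1, 0], [1, 0], [0, 1]])
print(two_pass_knn(X, Y, np.array([0.05]), K1=2))
```

In the full method, the fixed Euclidean distance here would be replaced by the learned combination of per-feature distances described above.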