Refining image annotation using contextual relations between words

  • Authors: Yong Wang; Shaogang Gong
  • Affiliations: University of London, London, UK (both authors)
  • Venue: Proceedings of the 6th ACM International Conference on Image and Video Retrieval
  • Year: 2007

Abstract

In this paper, we present a probabilistic approach to refining image annotations by incorporating semantic relations between annotation words. Our approach first predicts a candidate set of annotation words with confidence scores. This is achieved with the relevance vector machine (RVM), a kernel-based probabilistic classifier capable of nonlinear classification. Given the candidate annotations, we model semantic relationships between words using a conditional random field (CRF) in which each vertex represents the final decision (true/false) on a candidate annotation word. The refined annotation is obtained by inferring the most likely states of these vertices. In the CRF model, the confidence scores given by the RVM classifiers serve as local evidence. In addition, we use the normalized Google distance (NGD) between two words as their contextual potential. NGD is a distance function between two words computed from the hit counts returned when the words are queried on the Google search engine; it has a simple mathematical formulation grounded in Kolmogorov complexity theory. We also propose a learning algorithm to tune the weight parameters of the CRF model, which control the balance between the local evidence for a single word and the contextual relations between words. Our experiments on Corel images demonstrate the effectiveness of our approach.
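The abstract does not reproduce the NGD formula itself. For reference, the standard definition of the normalized Google distance (due to Cilibrasi and Vitányi), on which the paper's contextual potential is based, is:

```latex
\mathrm{NGD}(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}
                          {\log N - \min\{\log f(x), \log f(y)\}}
```

Here f(x) and f(y) are the numbers of pages returned by Google for each word queried alone, f(x, y) is the number of pages containing both words, and N is the total number of pages indexed. Words that frequently co-occur, and are thus semantically related, yield small NGD values.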
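The abstract does not specify the exact potentials or the inference procedure, so the following is a minimal sketch of the refinement step under stated assumptions: log RVM confidence scores serve as the unary (local) evidence, exp(-NGD) acts as a pairwise reward for jointly keeping semantically close words, two scalar weights alpha and beta stand in for the learned CRF weight parameters, and exhaustive enumeration of labelings replaces general CRF inference (feasible because each image has only a handful of candidate words). All names and numbers are illustrative, not taken from the paper.

```python
import itertools
import math


def ngd(f_x, f_y, f_xy, n_pages):
    """Normalized Google distance from raw search-engine hit counts.

    f_x, f_y : page counts for each word queried alone (must be > 0)
    f_xy     : page count for the two words queried together (must be > 0)
    n_pages  : total number of pages indexed by the engine
    """
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(n_pages) - min(lx, ly))


def refine_annotations(words, scores, pair_dist, alpha=1.0, beta=1.0):
    """Return the subset of candidate words in the highest-scoring labeling.

    words     : candidate annotation words for one image
    scores    : RVM confidence score in (0, 1) for each word
    pair_dist : NGD values keyed by index pairs (i, j) with i < j
    alpha     : weight on the local (unary) evidence
    beta      : weight on the contextual (pairwise) term

    Exhaustively scores all 2^n true/false labelings, which is feasible
    because each image has only a handful of candidate words.
    """
    n = len(words)
    best_labels, best_value = None, float("-inf")
    for labels in itertools.product((0, 1), repeat=n):
        # Unary term: log-likelihood of each decision under the
        # word's RVM confidence score.
        local = sum(
            math.log(scores[i]) if labels[i] else math.log(1.0 - scores[i])
            for i in range(n)
        )
        # Pairwise term: keeping two semantically close words
        # (small NGD) earns a larger reward.
        context = sum(
            math.exp(-pair_dist[(i, j)])
            for i in range(n) for j in range(i + 1, n)
            if labels[i] and labels[j]
        )
        value = alpha * local + beta * context
        if value > best_value:
            best_value, best_labels = value, labels
    return [w for w, keep in zip(words, best_labels) if keep]


if __name__ == "__main__":
    # Toy example with made-up scores and distances: 'ocean' has a weak
    # score and is far (in NGD terms) from the other words, so it is dropped.
    words = ["tiger", "grass", "ocean"]
    scores = [0.90, 0.70, 0.20]
    dist = {(0, 1): 0.3, (0, 2): 0.9, (1, 2): 0.8}
    print(refine_annotations(words, scores, dist))  # ['tiger', 'grass']
```

Exhaustive enumeration keeps the sketch simple; for larger candidate sets one would switch to a standard CRF inference method such as loopy belief propagation.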