One of the classic techniques for image annotation is the language translation model. It views an image as a document, i.e., a set of visual words obtained by vector quantizing the image regions produced by unsupervised image segmentation. Annotating an image is then achieved by translating its visual words into textual words, just as a document in English is translated into French. In this paper, we also view an image as a document, but we treat annotation as two consecutive processes: document summarization and translation. In the summarization process, an image document is first summarized into its own visual language, whose elements we call visual topics. The translation process then translates these visual topics into textual words. Compared with the original translation model, the visual topics learned by probabilistic latent semantic analysis (PLSA) provide an intermediate level of visual abstraction. We show improved annotation performance on the Corel image dataset.
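The summarization step above rests on fitting PLSA to a documents-by-visual-words count matrix with EM. As a minimal sketch (not the paper's implementation; the toy counts, topic number, and iteration budget are assumptions for illustration), the E-step computes the responsibilities P(z|d,w) and the M-step re-estimates P(w|z) and the per-image topic mixture P(z|d), which serves as the "visual topic" summary of each image:

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a (docs x words) visual-word count matrix.

    Returns p_z_d (docs x topics), the topic mixture summarizing each
    image document, and p_w_z (topics x words), the topic definitions.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random normalized initialization of P(z|d) and P(w|z).
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w) ∝ P(z|d) P(w|z),
        # shape (docs, words, topics).
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        joint /= joint.sum(axis=2, keepdims=True) + 1e-12
        # Weight responsibilities by the observed counts n(d,w).
        weighted = counts[:, :, None] * joint
        # M-step: P(w|z) ∝ sum_d n(d,w) P(z|d,w).
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        # M-step: P(z|d) ∝ sum_w n(d,w) P(z|d,w).
        p_z_d = weighted.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Toy corpus: counts of 4 visual words in 3 image documents.
counts = np.array([[8., 7., 0., 1.],
                   [9., 6., 1., 0.],
                   [0., 1., 7., 9.]])
p_z_d, p_w_z = plsa(counts, n_topics=2)
# Each row of p_z_d is the visual-topic summary of one image; the
# subsequent translation step would map these topics to textual words.
```

In this framing, annotation reduces to scoring each textual word w by P(w|d) = Σ_z P(z|d) P(w|z_text), where the topic-to-text table is estimated from annotated training images; the sketch covers only the unsupervised summarization half.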