One of the classic techniques for image annotation is the language translation model. It views an image as a document, i.e., a set of visual words obtained by vector quantizing the image regions produced by unsupervised image segmentation. Annotating an image is then achieved by translating its visual words into textual words, just as a document in English is translated into French. In this paper, we also view an image as a document, but we treat annotation as two consecutive processes: document summarization and translation. In the summarization process, an image document is first summarized into its own visual language, whose elements we call visual topics. The translation process then translates these visual topics into textual words. Compared with the original translation model, the visual topics learned by probabilistic latent semantic analysis (PLSA) provide an intermediate level of visual abstraction. We show improved annotation performance on the Corel image dataset.
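The summarization step above rests on fitting PLSA to a documents-by-visual-words count matrix with EM. As a minimal sketch (not the paper's implementation; the toy counts, topic number, and iteration budget are assumptions for illustration), the E-step computes the responsibilities P(z|d,w) and the M-step re-estimates P(w|z) and the per-image topic mixture P(z|d), which serves as the "visual topic" summary of each image:

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a (docs x words) visual-word count matrix.

    Returns p_z_d (docs x topics), the topic mixture summarizing each
    image document, and p_w_z (topics x words), the topic definitions.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random normalized initialization of P(z|d) and P(w|z).
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w) ∝ P(z|d) P(w|z),
        # shape (docs, words, topics).
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        joint /= joint.sum(axis=2, keepdims=True) + 1e-12
        # Weight responsibilities by the observed counts n(d,w).
        weighted = counts[:, :, None] * joint
        # M-step: P(w|z) ∝ sum_d n(d,w) P(z|d,w).
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        # M-step: P(z|d) ∝ sum_w n(d,w) P(z|d,w).
        p_z_d = weighted.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Toy corpus: counts of 4 visual words in 3 image documents.
counts = np.array([[8., 7., 0., 1.],
                   [9., 6., 1., 0.],
                   [0., 1., 7., 9.]])
p_z_d, p_w_z = plsa(counts, n_topics=2)
# Each row of p_z_d is the visual-topic summary of one image; the
# subsequent translation step would map these topics to textual words.
```

In this framing, annotation reduces to scoring each textual word w by P(w|d) = Σ_z P(z|d) P(w|z_text), where the topic-to-text table is estimated from annotated training images; the sketch covers only the unsupervised summarization half.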