Canonical contextual distance for large-scale image annotation and retrieval

  • Authors:
  • Hideki Nakayama; Tatsuya Harada; Yasuo Kuniyoshi

  • Affiliations:
  • The University of Tokyo, Tokyo, Japan; The University of Tokyo, Tokyo, Japan; The University of Tokyo, Tokyo, Japan

  • Venue:
  • LS-MMRM '09: Proceedings of the First ACM Workshop on Large-Scale Multimedia Retrieval and Mining
  • Year:
  • 2009

Abstract

To realize generic image recognition, a system needs to learn an enormous number of targets in the world and their appearances. Visual knowledge acquisition from massive collections of web images has therefore been studied intensively in recent years, and search-based methods are now flourishing in this research field. In general, however, the search in such methods is conducted with similarity measures based on simple image features and suffers from the semantic gap. This is a serious problem and can become the bottleneck of the entire system. In this paper, we propose a method for image annotation and retrieval based on a new similarity measure, the Canonical Contextual Distance. The method exploits the contexts of images estimated from their multiple labels and learns an essential, discriminative latent space. Owing to its probabilistic structure, our similarity measure reflects both the appearance and the semantics of samples. Because the learning method is highly scalable, it remains effective on large, web-scale datasets, and our similarity measure should therefore also be useful to many other search-based methods. In the experiments, we show that our method outperforms previous work on the standard Corel benchmark. We then validate the method by applying it to 3.5 million web images.
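
As a rough illustration of the idea only (not the authors' implementation, which builds the Canonical Contextual Distance on a probabilistic canonical correlation model), the sketch below learns a shared latent space between visual features and multi-label context vectors using plain CCA and performs annotation by nearest-neighbor search in that space. All feature dimensions, label counts, and data here are hypothetical placeholders.

  import numpy as np
  from sklearn.cross_decomposition import CCA
  from sklearn.neighbors import NearestNeighbors

  rng = np.random.default_rng(0)

  # Hypothetical training data: visual features X and binary multi-label contexts Y.
  n_train, d_visual, n_labels, d_latent = 500, 128, 20, 16
  X = rng.normal(size=(n_train, d_visual))                    # image appearance features
  Y = (rng.random((n_train, n_labels)) < 0.1).astype(float)   # label-context vectors

  # Learn a shared latent space that correlates appearance with label context.
  cca = CCA(n_components=d_latent)
  cca.fit(X, Y)

  # Project training images into the latent space once (a cheap linear projection,
  # which is what makes this style of method scalable to web-sized collections).
  Z_train = cca.transform(X)

  # Annotate a new image: project its visual features and look up its nearest
  # neighbors in the latent space; their labels provide annotation scores.
  nn = NearestNeighbors(n_neighbors=5).fit(Z_train)
  x_query = rng.normal(size=(1, d_visual))
  z_query = cca.transform(x_query)
  _, idx = nn.kneighbors(z_query)
  annotation_scores = Y[idx[0]].sum(axis=0)
  print(annotation_scores)

The key design point this sketch tries to convey is that distances are computed in a latent space tied to the label context rather than on raw image features, so nearest neighbors are closer in meaning, not only in appearance; the paper's probabilistic formulation refines how that latent-space distance is defined.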