Semantics-preserving bag-of-words models for efficient image annotation

  • Authors:
  • Lei Wu; Steven C.H. Hoi; Nenghai Yu

  • Affiliations:
  • University of Science and Technology of China, Hefei, China; Nanyang Technological University, Singapore; University of Science and Technology of China, Hefei, China

  • Venue:
  • LS-MMRM '09: Proceedings of the First ACM Workshop on Large-Scale Multimedia Retrieval and Mining
  • Year:
  • 2009

Abstract

The Bag-of-Words (BoW) model is a promising image representation for annotation. One critical limitation of existing BoW models is the semantic loss incurred during codebook generation, in which low-level features are simply clustered into visual words in Euclidean space. However, the Euclidean distance between two visual words does not necessarily reflect the semantic distance between the corresponding concepts, due to the semantic gap between low-level features and high-level semantics. In this paper, we propose a novel scheme for learning a codebook such that semantically related features are mapped to the same visual word. In particular, we treat the distance between semantically identical features as a measure of the semantic gap, and attempt to learn an optimized codebook by minimizing this gap. We refer to this new codebook method as the Semantics-Preserving Codebook (SPC) and the corresponding model as the Semantics-Preserving Bag-of-Words (SPBoW) model. This model generates a codebook for each object category and, when a new object arrives, only needs to update the codebook of the corresponding category, which makes it convenient to scale up as the number of objects grows. Experiments on image annotation tasks with a public testbed from MIT's LabelMe project, which contains 11,281 objects in 495 categories, show that the SPC learning scheme handles a large number of objects efficiently and greatly improves the performance of the existing BoW model.
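
To make the per-category codebook idea concrete, below is a minimal Python sketch. It is not the paper's actual SPC optimization: here the semantic gap is only approximated by clustering each category's descriptors separately (so semantically identical features compete only with each other for codewords), and the incremental-update property falls out of keeping one independent codebook per category. All function names, parameters, and toy data are illustrative assumptions.

```python
# Sketch of a per-category ("semantics-preserving") codebook, assuming
# local descriptors (e.g., SIFT) are available per object category.
import numpy as np
from sklearn.cluster import KMeans


def learn_codebooks(features_by_category, words_per_category=50, seed=0):
    """Learn one small codebook per object category.

    features_by_category: dict mapping category name -> (n_i, d) array of
    local descriptors from that category's objects.
    Returns: dict mapping category -> (k, d) array of codewords.
    """
    codebooks = {}
    for category, feats in features_by_category.items():
        k = min(words_per_category, len(feats))
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(feats)
        codebooks[category] = km.cluster_centers_
    return codebooks


def add_category(codebooks, category, feats, words_per_category=50, seed=0):
    """Incremental update: only the new category's codebook is learned;
    existing codebooks are left untouched."""
    k = min(words_per_category, len(feats))
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(feats)
    codebooks[category] = km.cluster_centers_
    return codebooks


def bow_histogram(image_feats, codebooks):
    """Quantize an image's descriptors against the concatenated codebooks
    and return an L1-normalized BoW histogram."""
    vocab = np.vstack(list(codebooks.values()))  # global vocabulary
    # Assign each descriptor to its nearest codeword (Euclidean distance).
    dists = np.linalg.norm(image_feats[:, None, :] - vocab[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(vocab)).astype(float)
    return hist / max(hist.sum(), 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy descriptors: two well-separated categories in a 16-d feature space.
    data = {
        "car": rng.normal(0.0, 1.0, size=(200, 16)),
        "tree": rng.normal(3.0, 1.0, size=(200, 16)),
    }
    books = learn_codebooks(data, words_per_category=8)
    # A new category arrives: only its own codebook is computed.
    add_category(books, "person", rng.normal(-3.0, 1.0, size=(200, 16)),
                 words_per_category=8)
    print(bow_histogram(rng.normal(0.0, 1.0, size=(50, 16)), books).shape)
```

Because each category owns its codebook, adding the 496th category costs one small clustering run rather than re-learning a global vocabulary, which is the scaling behavior the abstract describes.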