Semantics-preserving bag-of-words models and applications

Authors:
Lei Wu;Steven C. H. Hoi;Nenghai Yu
Affiliations:
MOE-MS Key Lab of MCC, Department of EEIS, University of Science and Technology of China, Hefei, China;School of Computer Engineering, Nanyang Technological University, Singapore;MOE-MS Key Lab of MCC, Department of EEIS, University of Science and Technology of China, Hefei, China
Venue:
IEEE Transactions on Image Processing
Year:
2010

Abstract

The Bag-of-Words (BoW) model is a promising image representation technique for image categorization and annotation tasks. One critical limitation of existing BoW models is that much semantic information is lost during codebook generation, an important step of BoW. This is because the codebook is often obtained simply by clustering visual features in Euclidean space. However, visual features related to the same semantics may not form clusters in Euclidean space, primarily because of the semantic gap between low-level features and high-level semantics. In this paper, we propose a novel scheme to learn optimized BoW models, which aims to map semantically related features to the same visual words. In particular, we consider the distance between semantically identical features as a measurement of the semantic gap, and attempt to learn an optimized codebook by minimizing this gap, thereby minimizing the loss of semantics. We refer to this kind of codebook as a semantics-preserving codebook (SPC) and to the corresponding model as the Semantics-Preserving Bag-of-Words (SPBoW) model. Extensive experiments on image annotation and object detection tasks with public testbeds from MIT's LabelMe and the PASCAL VOC challenge databases show that the proposed SPC learning scheme is effective for optimizing the codebook generation process, and that the SPBoW model greatly enhances the performance of the existing BoW model.
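
To make the codebook step concrete, the following is a minimal sketch of the baseline the abstract criticizes: local descriptors are clustered with plain k-means in Euclidean space, and each image becomes a histogram over the resulting visual words. It is not the authors' SPC learning scheme, which the abstract does not specify. The function names and the optional `transform` argument are hypothetical; `transform` only marks where a separately learned linear map (one that pulls semantically identical features closer together) could be applied before clustering, and the sketch does not learn it.

```python
import numpy as np


def assign_words(x, centers):
    """Return the index of the nearest centre (visual word) for every row of x."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def build_codebook(features, k, transform=None, n_iter=20, seed=0):
    """Cluster local descriptors into k visual words with plain k-means.

    `transform` is a hypothetical (d_out, d) matrix standing in for a learned
    metric; None means ordinary Euclidean clustering (the baseline BoW step).
    """
    x = features if transform is None else features @ transform.T
    rng = np.random.default_rng(seed)
    # Initialise the centres from k distinct data points.
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign_words(x, centers)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def bow_histogram(features, centers, transform=None):
    """Quantise one image's descriptors into a normalised visual-word histogram."""
    x = features if transform is None else features @ transform.T
    words = assign_words(x, centers)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

With `transform=None` this reproduces the Euclidean baseline. Passing a matrix changes the distance under which descriptors are grouped, so the same codebook must be used with the same `transform` when calling `bow_histogram`.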