Scale-invariant visual language modeling for object categorization

Authors:
Lei Wu;Yang Hu;Mingjing Li;Nenghai Yu;Xian-Sheng Hua
Affiliations:
MOE, Microsoft Key Laboratory of Multimedia Computing and Communication, Department of Electrical Engineering and Information Science, University of Science and Technology of China, Hefei, China;MOE, Microsoft Key Laboratory of Multimedia Computing and Communication, Department of Electrical Engineering and Information Science, University of Science and Technology of China, Hefei, China;MOE, Microsoft Key Laboratory of Multimedia Computing and Communication, Department of Electrical Engineering and Information Science, University of Science and Technology of China, Hefei, China;MOE, Microsoft Key Laboratory of Multimedia Computing and Communication, Department of Electrical Engineering and Information Science, University of Science and Technology of China, Hefei, China;Microsoft Research Asia, Beijing, China
Venue:
IEEE Transactions on Multimedia - Special issue on integration of context and content
Year:
2009

Abstract

In recent years, "bag-of-words" models, which treat an image as an unordered collection of visual words, have been widely applied in the multimedia and computer vision fields. However, because they ignore the spatial structure among visual words, they cannot discriminate between objects that have similar word frequencies but different spatial distributions of words. In this paper, we propose a visual language modeling (VLM) method, which incorporates the spatial context of local appearance features into a statistical language model. To represent object categories, we exploit models with different orders of statistical dependency. In addition, a multilayer extension to the VLM makes it more resistant to scale variations of objects. The model is effective and applicable to large-scale image categorization. We train scale-invariant visual language models on images grouped by Flickr tags and use these models for object categorization. Experimental results show that they outperform single-layer visual language models and "bag-of-words" models. They also achieve performance comparable to 2-D MHMM and SVM-based methods at a much lower computational cost.
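
The core idea of conditioning each visual word on its spatial neighbors can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each image has already been quantized into a 2-D grid of visual-word ids, uses only a first-order (bigram) dependency on the left neighbor, and applies add-alpha smoothing, all of which are illustrative choices. The multilayer, scale-invariant extension is not shown, and the names train_bigram, log_likelihood, and classify are hypothetical.

```python
import math
from collections import Counter


def train_bigram(grids, vocab_size, alpha=1.0):
    """Estimate smoothed P(word | left neighbor) from grids of visual-word ids.

    Assumption: add-alpha smoothing and a left-neighbor context are
    illustrative choices, not details taken from the paper.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for grid in grids:
        for row in grid:
            for left, cur in zip(row, row[1:]):
                pair_counts[(left, cur)] += 1
                context_counts[left] += 1

    def prob(cur, left):
        num = pair_counts[(left, cur)] + alpha
        den = context_counts[left] + alpha * vocab_size
        return num / den

    return prob


def log_likelihood(grid, prob):
    """Sum of log P(word | left neighbor) over all horizontal pairs in the grid."""
    return sum(
        math.log(prob(cur, left))
        for row in grid
        for left, cur in zip(row, row[1:])
    )


def classify(grid, models):
    """Return the category whose model assigns the grid the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(grid, models[name]))


# Two toy categories with identical word frequencies (two 0s and two 1s per
# row) but different spatial arrangements, which a histogram cannot separate.
models = {
    "alternating": train_bigram([[[0, 1, 0, 1], [1, 0, 1, 0]]], vocab_size=2),
    "blocked": train_bigram([[[0, 0, 1, 1], [1, 1, 0, 0]]], vocab_size=2),
}

print(classify([[0, 1, 0, 1]], models))
print(classify([[0, 0, 1, 1]], models))
```

Both test rows have the same unordered word histogram, so a plain "bag-of-words" representation would see them as identical; the bigram model separates them because it scores the order in which neighboring words occur.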