Building descriptive and discriminative visual codebook for large-scale image applications

Authors:
Qi Tian;Shiliang Zhang;Wengang Zhou;Rongrong Ji;Bingbing Ni;Nicu Sebe
Affiliations:
Computer Science Department, University of Texas at San Antonio, San Antonio, USA 78249;Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 100190;EEIS Department, University of Science and Technology of China, Hefei, China 230027;Harbin Institute of Technology, Harbin, China 150001;National University of Singapore, Singapore, Singapore 117576;Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
Venue:
Multimedia Tools and Applications
Year:
2011

Citing 40
Cited 4

Term-weighting approaches in automatic text retrieval. Information Processing and Management: an International Journal
The anatomy of a large-scale hypertextual Web search engine. WWW7 Proceedings of the seventh international conference on World Wide Web 7
Probabilistic latent semantic indexing. Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM
Unsupervised learning by probabilistic latent semantic analysis. Machine Learning
Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons. International Journal of Computer Vision
Modern Information Retrieval
Self-Organizing Maps
Introduction to Modern Information Retrieval
Learning a Sparse Representation for Object Detection. ECCV '02 Proceedings of the 7th European Conference on Computer Vision-Part IV
Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary. ECCV '02 Proceedings of the 7th European Conference on Computer Vision-Part IV
Latent dirichlet allocation. The Journal of Machine Learning Research
Video Google: A Text Retrieval Approach to Object Matching in Videos. ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision
Creating Efficient Codebooks for Visual Recognition. ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 1
A Spectral Technique for Correspondence Problems Using Pairwise Constraints. ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
Scalable Recognition with a Vocabulary Tree. CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Spatial Weighting for Bag-of-Features. CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Discriminative Object Class Models of Appearance and Shape by Correlatons. CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study. International Journal of Computer Vision
A generalized VQ method for combined compression and estimation. ICASSP '96 Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 04
Evaluating bag-of-visual-words representations in scene classification. Proceedings of the international workshop on multimedia information retrieval
Universal and Adapted Vocabularies for Generic Visual Categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence
Randomized Clustering Forests for Image Classification. IEEE Transactions on Pattern Analysis and Machine Intelligence
VisualRank: Applying PageRank to Large-Scale Image Search. IEEE Transactions on Pattern Analysis and Machine Intelligence
Video Event Recognition Using Kernel Methods with Multilevel Temporal Alignment. IEEE Transactions on Pattern Analysis and Machine Intelligence
Video event detection using motion relativity and visual relatedness. MM '08 Proceedings of the 16th ACM international conference on Multimedia
Unsupervised modeling and recognition of object categories with combination of visual contents and geometric similarity links. MIR '08 Proceedings of the 1st ACM international conference on Multimedia information retrieval
Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search. ECCV '08 Proceedings of the 10th European Conference on Computer Vision: Part I
Spatial Hierarchy of Textons Distributions for Scene Classification. MMM '09 Proceedings of the 15th International Multimedia Modeling Conference on Advances in Multimedia Modeling
Tag ranking. Proceedings of the 18th international conference on World wide web
Supervised Learning of Quantizer Codebooks by Information Loss Minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence
Semantics-preserving bag-of-words models for efficient image annotation. LS-MMRM '09 Proceedings of the First ACM workshop on Large-scale multimedia retrieval and mining
Visual ContextRank for web image re-ranking. LS-MMRM '09 Proceedings of the First ACM workshop on Large-scale multimedia retrieval and mining
Descriptive visual words and visual phrases for image applications. MM '09 Proceedings of the 17th ACM international conference on Multimedia
Latent visual context analysis for image re-ranking. Proceedings of the ACM International Conference on Image and Video Retrieval
Visual Word Ambiguity. IEEE Transactions on Pattern Analysis and Machine Intelligence
Building contextual visual vocabulary for large-scale image applications. Proceedings of the international conference on Multimedia
Spatial coding for large scale partial-duplicate web image search. Proceedings of the international conference on Multimedia
Adapted vocabularies for generic visual categorization. ECCV'06 Proceedings of the 9th European conference on Computer Vision - Volume Part IV

Exploiting visual word co-occurrence for image retrieval. Proceedings of the 20th ACM international conference on Multimedia
Spatial pooling of heterogeneous features for image applications. Proceedings of the 20th ACM international conference on Multimedia
Scale based region growing for scene text detection. Proceedings of the 21st ACM international conference on Multimedia
Medical image retrieval using bag of meaningful visual words: unsupervised visual vocabulary pruning with PLSA. Proceedings of the 1st ACM international workshop on Multimedia indexing and information retrieval for healthcare

Abstract

Inspired by the success of textual words in large-scale textual information processing, researchers are trying to extract visual words from images that function similarly to textual words. Visual words are commonly generated by clustering a large number of local image features, with the cluster centers taken as visual words. This approach is simple and scalable, but it results in noisy visual words. Many works have been reported that try to improve the descriptive and discriminative ability of visual words. This paper gives a comprehensive survey of visual vocabularies and details several state-of-the-art algorithms. A comprehensive review and summary of related work on visual vocabularies is presented first. Then, we introduce our recent algorithms for descriptive and discriminative visual word generation, i.e., latent visual context analysis for descriptive visual word identification [74], descriptive visual word and visual phrase generation [68], a contextual visual vocabulary that combines both semantic contexts and spatial contexts [69], and visual vocabulary hierarchy optimization [18]. Additionally, we introduce two post-processing strategies that further improve the performance of visual vocabularies: spatial coding [73], which efficiently removes mismatched visual words between images for more reasonable image similarity computation, and user-preference-based visual word weighting [44], which makes the image similarity computed from visual words more consistent with users' preferences or habits.
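
The baseline that the abstract describes (cluster local features, take the cluster centers as visual words) can be illustrated with a minimal sketch. It assumes plain k-means as the clustering step, which the abstract does not name, and it uses toy 2-D points in place of real local descriptors such as SIFT. The function names `build_codebook`, `quantize`, and `bow_histogram` are hypothetical and are not taken from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptor, centers):
    """Index of the nearest center, i.e. the visual word for one descriptor."""
    return min(range(len(centers)), key=lambda j: sq_dist(descriptor, centers[j]))


def build_codebook(descriptors, k, n_iter=20, seed=0):
    """Plain k-means (an assumption); the final centers act as visual words."""
    rng = random.Random(seed)
    # Initialise the centers with k descriptors drawn without replacement.
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[quantize(d, centers)].append(d)
        for j, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bow_histogram(descriptors, centers):
    """L1-normalised bag-of-visual-words histogram for one image."""
    counts = [0] * len(centers)
    for d in descriptors:
        counts[quantize(d, centers)] += 1
    total = sum(counts)
    return [c / total for c in counts]


# Toy data: two well-separated groups standing in for local descriptors.
group_a = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
group_b = [[10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
codebook = build_codebook(group_a + group_b, k=2)
histogram_a = bow_histogram(group_a, codebook)
```

Two images can then be compared through their histograms. The noise the abstract mentions arises because this quantization step ignores semantic and spatial context, which is what the surveyed methods try to bring back.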