Spatial Weighting for Bag-of-Visual-Words and Its Application in Content-Based Image Retrieval

  • Authors:
  • Xin Chen; Xiaohua Hu; Xiajiong Shen

  • Affiliations:
  • Xin Chen: College of Information Science and Technology, Drexel University, Philadelphia, USA; Xiaohua Hu: College of Information Science and Technology, Drexel University, Philadelphia, USA, and College of Computer and Information Engineering, Henan University, Henan, China; Xiajiong Shen: College of Computer and Information Engineering, Henan University, Henan, China

  • Venue:
  • PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2009

Abstract

Retrieving images from a large and highly varied image data set based on their visual content is a challenging and important task. Problems such as bridging the semantic gap between image features and user semantics have attracted considerable attention from the research community. Recently, the 'bag of visual words' approach has exhibited very good performance in content-based image retrieval (CBIR). However, since this approach represents an image as an unordered collection of local descriptors that use only intensity information, the resulting model provides little insight into the spatial constitution and color information of the image. In this paper, we develop a novel image representation method that uses a Gaussian mixture model (GMM) to provide spatial weighting for visual words, and we apply this method to content-based image retrieval. Our approach is simple and more efficient than the orderless 'bag of visual words' approach. In our method, we first extract visual tokens from the image data set and cluster them into a lexicon of visual words. We then represent the spatial constitution of an image as a mixture of n Gaussians in the feature space and decompose the image into n regions. Spatial weighting is achieved by weighting each visual word according to the probability that it belongs to each of the n regions of the image. The cosine similarity between spatially weighted visual word vectors serves as the distance measure between regions, while the image-level distance is obtained by averaging the pairwise distances between regions. We compare the performance of our method with the traditional 'bag of visual words' and 'blobworld' approaches under the same image retrieval scenario. Experimental results demonstrate that our method is able to distinguish images at the semantic level and improves the performance of CBIR.
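To make the pipeline concrete, the following minimal Python sketch illustrates the spatial weighting scheme described above, assuming scikit-learn for k-means clustering and GMM fitting. The lexicon size K, the number of regions n, and the exact per-descriptor features handed to the GMM are illustrative assumptions, not the authors' published settings.

```python
# A minimal sketch of the spatially weighted bag-of-visual-words pipeline
# described in the abstract. K, N_REGIONS, and the features given to the
# GMM are illustrative assumptions, not the paper's exact configuration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

K = 200        # lexicon size (assumed)
N_REGIONS = 3  # Gaussian components / regions per image (assumed)

def build_lexicon(all_descriptors):
    """Cluster local descriptors from the whole data set into K visual words."""
    return KMeans(n_clusters=K, n_init=10).fit(all_descriptors)

def spatially_weighted_histograms(lexicon, descriptors, features):
    """Represent one image as n spatially weighted visual word histograms.

    descriptors: (m, d) local descriptors, quantized against the lexicon
    features:    (m, f) per-descriptor features (e.g. position, or joint
                 color + position) on which the image's GMM is fitted
    Returns an (n, K) matrix whose row r is the histogram of region r, with
    each word occurrence weighted by the probability that its feature
    vector belongs to region r.
    """
    words = lexicon.predict(descriptors)               # quantize to visual words
    gmm = GaussianMixture(n_components=N_REGIONS).fit(features)
    resp = gmm.predict_proba(features)                 # (m, n) soft memberships
    hists = np.zeros((N_REGIONS, K))
    for w, p in zip(words, resp):
        hists[:, w] += p                               # soft-assign word to regions
    return hists

def image_distance(hists_a, hists_b, eps=1e-10):
    """Image-level distance: average pairwise cosine distance between regions."""
    a = hists_a / (np.linalg.norm(hists_a, axis=1, keepdims=True) + eps)
    b = hists_b / (np.linalg.norm(hists_b, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - a @ b.T))               # mean over n x n region pairs
```

The soft assignment via the GMM posteriors is what distinguishes this scheme from a hard segmentation: a visual word near a region boundary contributes partially to the histograms of both adjacent regions rather than being forced into one.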