Distances and weighting schemes for bag of visual words image retrieval

Authors:
Pierre Tirilly;Vincent Claveau;Patrick Gros
Affiliations:
CNRS/IRISA, Rennes, France;CNRS/IRISA, Rennes, France;INRIA Centre Rennes - Bretagne Atlantique, Rennes, France
Venue:
Proceedings of the international conference on Multimedia information retrieval
Year:
2010

Citing 13

A probabilistic model of information retrieval: development and comparative experiments
Information Processing and Management: an International Journal

A probabilistic model of information retrieval: development and comparative experiments Part 2
Information Processing and Management: an International Journal

Probabilistic models of information retrieval based on measuring the divergence from randomness
ACM Transactions on Information Systems (TOIS)

Video Google: A Text Retrieval Approach to Object Matching in Videos
ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2

Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories
CVPRW '04 Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04) Volume 12 - Volume 12

A Performance Evaluation of Local Descriptors
IEEE Transactions on Pattern Analysis and Machine Intelligence

A Comparison of Affine Region Detectors
International Journal of Computer Vision

Scalable Recognition with a Vocabulary Tree
CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2

Effective and efficient object-based image retrieval using visual phrases
MULTIMEDIA '06 Proceedings of the 14th annual ACM international conference on Multimedia

Towards optimal bag-of-features for object categorization and semantic video retrieval
Proceedings of the 6th ACM international conference on Image and video retrieval

Evaluating bag-of-visual-words representations in scene classification
Proceedings of the international workshop on Workshop on multimedia information retrieval

Visual word proximity and linguistics for semantic video indexing and near-duplicate retrieval
Computer Vision and Image Understanding

Scene classification via pLSA
ECCV'06 Proceedings of the 9th European conference on Computer Vision - Volume Part IV

Cited 1

Geometric consistency checks for kNN based image classification relying on local features
Proceedings of the Fourth International Conference on SImilarity Search and APplications

Abstract

Current text retrieval techniques make it possible to index and retrieve text documents very efficiently and with good accuracy. Image retrieval, by contrast, is still very coarse and does not yield satisfactory results. Computer vision researchers therefore try to benefit from text retrieval contributions to enhance their retrieval systems. In particular, Sivic and Zisserman, with their Video Google framework [1], propose a description of images similar to standard text descriptors: images are described by elementary image parts, called visual words. They can thus perform image retrieval using the standard Vector Space Model (VSM) of Information Retrieval (IR) and benefit from classical IR techniques such as inverted files. Among available text retrieval techniques, automatically finding the most relevant words to describe a document has been intensively studied for text, but not for images. In this paper, we explore the use of term weighting techniques and classical distances from text retrieval in the case of images. The weights considered are standard VSM weights, weights derived from probabilistic models of IR, and new weighting schemes that we propose. Our experiments, performed on several datasets, show that no weighting scheme improves retrieval on every dataset, but also that choosing weights that fit the properties of the dataset can improve precision and MAP by up to 10%. This study provides insights into the semantic and statistical differences between textual and visual words, and into how visual word-based image retrieval systems can be optimized. It also shows some limits of the bag of visual words model, and the relation between Minkowski distances and local weighting. Finally, this study questions some experimental habits commonly found in the literature (choice of L1 distance, TF*IDF weights, and evaluation using one dataset only).
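
To make the vocabulary of the abstract concrete, the following is a minimal sketch of weighting bag-of-visual-words histograms and ranking images by a Minkowski distance. It is not the authors' implementation: the textbook TF*IDF formula, the function names (tf_idf, minkowski, rank), and the toy counts are all assumptions made for illustration, and the probabilistic and newly proposed weighting schemes mentioned above are not reproduced here.

```python
import math


def tf_idf(histograms):
    """Apply a textbook TF*IDF weighting to raw visual-word counts.

    histograms: one list of counts per image, all of the same length
    (one entry per visual word). The formula is an illustrative assumption,
    not necessarily the variant evaluated in the paper.
    """
    n_images = len(histograms)
    n_words = len(histograms[0])
    # Document frequency: number of images in which each visual word occurs.
    df = [sum(1 for h in histograms if h[w] > 0) for w in range(n_words)]
    # Global weight: words present in every image get an IDF of zero.
    idf = [math.log(n_images / d) if d > 0 else 0.0 for d in df]
    weighted = []
    for h in histograms:
        total = sum(h)
        # Local weight: term frequency normalised by the number of words in the image.
        weighted.append(
            [(c / total) * idf[w] if total > 0 else 0.0 for w, c in enumerate(h)]
        )
    return weighted


def minkowski(u, v, p):
    """Minkowski distance of order p (p=1 gives L1, p=2 gives L2)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def rank(query, database, p=1.0):
    """Return database indices sorted by increasing distance to the query."""
    return sorted(range(len(database)), key=lambda i: minkowski(query, database[i], p))


# Hypothetical toy collection: 3 images described over a 4-word visual vocabulary.
counts = [
    [4, 0, 1, 0],
    [3, 1, 0, 0],
    [0, 0, 2, 5],
]
weights = tf_idf(counts)
ranking_l1 = rank(weights[0], weights, p=1.0)
ranking_l2 = rank(weights[0], weights, p=2.0)
```

Changing p in rank, or swapping tf_idf for another weighting function, is the kind of variation the study compares across datasets; the sketch only shows where those two choices enter a retrieval pipeline.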