Language modeling for bag-of-visual words image categorization

  • Authors:
  • Pierre Tirilly; Vincent Claveau; Patrick Gros

  • Affiliations:
  • CNRS, Rennes, France; CNRS, Rennes, France; INRIA, Rennes, France

  • Venue:
  • CIVR '08: Proceedings of the 2008 International Conference on Content-Based Image and Video Retrieval
  • Year:
  • 2008

Abstract

In this paper, we propose two ways of improving image classification based on the bag-of-visual-words representation [25]. This representation has two shortcomings: the spatial information of visual words is lost, and the coarseness of the vocabulary-building process introduces noisy visual words. First, we propose a new representation of images that pushes the analogy with textual data further: visual sentences, which allow us to "read" visual words in a certain order, as in the case of text. We can therefore take simple spatial relations between words into account. We also present a new image classification scheme that exploits these relations. It is based on language models, a very popular tool in the speech and text analysis communities. Second, we propose new techniques to eliminate useless words, one based on geometric properties of the keypoints, the other on probabilistic Latent Semantic Analysis (pLSA). Experiments show that our techniques can significantly improve image classification compared with classical Support Vector Machine (SVM)-based classification.
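The abstract does not spell out how visual sentences are built or which language model is used. The following is a minimal sketch of the general idea under loudly stated assumptions. It assumes that visual words are "read" left to right by keypoint x-coordinate, with ties broken by y. It also assumes that each category is modelled by a bigram language model with add-one (Laplace) smoothing. Both choices are illustrative and not necessarily the authors'. Classification picks the category whose model assigns the highest log-likelihood to the image's visual sentence. The names `visual_sentence`, `BigramLM` and `classify` and the toy data are hypothetical.

```python
import math
from collections import Counter


def visual_sentence(keypoints):
    """Turn keypoints into an ordered 'sentence' of visual word ids.

    Assumption (hypothetical reading order): sort by x, then y.
    keypoints: list of (x, y, word_id) tuples.
    """
    return [w for _, _, w in sorted(keypoints, key=lambda k: (k[0], k[1]))]


class BigramLM:
    """Bigram language model with add-one smoothing over vocab_size visual words."""

    def __init__(self, vocab_size):
        self.V = vocab_size
        self.bigrams = Counter()   # counts of (previous word, current word)
        self.context = Counter()   # counts of previous word as a context

    def train(self, sentences):
        for s in sentences:
            padded = ["<s>"] + s   # start-of-sentence marker
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.context[prev] += 1

    def log_prob(self, sentence):
        """Sum of log P(w_i | w_{i-1}); log space avoids underflow."""
        padded = ["<s>"] + sentence
        return sum(
            math.log((self.bigrams[(p, c)] + 1) / (self.context[p] + self.V))
            for p, c in zip(padded, padded[1:])
        )


def classify(sentence, models):
    """Return the label whose language model best explains the sentence."""
    return max(models, key=lambda label: models[label].log_prob(sentence))


# Toy usage: vocabulary of 4 visual words, two made-up categories.
V = 4
train = {
    "car": [[0, 1, 0, 1], [0, 1, 0]],
    "bike": [[2, 3, 2, 3], [3, 2, 3]],
}
models = {}
for label, sents in train.items():
    lm = BigramLM(V)
    lm.train(sents)
    models[label] = lm

kps = [(5.0, 1.0, 1), (1.0, 2.0, 0), (9.0, 0.5, 0), (12.0, 3.0, 1)]
sent = visual_sentence(kps)      # [0, 1, 0, 1]
print(classify(sent, models))    # car
```

Smoothing is what keeps a single unseen word pair from driving a category's likelihood to zero. Ordering keypoints along one axis is the simplest way to expose neighbouring-word relations to an n-gram model. Other reading orders would plug into `visual_sentence` without changing the classifier.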