Spatial extensions to bag of visual words

  • Authors: Ville Viitaniemi; Jorma Laaksonen
  • Affiliations: Helsinki University of Technology (TKK), Finland (both authors)
  • Venue: Proceedings of the ACM International Conference on Image and Video Retrieval
  • Year: 2009

Abstract

The Bag of Visual Words (BoV) paradigm has been applied successfully to image content analysis tasks such as image classification and object detection. The basic BoV approach ignores the spatial distribution of descriptors within an image. Here we describe spatial extensions to BoV and compare them experimentally on the VOC2007 benchmark image category detection task. In particular, we compare two ways of tiling images geometrically: the soft tiling approach proposed here and the traditional hard tiling technique. The experiments also address two methods of fusing information from several tilings of an image: post-classifier fusion and fusion at the level of an SVM kernel. The experiments confirm that the performance of a BoV system can be greatly enhanced by taking the spatial distribution of the descriptors into account. The soft tiling technique performs well even with a single tiling mask, whereas multi-mask fusion is necessary for good category detection performance with hard tiling. The two evaluated fusion mechanisms performed approximately equally well.
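
The difference between hard and soft tiling can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the soft tiling formulation, so the Gaussian membership weights, the `sigma` parameter, the regular grid, and the function names `hard_tile_histograms` and `soft_tile_histograms` are all assumptions made for illustration. Descriptor positions are assumed to be normalised to the unit square and already quantised to codeword indices.

```python
import numpy as np


def hard_tile_histograms(xy, words, n_words, grid=(2, 2)):
    """Per-tile codeword histograms where each descriptor counts in exactly one tile.

    xy:    (N, 2) array of (x, y) descriptor positions, normalised to [0, 1].
    words: (N,) integer array of codeword indices in [0, n_words).
    """
    rows, cols = grid
    # Map each position to the grid cell containing it (clamp x = 1 or y = 1).
    col = np.minimum((xy[:, 0] * cols).astype(int), cols - 1)
    row = np.minimum((xy[:, 1] * rows).astype(int), rows - 1)
    hists = np.zeros((rows * cols, n_words))
    # Unbuffered add, so repeated (tile, word) pairs accumulate correctly.
    np.add.at(hists, (row * cols + col, words), 1.0)
    return hists


def soft_tile_histograms(xy, words, n_words, grid=(2, 2), sigma=0.25):
    """Per-tile histograms where each descriptor is shared among all tiles.

    Assumption: membership falls off as a Gaussian of the distance to each
    tile centre and is normalised to sum to one per descriptor.
    """
    rows, cols = grid
    cx = (np.arange(cols) + 0.5) / cols
    cy = (np.arange(rows) + 0.5) / rows
    # Tile centres in row-major order, matching the hard tiling index.
    centres = np.array([(x, y) for y in cy for x in cx])
    d2 = ((xy[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)  # (N, T)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    weights /= weights.sum(axis=1, keepdims=True)
    hists = np.zeros((rows * cols, n_words))
    for t in range(rows * cols):
        # Weighted count of each codeword for tile t.
        hists[t] = np.bincount(words, weights=weights[:, t], minlength=n_words)
    return hists


# Three descriptors, two codewords, a 2 x 2 grid.
xy = np.array([[0.1, 0.1], [0.9, 0.9], [0.49, 0.1]])
words = np.array([0, 1, 0])
hard = hard_tile_histograms(xy, words, n_words=2)
soft = soft_tile_histograms(xy, words, n_words=2)
```

In this sketch both variants preserve the total count of each codeword over the image. The hard version changes abruptly when a descriptor crosses a tile border, while the soft version spreads a descriptor near a border over the neighbouring tiles.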