Spatial extensions to bag of visual words

Authors:
Ville Viitaniemi;Jorma Laaksonen
Affiliations:
Helsinki University of Technology (TKK), TKK, Finland;Helsinki University of Technology (TKK), TKK, Finland
Venue:
Proceedings of the ACM International Conference on Image and Video Retrieval
Year:
2009

Abstract

The Bag of Visual Words (BoV) paradigm has been applied successfully to image content analysis tasks such as image classification and object detection. The basic BoV approach ignores the spatial distribution of descriptors within images. Here we describe spatial extensions to BoV and compare them experimentally on the VOC2007 benchmark image category detection task. In particular, we compare two ways of tiling images geometrically: the soft tiling approach proposed here and the traditional hard tiling technique. The experiments also address two methods of fusing information from several tilings of the images: post-classifier fusion and fusion at the level of an SVM kernel. The experiments confirm that the performance of a BoV system can be greatly enhanced by taking the descriptors' spatial distribution into account. The soft tiling technique performs well even with a single tiling mask, whereas multi-mask fusion is necessary for good category detection performance in the case of hard tiling. The evaluated fusion mechanisms performed approximately equally well.
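
To make the distinction between hard and soft tiling concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not say how the soft tiling masks are defined, so the Gaussian membership weights, the regular grid, the `sigma` parameter, and both function names are assumptions introduced purely for illustration. Descriptor positions are assumed to be normalised to the unit square and already quantised into codeword indices.

```python
import numpy as np

def hard_tile_histograms(xy, words, vocab_size, grid=(2, 2)):
    """Hard tiling: each descriptor is counted in exactly one tile.

    xy    : (N, 2) array of (x, y) positions normalised to [0, 1]
    words : (N,) integer array of codeword indices
    Returns an array of shape (rows * cols, vocab_size), tiles in row-major order.
    """
    rows, cols = grid
    col = np.minimum((xy[:, 0] * cols).astype(int), cols - 1)
    row = np.minimum((xy[:, 1] * rows).astype(int), rows - 1)
    tile = row * cols + col
    hist = np.zeros((rows * cols, vocab_size))
    np.add.at(hist, (tile, words), 1.0)  # unbuffered add handles repeated indices
    return hist

def soft_tile_histograms(xy, words, vocab_size, grid=(2, 2), sigma=0.25):
    """Soft tiling (assumed form): each descriptor spreads a unit vote over
    all tiles, weighted by a Gaussian of its distance to each tile centre."""
    rows, cols = grid
    cx = (np.arange(cols) + 0.5) / cols
    cy = (np.arange(rows) + 0.5) / rows
    centres = np.array([(x, y) for y in cy for x in cx])      # (T, 2), row-major
    d2 = ((xy[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)  # (N, T)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    weights /= weights.sum(axis=1, keepdims=True)             # votes sum to 1
    hist = np.zeros((rows * cols, vocab_size))
    for t in range(rows * cols):
        hist[t] = np.bincount(words, weights=weights[:, t], minlength=vocab_size)
    return hist

# Three descriptors; the second lies just left of the vertical tile boundary.
xy = np.array([[0.10, 0.10], [0.49, 0.10], [0.90, 0.90]])
words = np.array([0, 1, 1])
hard = hard_tile_histograms(xy, words, vocab_size=3)
soft = soft_tile_histograms(xy, words, vocab_size=3)
```

In this sketch the hard histograms place the boundary descriptor entirely in the top-left tile, while the soft histograms let it contribute to the neighbouring tile as well. Summed over tiles, both variants give the same per-codeword totals, so the only difference is how the votes are distributed spatially.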