Spatial pyramids and two-layer stacking SVM classifiers for image categorization: a comparative study

Authors:
Azizi Abdullah;Remco C. Veltkamp;Marco A. Wiering
Affiliations:
Department of Information and Computer Sciences, Utrecht University, The Netherlands;Department of Information and Computer Sciences, Utrecht University, The Netherlands;Department of Artificial Intelligence, University of Groningen, The Netherlands
Venue:
IJCNN'09 Proceedings of the 2009 international joint conference on Neural Networks
Year:
2009

Citing 13
Cited 1

Original Contribution: Stacked generalization

Neural Networks
The nature of statistical learning theory

The nature of statistical learning theory
VisualSEEk: a fully automated content-based image query system

MULTIMEDIA '96 Proceedings of the fourth ACM international conference on Multimedia
Video Google: A Text Retrieval Approach to Object Matching in Videos

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Distinctive Image Features from Scale-Invariant Keypoints

International Journal of Computer Vision
Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories

CVPRW '04 Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04) Volume 12 - Volume 12
Histograms of Oriented Gradients for Human Detection

CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 1 - Volume 01
Efficient Image Matching with Distributions of Local Invariant Features

CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2 - Volume 02
The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Representing shape with a spatial pyramid kernel

Proceedings of the 6th ACM international conference on Image and video retrieval
SURF: speeded up robust features

ECCV'06 Proceedings of the 9th European conference on Computer Vision - Volume Part I
Image classification for content-based indexing

IEEE Transactions on Image Processing

Multiple feature fusion based on co-training approach and time regularization for place classification in wearable video

Advances in Multimedia

Quantified Score

Hi-index	0.00

Visualization

Abstract

Recent research in image recognition has shown that combining multiple descriptors is a very useful way to improve classification performance. Furthermore, the use of spatial pyramids that compute descriptors at multiple spatial resolution levels generally increases the discriminative power of the descriptors. In this paper we focus on combination methods that combine multiple descriptors at multiple spatial resolution levels. A possible problem of the naive solution to create one large input vector for a machine learning classifier such as a support vector machine, is that the input vector becomes of very large dimensionality, which can increase problems of overfitting and hinder generalization performance. Therefore we propose the use of stacking support vector machines where at the first layer each support vector machine receives the input constructed by each single descriptor and is trained to compute the right output class. A second layer support vector machine is then used to combine the class probabilities of all trained first layer support vector models to learn the right output class given these reduced input vectors. We have performed experiments on 20 classes from the Caltech object database with 10 different single descriptors at 3 different resolutions. The results show that our 2-layer stacking approach outperforms the naive approach that combines all descriptors directly in a very large single input vector.