Spatial pyramids and two-layer stacking SVM classifiers for image categorization: a comparative study

  • Authors:
  • Azizi Abdullah; Remco C. Veltkamp; Marco A. Wiering

  • Affiliations:
  • Department of Information and Computer Sciences, Utrecht University, The Netherlands; Department of Information and Computer Sciences, Utrecht University, The Netherlands; Department of Artificial Intelligence, University of Groningen, The Netherlands

  • Venue:
  • IJCNN'09: Proceedings of the 2009 International Joint Conference on Neural Networks
  • Year:
  • 2009

Abstract

Recent research in image recognition has shown that combining multiple descriptors improves classification performance. Furthermore, spatial pyramids, which compute descriptors at multiple spatial resolution levels, generally increase the discriminative power of the descriptors. In this paper we focus on methods that combine multiple descriptors computed at multiple spatial resolution levels. The naive solution concatenates all descriptors into one large input vector for a machine learning classifier such as a support vector machine (SVM). A possible problem with this approach is that the input vector becomes very high-dimensional, which can increase overfitting and hinder generalization. We therefore propose stacking support vector machines: in the first layer, each SVM receives the input constructed from a single descriptor and is trained to predict the correct output class. A second-layer SVM then combines the class probabilities of all trained first-layer models and learns to predict the correct output class from these reduced input vectors. We performed experiments on 20 classes from the Caltech object database with 10 different single descriptors at 3 different resolutions. The results show that our two-layer stacking approach outperforms the naive approach that combines all descriptors directly in a single very large input vector.
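
The two-layer scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' implementation. Everything concrete in it is an assumption: the synthetic data from `make_classification` stands in for real image descriptors, the sizes (`N_CLASSES`, `N_DESCRIPTORS`, `DIM_PER_DESCRIPTOR`) are hypothetical and much smaller than in the paper, the RBF kernel is a default choice, and the use of out-of-fold probabilities to train the second layer is one common stacking practice that the abstract does not specify.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVC

# Hypothetical toy sizes (the paper uses 20 classes and 10 descriptors at 3 levels).
N_CLASSES = 4
N_DESCRIPTORS = 3
DIM_PER_DESCRIPTOR = 20


def make_toy_descriptors(seed=0):
    """Synthetic stand-in for image descriptors: one feature block per descriptor."""
    X, y = make_classification(
        n_samples=400,
        n_features=N_DESCRIPTORS * DIM_PER_DESCRIPTOR,
        n_informative=30,
        n_redundant=0,
        n_classes=N_CLASSES,
        n_clusters_per_class=1,
        random_state=seed,
    )
    # Split the columns into equally sized blocks, one per descriptor.
    return np.split(X, N_DESCRIPTORS, axis=1), y


def first_layer_probabilities(blocks_train, y_train, blocks_test):
    """Train one SVM per descriptor and collect its class probabilities."""
    meta_train, meta_test = [], []
    for X_tr, X_te in zip(blocks_train, blocks_test):
        svm = SVC(kernel="rbf", probability=True, random_state=0)
        # Assumption: out-of-fold probabilities for the training set, so the
        # second layer is not trained on outputs the first layer has memorized.
        meta_train.append(
            cross_val_predict(svm, X_tr, y_train, cv=5, method="predict_proba")
        )
        # Refit on the full training block to produce test-set probabilities.
        svm.fit(X_tr, y_train)
        meta_test.append(svm.predict_proba(X_te))
    return np.hstack(meta_train), np.hstack(meta_test)


blocks, y = make_toy_descriptors()
idx_train, idx_test = train_test_split(
    np.arange(len(y)), test_size=0.25, stratify=y, random_state=0
)
blocks_train = [b[idx_train] for b in blocks]
blocks_test = [b[idx_test] for b in blocks]
y_train, y_test = y[idx_train], y[idx_test]

# Layer 1: per-descriptor SVMs -> reduced vectors of class probabilities.
meta_train, meta_test = first_layer_probabilities(blocks_train, y_train, blocks_test)

# Layer 2: one SVM trained on the concatenated class probabilities.
stacked_svm = SVC(kernel="rbf").fit(meta_train, y_train)
stacked_acc = stacked_svm.score(meta_test, y_test)

# Naive baseline: one SVM on the concatenation of all raw descriptors.
naive_svm = SVC(kernel="rbf").fit(np.hstack(blocks_train), y_train)
naive_acc = naive_svm.score(np.hstack(blocks_test), y_test)

print("second-layer input dimension:", meta_train.shape[1])
print("naive input dimension:", np.hstack(blocks_train).shape[1])
print("stacked accuracy:", stacked_acc, "naive accuracy:", naive_acc)
```

In this sketch the second-layer input has `N_DESCRIPTORS * N_CLASSES` dimensions regardless of how large each raw descriptor is, which is the dimensionality reduction the abstract refers to. The accuracies on this synthetic data say nothing about the paper's results and may favor either approach.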