Estimation of optimal sample size of decision forest with SVM using embedded cross-validation method

  • Authors:
  • Md. Faisal Zaman;Hideo Hirose

  • Affiliations:
  • Kyushu Institute of Technology, Fukuoka, Japan;Kyushu Institute of Technology, Fukuoka, Japan

  • Venue:
  • ACIIDS'11 Proceedings of the Third international conference on Intelligent information and database systems - Volume Part II
  • Year:
  • 2011

Abstract

In this paper, the performance of the m-out-of-n decision forest of SVM, built by subsampling without replacement at different subsampling ratios (m/n), is analyzed by means of an embedded cross-validation technique. The subsampling ratio plays a pivotal role in improving the performance of the decision forest of SVM, because the SVM in this ensemble enlarges the feature space of the underlying base decision tree classifiers and guarantees improved overall performance of the ensemble. To ensure better training of the SVM, the out-of-bag sample is generally kept larger, but there is no general rule for estimating the optimal sample size for the decision forest. We propose using the embedded cross-validation method to select a near-optimal value of the sampling ratio. Under our criterion, a decision forest of SVM trained on independent samples whose size makes the cross-validation error of the ensemble as low as possible will achieve improved generalization performance.
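
The selection criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so the fold count, the candidate ratios, and all function names (subsample_split, cv_error, select_ratio, toy_ensemble) are assumptions made for illustration. In particular, toy_ensemble is a stand-in that fits a nearest-centroid rule on each in-bag sample and weights it by its out-of-bag accuracy; it replaces the decision tree and SVM pair of the paper only to keep the example self-contained.

```python
import random


def subsample_split(n, ratio, rng):
    """Draw m = round(ratio * n) indices without replacement; the rest are out-of-bag."""
    m = max(1, min(n - 1, round(ratio * n)))  # keep both parts non-empty
    idx = list(range(n))
    rng.shuffle(idx)
    return idx[:m], idx[m:]


def fit_centroids(X, y):
    """Stand-in base learner: one mean vector per class."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def classify(centroids, x):
    """Assign x to the class with the nearest centroid (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))


def toy_ensemble(X, y, ratio, rng, n_members=15):
    """Hypothetical stand-in for the forest: in-bag fit, out-of-bag accuracy as vote weight."""
    members = []
    for _ in range(n_members):
        in_bag, oob = subsample_split(len(X), ratio, rng)
        centroids = fit_centroids([X[i] for i in in_bag], [y[i] for i in in_bag])
        weight = sum(classify(centroids, X[i]) == y[i] for i in oob) / len(oob)
        members.append((centroids, weight))

    def predict(x):
        votes = {}
        for centroids, weight in members:
            label = classify(centroids, x)
            votes[label] = votes.get(label, 0.0) + weight
        return max(votes, key=votes.get)

    return predict


def cv_error(X, y, ratio, train_ensemble, k=5, seed=0):
    """k-fold cross-validation error of the ensemble at one ratio (assumes len(X) >= k)."""
    rng = random.Random(seed)  # same seed for every ratio, so the folds are identical
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        predict = train_ensemble([X[i] for i in train], [y[i] for i in train], ratio, rng)
        errors.append(sum(predict(X[i]) != y[i] for i in held_out) / len(held_out))
    return sum(errors) / len(errors)


def select_ratio(X, y, ratios, train_ensemble, k=5, seed=0):
    """Return the candidate ratio m/n with the lowest cross-validation error."""
    scores = {r: cv_error(X, y, r, train_ensemble, k, seed) for r in ratios}
    return min(scores, key=scores.get), scores
```

A call such as select_ratio(X, y, [0.2, 0.5, 0.8], toy_ensemble) returns the chosen ratio together with the cross-validation error of every candidate. Any other trainer with the same signature, for example one that really fits a tree on the in-bag sample and an SVM on the out-of-bag sample, could be passed in place of the stand-in.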