On selecting additional predictive models in double bagging type ensemble method

  • Authors:
  • Zaman Faisal;Mohammad Mesbah Uddin;Hideo Hirose

  • Affiliations:
  • Kyushu Institute of Technology, Kawazu, Iizuka, Japan;Kyushu University, Fukuoka, Japan;Kyushu University, Fukuoka, Japan

  • Venue:
  • ICCSA'10 Proceedings of the 2010 international conference on Computational Science and Its Applications - Volume Part IV
  • Year:
  • 2010

Abstract

Double Bagging is a parallel ensemble method in which an additional classifier model is trained on the out-of-bag samples, and the posterior class probabilities this additional classifier produces for the in-bag samples are appended to them as extra features for training a decision tree classifier. The subsampled version of double bagging depends on two hyperparameters: the subsample ratio (SSR) and the choice of additional classifier. In this paper we propose an embedded cross-validation based technique to select one of these parameters automatically. During the training phase of the ensemble, the technique builds an ensemble classifier for each candidate value of one parameter (keeping the other fixed) and finally selects the value with the highest accuracy. We used four additional classifier models, Radial Basis Support Vector Machine (RSVM), Linear Support Vector Machine (LSVM) and Nearest Neighbor Classifier (5-NN and 10-NN), with five subsample ratios: 0.1, 0.2, 0.3, 0.4 and 0.5. We report the performance of subsampled double bagging with each of these SSRs combined with each of these additional classifiers. Our experiments use UCI benchmark datasets. The results indicate that LSVM is the most effective additional classifier for enhancing the predictive power of double bagging, whereas double bagging performs better with SSRs of 0.4 and 0.5 than with the other SSRs. We also compared the resulting ensemble methods with Bagging, AdaBoost, Double Bagging (original) and Rotation Forest. Experimental results show that the resulting subsampled double bagging ensemble performs better than these ensemble methods.
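The procedure described in the abstract can be illustrated with a minimal plain-Python sketch of subsampled double bagging plus cross-validated selection of the SSR. This is not the authors' implementation, and several details are assumptions: each subsample has round(SSR × n) rows drawn without replacement, 5-NN (one of the four additional classifiers named above) is held fixed as the additional classifier, a small depth-limited Gini tree stands in for a full decision tree, each ensemble has 10 trees, and the "embedded" selection is read as plain 5-fold cross-validation on the training data. All function names are hypothetical.

```python
import random
from collections import Counter

# Sketch under assumptions: 5-NN additional classifier, depth-limited Gini tree,
# 10 trees, 5-fold CV for SSR selection. All names here are hypothetical.


def knn_proba(X, y, x, k, classes):
    """Additional classifier: k-NN estimate of the class posteriors at x."""
    order = sorted(range(len(X)), key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
    votes = Counter(y[i] for i in order[:k])
    total = sum(votes.values())
    return [votes[c] / total for c in classes]


def gini(labels):
    n = len(labels)
    return 1.0 - sum((m / n) ** 2 for m in Counter(labels).values())


def build_tree(X, y, depth):
    """Depth-limited CART-style tree: Gini splits at midpoints, majority-vote leaves."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority
    best = None
    for f in range(len(X[0])):
        vals = sorted(set(row[f] for row in X))
        for t in ((a + b) / 2 for a, b in zip(vals, vals[1:])):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every feature is constant, so no split is possible
        return majority
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth - 1),
            build_tree([X[i] for i in right], [y[i] for i in right], depth - 1))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_double_bagging(X, y, ssr, n_trees=10, k=5, depth=3, seed=0):
    """Subsampled double bagging; rows of X are plain lists of floats."""
    rng = random.Random(seed)
    classes = sorted(set(y))
    n = len(X)
    m = max(1, round(ssr * n))  # in-bag size set by the subsample ratio
    ensemble = []
    for _ in range(n_trees):
        inbag = rng.sample(range(n), m)  # subsample drawn without replacement
        chosen = set(inbag)
        oob_X = [X[i] for i in range(n) if i not in chosen]
        oob_y = [y[i] for i in range(n) if i not in chosen]
        # The additional classifier uses only the out-of-bag rows; its posteriors
        # for each in-bag row are appended to that row as extra features.
        aug_X = [X[i] + knn_proba(oob_X, oob_y, X[i], k, classes) for i in inbag]
        tree = build_tree(aug_X, [y[i] for i in inbag], depth)
        ensemble.append((oob_X, oob_y, tree))
    return classes, k, ensemble


def predict(model, x):
    """Majority vote over trees, each fed x plus its own k-NN posteriors."""
    classes, k, ensemble = model
    votes = Counter(predict_tree(tree, x + knn_proba(oX, oy, x, k, classes))
                    for oX, oy, tree in ensemble)
    return votes.most_common(1)[0][0]


def select_ssr(X, y, ssrs=(0.1, 0.2, 0.3, 0.4, 0.5), folds=5, seed=0):
    """Cross-validation: build one ensemble per SSR, keep the most accurate."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    best_ssr, best_acc = None, -1.0
    for ssr in ssrs:
        correct = 0
        for f in range(folds):
            held_out = idx[f::folds]
            held = set(held_out)
            train = [i for i in idx if i not in held]
            model = fit_double_bagging([X[i] for i in train], [y[i] for i in train],
                                       ssr, seed=seed)
            correct += sum(predict(model, X[i]) == y[i] for i in held_out)
        acc = correct / len(X)
        if acc > best_acc:  # ties keep the earlier (smaller) SSR
            best_ssr, best_acc = ssr, acc
    return best_ssr, best_acc
```

Calling `select_ssr(X, y)` returns the chosen ratio and its cross-validated accuracy. Selecting the additional classifier instead would follow the same pattern: loop over candidate posterior estimators with the SSR held fixed and keep the most accurate ensemble.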