A heuristic perturbation of the dataset to achieve a diverse ensemble of classifiers

  • Authors:
  • Hamid Parvin, Sajad Parvin, Zahra Rezaei, Moslem Mohamadi

  • Affiliation (all authors):
  • Nourabad Mamasani Branch, Islamic Azad University, Nourabad Mamasani, Iran

  • Venue:
  • MCPR'12: Proceedings of the 4th Mexican Conference on Pattern Recognition
  • Year:
  • 2012

Abstract

Ensemble methods such as Bagging and Boosting, which combine the decisions of multiple hypotheses, are among the strongest existing machine learning methods. The diversity of an ensemble's members is known to be an important factor in determining its generalization error. We present a new method for generating ensembles, named CDEBMTE (Creation of Diverse Ensemble Based on Manipulation of Training Examples), that directly constructs diverse hypotheses by manipulating the training examples in three ways: (1) sub-sampling the training examples, (2) decreasing/increasing the number of error-prone training examples, and (3) decreasing/increasing the number of neighbor samples of error-prone training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. Experimental results using two well-known base learners, (1) decision-tree induction and (2) the multilayer perceptron, demonstrate that this approach consistently achieves higher predictive accuracy than the base classifier, AdaBoost, and Bagging; CDEBMTE's advantage over AdaBoost also becomes more pronounced as the training set grows larger. We show that CDEBMTE can be effectively used to achieve higher accuracy and to obtain better class membership probability estimates.
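
The three manipulations the abstract lists can be made concrete with a short sketch. The Python code below is a minimal illustration, not the authors' CDEBMTE implementation: the function names (build_diverse_ensemble, vote), the parameters (n_members, sample_frac, k_neighbors), the use of a single probe model to flag error-prone examples, and the majority-vote combiner are all assumptions of this sketch.

```python
# Illustrative sketch of the three training-set manipulations described in
# the abstract; NOT the authors' CDEBMTE algorithm. Assumes X is a NumPy
# feature matrix and y holds integer class labels.
import numpy as np
from sklearn.base import clone
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier


def build_diverse_ensemble(base, X, y, n_members=9, sample_frac=0.7,
                           k_neighbors=5, random_state=0):
    rng = np.random.RandomState(random_state)
    # A probe model (an assumption of this sketch) flags error-prone examples.
    probe = clone(base).fit(X, y)
    errors = probe.predict(X) != y
    nn = NearestNeighbors(n_neighbors=k_neighbors).fit(X)
    members = []
    for i in range(n_members):
        mode = i % 3
        if mode == 0:
            # (1) sub-sample the training examples
            idx = rng.choice(len(X), int(sample_frac * len(X)), replace=False)
        elif mode == 1:
            # (2) increase the presence of error-prone examples by duplication
            idx = np.concatenate([np.arange(len(X)), np.flatnonzero(errors)])
        else:
            # (3) increase the presence of neighbors of error-prone examples
            hard = np.flatnonzero(errors)
            if hard.size == 0:
                idx = np.arange(len(X))
            else:
                neigh = nn.kneighbors(X[hard], return_distance=False).ravel()
                idx = np.concatenate([np.arange(len(X)), neigh])
        members.append(clone(base).fit(X[idx], y[idx]))
    return members


def vote(members, X):
    # Combine the committee by plain majority vote (an assumed combiner).
    preds = np.stack([m.predict(X) for m in members]).astype(int)
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, preds)


if __name__ == "__main__":
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    members = build_diverse_ensemble(DecisionTreeClassifier(max_depth=3),
                                     Xtr, ytr)
    print("committee accuracy:", (vote(members, Xte) == yte).mean())
```

Cycling through the three manipulations gives each committee member a differently perturbed view of the training set, which is what drives the diversity the abstract emphasizes. The abstract also mentions decreasing the presence of error-prone examples and their neighbors; the sketch shows only the increasing direction for brevity.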