Boosting and measuring the performance of ensembles for a successful database marketing

  • Authors:
  • YongSeog Kim

  • Affiliations:
  • MIS Department, Jon M. Huntsman School of Business, Utah State University, Logan, UT 84322, USA

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2009


Abstract

This paper examines the advantages and disadvantages of two ensemble approaches: ensembles based on sampling and ensembles based on feature selection. Experimental results confirm that both methods produce robust ensembles and significantly improve the predictive performance of single classifiers, at the cost of interpretability and additional computing resources. In particular, classifiers that exploit prior class distributions, such as the support vector machine and the naive Bayes classifier, benefit only marginally from ensembling, whereas high-variance classifiers, such as neural networks and tree learners, form strong ensembles. Further, when feature selection is used to create ensembles, there appears to be an optimal ratio of selected input variables that maximizes ensemble performance while minimizing computational cost. Finally, we show that most evaluation methods become uninformative when models are compared on data sets with highly skewed class distributions.
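The two ensemble-construction strategies the abstract contrasts can be sketched in pure Python. This is a minimal illustration, not the paper's implementation: the paper's base learners (neural networks, SVMs, naive Bayes, tree learners) are replaced here by a decision stump for brevity, and all function names, the `ratio` parameter, and the toy data are illustrative assumptions.

```python
import random

def train_stump(X, y, features):
    """Pick the single (feature, threshold, polarity) split with the
    lowest training error -- a deliberately weak, high-variance learner."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            for polarity in (False, True):
                preds = [(row[f] > t) == polarity for row in X]
                err = sum(int(p) != yi for p, yi in zip(preds, y))
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    _, f, t, polarity = best
    return lambda row: int((row[f] > t) == polarity)

def bagging_ensemble(X, y, n_models=11, seed=0):
    """Sampling-based ensemble: each member trains on a bootstrap resample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_stump([X[i] for i in idx],
                                  [y[i] for i in idx],
                                  range(len(X[0]))))
    return models

def subspace_ensemble(X, y, ratio=0.5, n_models=11, seed=0):
    """Feature-selection ensemble: each member trains on a random subset
    of the input variables; `ratio` is the fraction of features selected."""
    rng = random.Random(seed)
    d = len(X[0])
    k = max(1, round(ratio * d))
    return [train_stump(X, y, rng.sample(range(d), k))
            for _ in range(n_models)]

def vote(models, row):
    """Combine members by unweighted majority vote."""
    return int(sum(m(row) for m in models) > len(models) / 2)
```

On real data, one would sweep `ratio` over a grid to look for the cost/performance optimum the abstract describes, and compare the voted predictions of each ensemble against a single base learner.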