Genetic programming in classifying large-scale data: an ensemble method

  • Authors:
  • Yifeng Zhang;Siddhartha Bhattacharyya

  • Affiliations:
  • Information and Decision Sciences, College of Business Administration, University of Illinois at Chicago, 601 S. Morgan Street (MC 294), Chicago, IL;Information and Decision Sciences, College of Business Administration, University of Illinois at Chicago, 601 S. Morgan Street (MC 294), Chicago, IL

  • Venue:
  • Information Sciences: an International Journal - Special issue: Soft computing data mining
  • Year:
  • 2004

Quantified Score

Hi-index 0.01

Abstract

This study demonstrated the potential of genetic programming (GP) as a base classifier algorithm for building ensembles in the context of large-scale data classification. An ensemble built upon base classifiers trained with GP was found to significantly outperform its counterparts built upon base classifiers trained with decision trees and logistic regression. The superiority of the GP ensemble was partly attributed to the higher diversity among the base classifiers upon which it was built, both in the functional form of the models and in the variables defining them. Implications of GP as a useful tool in other data mining problems, such as feature selection, were also discussed.
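
To make the ideas of an ensemble and of diversity among base classifiers concrete, the following is a minimal sketch and not the method used in the study. It assumes that base classifier outputs are combined by simple majority vote and that diversity is measured by mean pairwise disagreement on predictions; the abstract specifies neither. The function names and the toy labels are hypothetical and serve only as illustration.

```python
from collections import Counter
from itertools import combinations


def majority_vote(predictions):
    """Combine per-classifier label lists into one ensemble label list.

    Assumption: simple unweighted voting. With an odd number of binary
    classifiers, ties cannot occur.
    """
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(p[i] for p in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


def pairwise_disagreement(predictions):
    """Mean fraction of samples on which two base classifiers differ.

    Assumption: this is one common proxy for ensemble diversity, not
    necessarily the measure used in the study.
    """
    pairs = list(combinations(predictions, 2))
    total = 0.0
    for a, b in pairs:
        total += sum(x != y for x, y in zip(a, b)) / len(a)
    return total / len(pairs)


def accuracy(predicted, truth):
    """Fraction of samples labelled correctly."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical toy labels standing in for the outputs of three base models.
truth = [1, 0, 1, 1, 0, 1]
base_predictions = [
    [1, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 1],
]

ensemble = majority_vote(base_predictions)
print("base accuracies:", [accuracy(p, truth) for p in base_predictions])
print("ensemble accuracy:", accuracy(ensemble, truth))
print("mean pairwise disagreement:", pairwise_disagreement(base_predictions))
```

In this toy case each base model errs on different samples, so the vote corrects every individual error. Models that made the same mistakes would gain nothing from voting, which is the intuition behind attributing part of an ensemble's advantage to diversity.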