Genetic wrappers for feature selection in decision tree induction and variable ordering in Bayesian network structure learning

  • Authors: William H. Hsu

  • Affiliations: Department of Computing and Information Sciences, Kansas State University, Manhattan, KS and National Center for Supercomputing Applications (NCSA), University of Illinois, Champaign, IL

  • Venue: Information Sciences: an International Journal - Special issue: Soft computing data mining

  • Year: 2004

Abstract

In this paper, we address the automated tuning of input specification for supervised inductive learning and develop combinatorial optimization solutions for two such tuning problems. First, we present a framework for selecting and reordering input variables to reduce generalization error in classification and probabilistic inference. One purpose of selection is to control overfitting, using validation set accuracy as the criterion for relevance. Reordering matters for a related reason: some inductive learning algorithms, such as greedy algorithms for learning probabilistic networks, are sensitive to the order in which variables are evaluated. We design a generic fitness function for validating an input specification, then use it to develop two genetic algorithm wrappers: one for the variable selection problem for decision tree inducers and one for the variable ordering problem for Bayesian network structure learning. We evaluate the wrappers, using real-world data for the selection wrapper and synthetic data for both, and discuss their limitations and generalizability to other inducers.
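
To make the wrapper idea concrete, the following is a minimal sketch of a genetic algorithm wrapper for variable selection, not the paper's implementation. Several things here are assumptions made for illustration: a 1-nearest-neighbour classifier stands in for the decision tree inducer to keep the example short, the fitness is validation accuracy minus a small per-feature penalty, and all names and parameter values (`ga_select`, `penalty`, `p_mut`, population size, the synthetic data generator) are hypothetical. Each individual is a bit mask over the input variables, and the inducer is treated as a black box that is only ever scored on a held-out validation set.

```python
import random

def nn_accuracy(train, valid, mask):
    """Validation accuracy of a 1-NN classifier restricted to the masked features.
    (Stand-in inducer: an assumption, not the decision tree inducer of the paper.)"""
    cols = [i for i, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    correct = 0
    for x, y in valid:
        nearest = min(train, key=lambda t: sum((t[0][i] - x[i]) ** 2 for i in cols))
        correct += nearest[1] == y
    return correct / len(valid)

def fitness(train, valid, mask, penalty=0.01):
    """Validation accuracy minus a small cost per selected feature (assumed form)."""
    return nn_accuracy(train, valid, mask) - penalty * sum(mask)

def ga_select(train, valid, n_features, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Search over feature bit masks with tournament selection, one-point
    crossover, bit-flip mutation, and elitism. Requires n_features >= 2."""
    rng = random.Random(seed)
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:  # each distinct mask is evaluated only once
            cache[key] = fitness(train, valid, mask)
        return cache[key]

    # Seed the population with the full feature set as a baseline individual.
    pop = [[1] * n_features]
    pop += [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size - 1)]
    best = max(pop, key=score)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: the best mask always survives
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)  # tournament of size 3
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n_features)       # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ (rng.random() < p_mut) for b in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
        best = max(pop, key=score)
    return best

def make_data(n, n_features, rng):
    """Synthetic data: feature 0 carries the class signal, the rest are noise."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [2.0 * y + rng.gauss(0, 0.2)] + [rng.gauss(0, 1) for _ in range(n_features - 1)]
        data.append((x, y))
    return data

data_rng = random.Random(42)
train = make_data(80, 6, data_rng)
valid = make_data(40, 6, data_rng)
best = ga_select(train, valid, n_features=6)
print("selected mask:", best, "fitness:", round(fitness(train, valid, best), 3))
```

Because the all-features mask is in the initial population and the best individual is always carried forward, the returned mask never scores below the unselected baseline on the validation fitness. A wrapper for the variable ordering problem could follow the same outline with permutations in place of bit masks and order-preserving crossover and mutation operators; that variant is not shown here.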