Effective search for genetic-based machine learning systems via estimation of distribution algorithms and embedded feature reduction techniques

  • Authors:
  • Jiadong Yang;Hua Xu;Peifa Jia

  • Affiliations:
  • Jike.com, Beijing 100020, China;State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, ...;State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, ...

  • Venue:
  • Neurocomputing
  • Year:
  • 2013

Quantified Score

Hi-index 0.01

Visualization

Abstract

Genetic-based machine learning (GBML) systems, which employ evolutionary algorithms (EAs) as search mechanisms, evolve rule-based classification models to represent target concepts. Compared to Michigan-style GBML, Pittsburgh-style GBML is expected to achieve more compact solutions. It has been shown that standard recombination operators in EAs do not assure an effective evolutionary search to solve sophisticated problems that contain strong interactions between features. On the other hand, when dealing with real-world classification tasks, irrelevant features not only complicate the problem but also incur unnecessary matchings in GBML systems, which increase the computational cost a lot. To handle the two problems mentioned above in an integrated manner, a new Pittsburgh-style GBML system is proposed. In the proposed method, classifiers are generated and recombined at two levels. At the high level, classifiers are recombined by rule-wise uniform crossover operators since each classifier consists of a variable-size rule set. At the low level, single rules contained in classifiers are reproduced via sampling Bayesian networks that characterize the global statistical information extracted from promising rules found so far. Furthermore, according to the statistical information in the rule population, an embedded approach is presented to detect and remove redundant features incrementally following the evolution of rule population. Results of empirical evaluation show that the proposed method outperforms the original Pittsburgh-style GBML system in terms of classification accuracy while reducing the computational cost. Furthermore, the proposed method is also competitive to other non-evolutionary, highly used machine learning methods. With respect to the performance of feature reduction, the proposed embedded approach is able to deliver solutions with higher classification accuracy when removing the same number of features as other feature reduction techniques do.