Genetic learner: Discretization and fuzzification of numerical attributes

  • Authors:
  • Ivan Bruha; Pavel Kralik; Petr Berka

  • Affiliations:
  • McMaster University, Department of Computing & Software, Hamilton, Ont., Canada L8S 4L7. E-mail: bruha@mcmaster.ca, URL: http://www.cas.mcmaster.ca/~bruha
  • Technical University of Brno, Department of Automation and Information Technology, Technicka 2, Brno, CZ-61669, Czech Republic. E-mail: kralik@vertigo.fme.vutbr.cz
  • Prague University of Economics, Laboratory of Intelligent Systems, Prague, CZ-13067, Czech Republic. E-mail: berka@vse.cz

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2000


Abstract

Machine learning (ML) is a useful and productive component of data mining (DM). Given a large database, a learning algorithm induces a description of the concepts (classes) embedded in a given problem area. The induction itself consists of searching a usually huge space of possible concept descriptions, and several paradigms exist for controlling this search. One promising and efficient paradigm is genetic algorithms (GAs), and many research projects have incorporated genetic algorithms into machine learning. This paper describes an efficient application of a GA in an attribute-based rule-inducing learning algorithm: a domain-independent GA has been integrated into the covering learning algorithm CN4, a large extension of the well-known algorithm CN2. The induction procedure of CN4 (its beam search methodology) has been removed and the GA implanted into this shell. Genetic algorithms can process symbolic attributes in a simple, natural manner; processing numerical (continuous) attributes, however, is not so straightforward. One feasible strategy is to discretize numerical attributes before the genetic algorithm is called, and quite a few discretization preprocessors already exist in data mining and machine learning. This paper describes a new preprocessor for the discretization (categorization) of numerical attributes. Conventional discretization procedures generate sharp bounds (thresholds) between intervals, which may capture training objects from various classes (concepts) in a single interval that is not `pure'; this happens in particular near the interval borders. One feasible way to eliminate such impurity around the interval borders is to fuzzify them. The paper first introduces the methodology of our new learning algorithm, the genetic learner. Then the discretization/fuzzification preprocessor is presented. 
Finally, the paper compares the entire system (the preprocessor and the genetic learner) with well-known covering as well as TDIDT learning algorithms.
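The idea of fuzzifying sharp interval borders can be illustrated with a minimal sketch. This is not the authors' preprocessor: the cut points and the fuzziness half-width `eps` are hypothetical parameters, and the trapezoidal membership functions are one common way to let neighbouring intervals overlap around each threshold instead of meeting at a crisp bound.

```python
# Illustrative sketch (not the paper's algorithm): discretize a numeric
# attribute at given cut points, then fuzzify each cut point so that
# membership ramps linearly over [c - eps, c + eps] instead of jumping
# from 1 to 0 at the threshold.

def make_fuzzy_intervals(cuts, eps):
    """Return one membership function per interval defined by sorted cuts."""
    bounds = [float("-inf")] + list(cuts) + [float("inf")]
    funcs = []
    for lo, hi in zip(bounds, bounds[1:]):
        def mu(x, lo=lo, hi=hi):
            # Ramp up after the lower border, ramp down before the upper one;
            # well inside the interval both factors are 1 (full membership).
            left = 1.0 if lo == float("-inf") else min(1.0, max(0.0, (x - (lo - eps)) / (2 * eps)))
            right = 1.0 if hi == float("inf") else min(1.0, max(0.0, ((hi + eps) - x) / (2 * eps)))
            return min(left, right)
        funcs.append(mu)
    return funcs

# Example: an attribute split at 10 and 20, fuzzified with eps = 1.
mus = make_fuzzy_intervals([10, 20], eps=1.0)
print([round(mu(10.5), 2) for mu in mus])  # → [0.25, 0.75, 0.0]
```

A value near a border (here 10.5) now belongs partly to both adjacent intervals, with the two memberships summing to 1, while a value well inside an interval (e.g. 15) keeps full membership in that interval alone.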