An Appropriate Abstraction for Constructing a Compact Decision Tree

  • Authors:
  • Yoshimitsu Kudoh;Makoto Haraguchi

  • Venue:
  • DS '00 Proceedings of the Third International Conference on Discovery Science
  • Year:
  • 2000

Abstract

Pre-processing for data mining is generally regarded as necessary to remove irrelevant and meaningless aspects of the data before data mining algorithms are applied. From this viewpoint, we have studied pre-processing for constructing a decision tree, proposed the notion of Information Theoretical Abstraction, and implemented it as a system called ITA. Given a relational database and a family of possible abstractions for its attribute values, called an abstraction hierarchy, ITA selects the best abstraction among the possible ones, so that the class distributions needed to perform the classification task are preserved, and generalizes the database according to that abstraction. In our previous experiments, a single application of abstraction to the whole database proved effective in reducing the size of the resulting decision tree without lowering classification accuracy. However, because classification systems such as C4.5 select attributes repeatedly, one after another, ITA does not in general guarantee that class distributions are preserved over a sequence of attribute selections. For this reason, in this paper we propose a new version of ITA, called iterative ITA, which tries to preserve the class distributions as far as possible at each attribute-selection step.
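
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the scoring criterion, so the sketch assumes that "preserving class distributions" can be approximated by the information gain of the abstracted attribute with respect to the class. The function names (`entropy`, `information_gain`, `best_abstraction`), the candidate mappings, and the toy data are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in class entropy obtained by splitting on `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_abstraction(values, labels, candidates):
    """Return the name of the candidate abstraction whose abstracted
    attribute keeps the most information about the class.

    ASSUMPTION: information gain stands in for the paper's criterion,
    which the abstract does not specify.
    """
    def score(name):
        mapping = candidates[name]
        # Values without an abstract concept are left unchanged.
        abstracted = [mapping.get(v, v) for v in values]
        return information_gain(abstracted, labels)

    return max(candidates, key=score)


# Hypothetical toy data: one attribute and a two-valued class.
values = ["sparrow", "eagle", "salmon", "trout", "sparrow", "salmon"]
labels = ["flies", "flies", "swims", "swims", "flies", "swims"]

# Two hypothetical ways of grouping the same attribute values.
candidates = {
    "by_class": {"sparrow": "bird", "eagle": "bird",
                 "salmon": "fish", "trout": "fish"},
    "by_size": {"sparrow": "small", "trout": "small",
                "eagle": "large", "salmon": "large"},
}

chosen = best_abstraction(values, labels, candidates)
```

In this toy setting the grouping into birds and fish separates the two classes exactly, so it is the one selected. An iterative variant in the spirit of the abstract would repeat this selection on the subset of rows reaching each node of the tree rather than once for the whole database.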