C4.5: programs for machine learning
Advances in knowledge discovery and data mining
Attribute-oriented induction in data mining
An Appropriate Abstraction for an Attribute-Oriented Induction
DS '99 Proceedings of the Second International Conference on Discovery Science
Pre-processing is generally considered a necessary step in data mining: it removes irrelevant and meaningless aspects of the data before data mining algorithms are applied. From this viewpoint, we have studied pre-processing for decision tree induction. We previously proposed the notion of Information Theoretical Abstraction and implemented it in a system called ITA. Given a relational database and a family of possible abstractions for its attribute values, called an abstraction hierarchy, ITA selects the best abstraction among the candidates such that the class distributions needed for our classification task are preserved, and then generalizes the database according to that abstraction. In our previous experiments, a single application of abstraction to the whole database was effective in reducing the size of the induced decision tree without degrading classification accuracy. However, because classification systems such as C4.5 select attributes repeatedly, one after another, ITA does not in general guarantee that class distributions are preserved across a sequence of attribute selections. In this paper, we therefore propose a new version of ITA, called iterative ITA, which tries to preserve the class distributions at each attribute-selection step as far as possible.
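The abstract does not spell out the exact criterion ITA uses to choose an abstraction. The sketch below assumes one plausible reading for a single attribute. It measures how much information about the class an abstraction discards, as the drop in information gain H(C) - H(C|A), and then picks the coarsest candidate whose loss stays within a tolerance. The row layout and the names `info_gain`, `generalize`, `select_abstraction`, and `max_loss` are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
from math import log2

# Hypothetical data layout: each row is (attribute_value, class_label).
Row = tuple[str, str]


def entropy(labels: list[str]) -> float:
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def info_gain(rows: list[Row]) -> float:
    """Information gain of the attribute w.r.t. the class: H(C) - H(C | A)."""
    n = len(rows)
    by_value: dict[str, list[str]] = {}
    for value, label in rows:
        by_value.setdefault(value, []).append(label)
    conditional = sum(len(ls) / n * entropy(ls) for ls in by_value.values())
    return entropy([label for _, label in rows]) - conditional


def generalize(rows: list[Row], abstraction: dict[str, str]) -> list[Row]:
    """Replace each raw attribute value by its abstract value."""
    return [(abstraction[value], label) for value, label in rows]


def select_abstraction(rows, candidates, max_loss=0.0):
    """Return the coarsest candidate (fewest abstract values) whose loss of
    information gain is at most max_loss (a hypothetical tolerance), or None."""
    base = info_gain(rows)
    best = None
    for abstraction in candidates:
        loss = base - info_gain(generalize(rows, abstraction))
        if loss <= max_loss + 1e-12:  # small slack for floating-point error
            size = len(set(abstraction.values()))
            if best is None or size < best[0]:
                best = (size, abstraction)
    return None if best is None else best[1]


rows = [("Tokyo", "yes"), ("Osaka", "yes"), ("Paris", "no"), ("Lyon", "no")]
by_country = {"Tokyo": "Japan", "Osaka": "Japan", "Paris": "France", "Lyon": "France"}
top = {city: "Any" for city in by_country}
chosen = select_abstraction(rows, [top, by_country])
print(generalize(rows, chosen))
```

In this toy example, abstracting cities to countries loses no class information. Collapsing every city into `"Any"` loses all of it, so the country-level mapping is selected and the rows are generalized accordingly. Under the same assumptions, one reading of iterative ITA would call such a routine at each attribute-selection step on the rows reaching the current tree node, rather than once for the whole database.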