Detecting a Compact Decision Tree Based on an Appropriate Abstraction

Authors:
Yoshimitsu Kudoh;Makoto Haraguchi
Affiliations:
-;-
Venue:
IDEAL '00 Proceedings of the Second International Conference on Intelligent Data Engineering and Automated Learning, Data Mining, Financial Engineering, and Intelligent Agents
Year:
2000

Citing 7

C4.5: programs for machine learning
Data mining
Advances in knowledge discovery and data mining
From data mining to knowledge discovery: an overview. In: Advances in knowledge discovery and data mining
Attribute-oriented induction in data mining. In: Advances in knowledge discovery and data mining
Machine Learning and Data Mining; Methods and Applications
Architectural Support for Data Mining

Cited 2

Data Abstractions for Numerical Attributes in Data Mining. In: IDEAL '02 Proceedings of the Third International Conference on Intelligent Data Engineering and Automated Learning
Some Criterions for Selecting the Best Data Abstractions. In: Progress in Discovery Science, Final Report of the Japanese Discovery Science Project


Abstract

It is generally accepted that pre-processing is needed in data mining to exclude irrelevant and meaningless aspects of the data before data mining algorithms are applied. From this viewpoint, we have previously proposed a notion of Information Theoretical Abstraction and implemented a system, ITA. Given a relational database and a family of possible abstractions for its attribute values, called an abstraction hierarchy, ITA selects the best abstraction among the possible ones so that the class distributions needed to perform our classification task are preserved as far as possible. According to our previous experiment, just one application of abstraction to the whole database was effective in reducing the size of the detected rules without making the classification error worse. However, since C4.5 performs attribute selection repeatedly, one step after another, ITA does not in general guarantee that class distributions are preserved over a sequence of attribute selections. For this reason, in this paper we propose a new version of ITA, called iterative ITA, which tries to preserve the class distributions at each attribute-selection step as far as possible.
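
The selection idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' ITA implementation: the scoring criterion (conditional entropy of the class given the abstracted attribute), the function names `entropy`, `conditional_entropy` and `select_abstraction`, and the toy data are all assumptions made for illustration. Each candidate abstraction is modelled as a plain mapping from concrete attribute values to abstract ones, and the candidate that leaves the least uncertainty about the class is chosen.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def conditional_entropy(values, labels):
    """H(class | attribute) for parallel lists of attribute values and labels."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def select_abstraction(values, labels, candidates):
    """Return the name of the candidate mapping whose abstracted values give
    the smallest conditional entropy (an assumed stand-in for 'preserving
    the class distribution as far as possible')."""
    def score(name):
        mapping = candidates[name]
        abstracted = [mapping[v] for v in values]
        return conditional_entropy(abstracted, labels)
    return min(candidates, key=score)


# Hypothetical toy data: one attribute, one class label per row.
values = ["sedan", "coupe", "truck", "van", "sedan", "truck"]
labels = ["low", "low", "high", "high", "low", "high"]

# Two hypothetical abstractions of the same attribute values.
candidates = {
    "by_body": {"sedan": "car", "coupe": "car", "truck": "utility", "van": "utility"},
    "by_doors": {"sedan": "four_door", "coupe": "two_door",
                 "truck": "two_door", "van": "four_door"},
}

best = select_abstraction(values, labels, candidates)
print(best)
```

In this toy example the `by_body` grouping separates the classes perfectly, so it is selected. Under the same assumptions, repeating the selection on the subset of rows that reaches each node of the tree, rather than once on the whole table, would correspond to the per-step idea behind iterative ITA.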