Some Criterions for Selecting the Best Data Abstractions

Authors:
Makoto Haraguchi; Yoshimitsu Kudoh
Venue:
Progress in Discovery Science, Final Report of the Japanese Discovery Science Project
Year:
2002

Abstract

This paper presents and summarizes several criteria for selecting the best data abstraction for relations in relational databases. A data abstraction can be understood as a grouping of attribute values: the individual aspects of the grouped values are disregarded, and the values are replaced together by a single, more abstract value. The relation obtained after abstraction is therefore more compact, and data miners can work on it more efficiently. A major problem, however, is that the quality of the extracted knowledge degrades when the abstraction neglects an important aspect of the data values. The central issue is thus to provide a criterion that selects only adequate data abstractions, that is, those that preserve the important information while reducing the size of the relation. From this viewpoint, we present three criteria and test them on the task of classifying tuples in a relation, given several target classes. All of the criteria are derived from a notion of similarity among class distributions and are formalized in terms of standard information theory. We also summarize our experimental results for the classification task and discuss future work.
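The idea of grouping attribute values while keeping the class-relevant information can be illustrated with a small sketch. The abstract does not state the three criteria themselves, so the code below assumes a stand-in criterion: an abstraction is judged by how much of the mutual information between the attribute and the class it preserves. The toy relation, the attribute values, the two groupings, and all function names are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def mutual_information(rows):
    """I(A; C) for a list of (attribute_value, class_label) pairs."""
    value_counts = Counter(a for a, _ in rows)
    class_counts = Counter(c for _, c in rows)
    joint_counts = Counter(rows)
    # I(A; C) = H(A) + H(C) - H(A, C)
    return (entropy(value_counts.values())
            + entropy(class_counts.values())
            - entropy(joint_counts.values()))


def apply_abstraction(rows, grouping):
    """Replace each attribute value by its abstract value (if it has one)."""
    return [(grouping.get(a, a), c) for a, c in rows]


# Hypothetical relation: one attribute ("job") and a binary target class.
# "nurse" and "doctor" share one class distribution; "clerk" and "cashier"
# share another.
rows = (
    [("nurse", "yes")] * 3 + [("nurse", "no")] * 1
    + [("doctor", "yes")] * 3 + [("doctor", "no")] * 1
    + [("clerk", "yes")] * 1 + [("clerk", "no")] * 3
    + [("cashier", "yes")] * 1 + [("cashier", "no")] * 3
)

# Assumed candidate abstractions (illustrative only).
similar_grouping = {"nurse": "medical", "doctor": "medical",
                    "clerk": "office", "cashier": "office"}
dissimilar_grouping = {"nurse": "g1", "clerk": "g1",
                       "doctor": "g2", "cashier": "g2"}

mi_original = mutual_information(rows)
mi_similar = mutual_information(apply_abstraction(rows, similar_grouping))
mi_dissimilar = mutual_information(apply_abstraction(rows, dissimilar_grouping))

print(f"original:            {mi_original:.3f} bits")
print(f"similar grouping:    {mi_similar:.3f} bits")
print(f"dissimilar grouping: {mi_dissimilar:.3f} bits")
```

In this toy example, merging values whose class distributions are identical halves the number of distinct attribute values without losing any information about the class, whereas merging values with dissimilar class distributions removes all of it. This matches the intuition stated in the abstract, but the actual criteria in the paper may be defined differently.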