Constructing appropriate data abstractions for mining classification knowledge

Authors:
Yoshiaki Okubo; Yoshimitsu Kudoh; Makoto Haraguchi
Affiliations:
Division of Electronics and Information Engineering, Hokkaido University, Sapporo, Japan (all three authors)
Venue:
INAP'01: Proceedings of the Applications of Prolog 14th International Conference on Web Knowledge Management and Decision Support
Year:
2001

References (6):
Algorithms for clustering data
C4.5: programs for machine learning
Attribute-oriented induction in data mining. In: Advances in knowledge discovery and data mining
Fast Algorithms for Mining Association Rules in Large Databases. In: VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Some Criterions for Selecting the Best Data Abstractions. In: Progress in Discovery Science, Final Report of the Japanese Discovery Science Project
Cluster Analysis

Cited by (1):
Data Abstractions for Numerical Attributes in Data Mining. In: IDEAL '02 Proceedings of the Third International Conference on Intelligent Data Engineering and Automated Learning

Abstract

The notion of data abstraction is very useful for discovering concise knowledge from large databases. For classification problems, we have previously proposed criteria for selecting useful abstractions from a set of given candidates. We have also developed a family of data abstraction systems called ITA, iterative ITA and I2TA [5,6,7]. To make these systems more flexible, this paper constructs useful abstractions from scratch instead of selecting them from given candidates. A data abstraction can be represented as a partition of an attribute's possible values, so the search space for this construction generally contains a huge number of candidates. To reduce the search space, we introduce an ordering on abstractions and present a pruning method based on this ordering. Furthermore, we propose using the hierarchical structure among attribute values, extracted from a dictionary, to reject meaningless candidates. The search can then be constrained by upper and lower bounds extracted from the dictionary. Preliminary experimental results show that the dictionary can drastically reduce the number of candidates.
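The search-space argument above can be illustrated with a small sketch. It is not the authors' method, and several choices in it are assumptions.

- **Search space.** An abstraction of one attribute is a partition of its value set, so an attribute with n values has Bell-number B_n candidate abstractions (52 for five values).
- **Ordering (assumed).** The abstract does not say which ordering is used, so the sketch uses the ordinary refinement order on partitions.
- **Dictionary bounds (assumed).** The lower bound is a partition whose blocks must never be split. The upper bound is a partition whose blocks must never be merged with one another. A candidate c is admissible iff lower ⪯ c ⪯ upper.
- **Hypothetical names.** The function names (`set_partitions`, `refines`, `brute_force`, `bounded`) and the toy "fruit/vegetable" values and bounds are illustrative only.

```python
from itertools import product
from typing import FrozenSet, Iterator, List

# An abstraction of one attribute = a partition of its value set.
Block = FrozenSet[str]
Partition = FrozenSet[Block]


def set_partitions(items: List) -> Iterator[List[List]]:
    """Yield every partition of `items` (Bell-number many) as lists of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + p


def to_partition(blocks) -> Partition:
    return frozenset(frozenset(b) for b in blocks)


def refines(p: Partition, q: Partition) -> bool:
    """Refinement order p <= q: each block of the finer p lies inside a block of q."""
    return all(any(b <= c for c in q) for b in p)


def brute_force(values: List[str], lower: Partition, upper: Partition) -> List[Partition]:
    """Enumerate every partition, then keep those with lower <= c <= upper."""
    kept = []
    for blocks in set_partitions(list(values)):
        c = to_partition(blocks)
        if refines(lower, c) and refines(c, upper):
            kept.append(c)
    return kept


def bounded(lower: Partition, upper: Partition) -> List[Partition]:
    """Generate only the candidates between the bounds (hypothetical pruning).

    Lower-bound blocks are treated as indivisible atoms, and atoms are
    regrouped only inside the upper-bound block that contains them.
    """
    assert refines(lower, upper), "lower bound must refine upper bound"
    per_upper = []
    for u in upper:
        atoms = [b for b in lower if b <= u]
        per_upper.append([
            [frozenset().union(*group) for group in grouping]
            for grouping in set_partitions(atoms)
        ])
    # Choose one grouping per upper-bound block, independently.
    return [frozenset(b for part in choice for b in part)
            for choice in product(*per_upper)]


# Toy attribute and hypothetical dictionary-derived bounds.
values = ["apple", "pear", "banana", "carrot", "potato"]
upper = to_partition([["apple", "pear", "banana"], ["carrot", "potato"]])
lower = to_partition([["apple", "pear"], ["banana"], ["carrot"], ["potato"]])

print(len(list(set_partitions(values))))       # 52 (Bell number B_5)
print(len(brute_force(values, lower, upper)))  # 4
print(len(bounded(lower, upper)))              # 4
```

The two functions differ in how much they enumerate:

- `brute_force` enumerates all 52 partitions and then filters them.
- `bounded` never generates an inadmissible candidate. Its count is the product, over the upper-bound blocks, of the Bell number of the atoms each block contains (here B_2 · B_2 = 4).

Neither function evaluates how useful a candidate is for classification. That would be the job of a selection criterion, which this sketch does not model.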