Contribution of Dataset Reduction Techniques to Tree-Simplification and Knowledge Discovery

  • Authors:
  • Marc Sebban;Richard Nock

  • Venue:
  • PKDD '00 Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery
  • Year:
  • 2000

Abstract

In the Knowledge Discovery in Databases (KDD) field, the human comprehensibility of models is as important as the optimization of their accuracy. To address this problem, many methods have been proposed to simplify decision trees and improve their understandability. Among the different classes of methods are strategies that reduce the database a priori, through either feature selection or case selection. At the same time, many other efficient selection algorithms have been developed to reduce the storage requirements of case-based learning algorithms; their original aim is therefore not tree simplification. Surprisingly, as far as we know, few works have attempted to exploit this wealth of efficient algorithms for knowledge discovery. This is the aim of this paper. Through large experiments and discussions, we analyze the contribution of state-of-the-art feature and instance reduction techniques to tree simplification. We show that in some cases these algorithms are very effective at improving on the performance of standard post-pruning, which is used to combat overfitting.