Contribution of Dataset Reduction Techniques to Tree-Simplification and Knowledge Discovery

Authors:
Marc Sebban; Richard Nock
Affiliations:
-;-
Venue:
PKDD '00 Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery
Year:
2000

Abstract

In the Knowledge Discovery in Databases (KDD) field, the human comprehensibility of models is as important as the optimization of accuracy. To address this problem, many methods have been proposed to simplify decision trees and improve their understandability. Among the different classes of methods, some strategies deal with this problem by reducing the database a priori, through either feature selection or case selection. At the same time, many other efficient selection algorithms have been developed to reduce the storage requirements of case-based learning algorithms; their original aim is therefore not tree simplification. Surprisingly, as far as we know, few works have attempted to exploit this wealth of efficient algorithms in favor of knowledge discovery. This is the aim of this paper. We analyze, through large experiments and discussions, the contribution of state-of-the-art reduction techniques to tree simplification and knowledge discovery. We show that in some cases these algorithms are very effective at improving on the performance of standard post-pruning, which is used to combat the overfitting problem.