Learning decision rules in noisy domains
Proceedings of Expert Systems '86, the 6th Annual Technical Conference on Research and Development in Expert Systems III
On estimating probabilities in tree pruning
EWSL-91: Proceedings of the European Working Session on Learning
C4.5: programs for machine learning
A Comparative Analysis of Methods for Pruning Decision Trees
IEEE Transactions on Pattern Analysis and Machine Intelligence
Self bounding learning algorithms
COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
International Journal of Human-Computer Studies - Special issue: 1969-1999, the 30th anniversary
Pessimistic decision tree pruning based on tree size
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
A Fast, Bottom-Up Decision Tree Pruning Algorithm with Near-Optimal Generalization
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Generalization Bounds for Decision Trees
COLT '00 Proceedings of the Thirteenth Annual Conference on Computational Learning Theory
Experiments with an innovative tree pruning algorithm
AIAP'07 Proceedings of the 25th IASTED International Multi-Conference: Artificial Intelligence and Applications
Laplace's law of succession and universal encoding
IEEE Transactions on Information Theory
The decision tree classifier is a well-known methodology for classification. It is widely accepted that a fully grown tree usually over-fits the training data and should therefore be pruned back. In this paper, we analyze the over-fitting issue theoretically using a k-norm risk estimation approach with Lidstone's estimate. Our analysis allows a deeper understanding of decision tree classifiers, in particular of how their misclassification rates can be estimated using our equations. We propose a simple pruning algorithm based on this analysis and prove its desirable properties, including its independence from validation data and its efficiency.
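To make the ingredients of the abstract concrete, the sketch below shows Lidstone's estimate applied to the class counts at a tree node, together with an illustrative bottom-up prune test that compares the smoothed risk of a node collapsed to a leaf against the summed smoothed risk of its children. This is only a pessimistic-estimate-style illustration; the function names are hypothetical and the paper's actual k-norm risk equations are not reproduced here.

```python
def lidstone(counts, lam=1.0):
    """Lidstone's estimate: p(c) = (n_c + lam) / (N + lam * K),
    where n_c is the count of class c among the N training examples
    reaching the node and K is the number of classes.
    lam = 1 gives Laplace's rule of succession."""
    n = sum(counts)
    k = len(counts)
    return [(c + lam) / (n + lam * k) for c in counts]

def expected_errors(counts, lam=1.0):
    """Smoothed expected misclassifications if this node becomes a leaf.
    Even a pure leaf gets a nonzero estimate, which is what makes the
    estimate pessimistic."""
    n = sum(counts)
    return n * (1.0 - max(lidstone(counts, lam)))

def should_prune(parent_counts, child_counts, lam=1.0):
    """Illustrative bottom-up test (not the paper's k-norm criterion):
    prune when the parent-as-leaf risk does not exceed the combined
    risk of its children."""
    subtree_risk = sum(expected_errors(c, lam) for c in child_counts)
    return expected_errors(parent_counts, lam) <= subtree_risk

# A split that adds no purity only fragments the data, so smoothing
# penalizes it and the subtree is pruned:
print(should_prune([8, 0], [[4, 0], [4, 0]]))   # True
# A split that perfectly separates the classes is kept:
print(should_prune([4, 4], [[4, 0], [0, 4]]))   # False
```

Note that no validation set appears anywhere above: the prune decision is driven entirely by smoothed training-set counts, which mirrors the "independence from validation data" property claimed in the abstract.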