Recent empirical studies have revealed two surprising pathologies of several common decision tree pruning algorithms. First, tree size is often a linear function of training set size, even when the additional tree structure yields no increase in accuracy. Second, building trees from data in which the class label and the attributes are independent often results in large trees. In both cases, the pruning algorithms fail to control tree growth as one would expect them to. We explore this behavior theoretically by constructing a statistical model of reduced error pruning. The model explains why and when the pathologies occur, and makes predictions about how to lessen their effects. The predictions are operationalized in a variant of reduced error pruning that is shown to control tree growth far better than the original algorithm.
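
For readers unfamiliar with the baseline algorithm, the following is a minimal sketch of textbook reduced error pruning, not the variant proposed in this work (the abstract does not specify it). The `Node` class, the helper names, and the tie-breaking rule (prune when the leaf is no worse than the subtree on the pruning set) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical data layout: (attribute values, class label).
Example = Tuple[Dict[str, int], int]


@dataclass
class Node:
    label: int                   # majority training class at this node
    attr: Optional[str] = None   # attribute tested here; None for a leaf
    children: Dict[int, "Node"] = field(default_factory=dict)

    def is_leaf(self) -> bool:
        return self.attr is None


def predict(node: Node, x: Dict[str, int]) -> int:
    # Walk down the tree; fall back to the current node's label
    # when an attribute value has no matching branch.
    while not node.is_leaf():
        child = node.children.get(x[node.attr])
        if child is None:
            break
        node = child
    return node.label


def errors(node: Node, data: List[Example]) -> int:
    return sum(predict(node, x) != y for x, y in data)


def size(node: Node) -> int:
    return 1 + sum(size(c) for c in node.children.values())


def reduced_error_prune(node: Node, prune_set: List[Example]) -> Node:
    """Bottom-up pass: replace a subtree with a leaf whenever the leaf
    makes no more errors on the held-out pruning set than the subtree."""
    if node.is_leaf():
        return node
    # Prune each child using only the pruning examples that reach it.
    for value, child in list(node.children.items()):
        subset = [(x, y) for x, y in prune_set if x[node.attr] == value]
        node.children[value] = reduced_error_prune(child, subset)
    subtree_errors = errors(node, prune_set)
    leaf_errors = sum(y != node.label for _, y in prune_set)
    if leaf_errors <= subtree_errors:
        return Node(label=node.label)
    return node
```

Note that in this sketch a subtree reached by few pruning examples is kept or collapsed on the basis of very little evidence, which is the kind of decision a statistical model of the algorithm has to account for.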