Decision Tree Instability and Active Learning

  • Authors:
  • Kenneth Dwyer; Robert Holte

  • Affiliations:
  • Department of Computing Science, University of Alberta, Edmonton AB, Canada (both authors)

  • Venue:
  • ECML '07: Proceedings of the 18th European Conference on Machine Learning
  • Year:
  • 2007

Abstract

Decision tree learning algorithms produce accurate models that can be interpreted by domain experts. However, these algorithms are known to be unstable: they can produce drastically different hypotheses from training sets that differ only slightly. This instability undermines the objective of extracting knowledge from the trees. In this paper, we study the instability of the C4.5 decision tree learner in the context of active learning. We introduce a new measure of decision tree stability and define three aspects of active learning stability. Several existing active learning methods that use C4.5 as a component are compared empirically; we find that query-by-bagging yields trees that are more stable and accurate than those produced by competing methods. We also find that an alternative splitting criterion, DKM, improves the stability and accuracy of C4.5 in the active learning setting.
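
Sketch of the techniques named above (a hypothetical illustration, not the authors' implementation): the Python code below uses scikit-learn's DecisionTreeClassifier as a stand-in for C4.5, defines the DKM impurity 2*sqrt(p(1-p)) for a binary node, and runs one query-by-bagging round that queries the pool examples on which a bagged committee of trees disagrees most, measured by vote entropy. All function names, parameters, and defaults are assumptions made for this example.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample


def dkm_impurity(p):
    # DKM node impurity for a binary problem, where p is the fraction of
    # positive examples at the node; compare with entropy or Gini.
    return 2.0 * np.sqrt(p * (1.0 - p))


def query_by_bagging(X_labeled, y_labeled, X_pool, n_committee=10,
                     batch_size=1, seed=0):
    # One round of query-by-bagging: train a committee of trees on bootstrap
    # replicates of the labeled data, then pick the pool examples with the
    # highest committee disagreement (vote entropy). Labels assumed to be {0, 1}.
    rng = np.random.RandomState(seed)
    votes = np.zeros((len(X_pool), 2))
    for _ in range(n_committee):
        Xb, yb = resample(X_labeled, y_labeled,
                          random_state=rng.randint(1 << 30))
        member = DecisionTreeClassifier(random_state=seed).fit(Xb, yb)
        preds = member.predict(X_pool)
        votes[np.arange(len(X_pool)), preds.astype(int)] += 1
    probs = votes / n_committee
    vote_entropy = -(probs * np.log2(np.clip(probs, 1e-12, 1.0))).sum(axis=1)
    # Indices of the examples whose labels would be requested from the oracle.
    return np.argsort(vote_entropy)[::-1][:batch_size]

In a full active learning loop, the returned indices would be labeled by an oracle, appended to the labeled set, and the committee retrained; the kind of stability the paper measures could then be assessed by comparing the trees grown in successive rounds.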