We propose to evaluate the quality of decision trees grown on imbalanced datasets with a splitting criterion based on an asymmetric entropy measure. To deal with the class imbalance problem in machine learning, and with decision trees in particular, several authors have proposed such asymmetric splitting criteria. After the tree is grown, a decision rule must be assigned to each leaf. The classical Bayesian rule, which selects the most frequent class, is inappropriate when the dataset is strongly imbalanced, so an assignment rule that takes the asymmetry into account must be adopted instead. But how can the resulting prediction model then be evaluated? The usual error rate is likewise uninformative when the classes are strongly imbalanced, and appropriate evaluation measures are required in such cases. We consider ROC curves and recall/precision graphs for evaluating the performance of decision trees grown from imbalanced datasets, and we use these criteria to compare trees obtained with an asymmetric splitting criterion against those grown with a symmetric one. In this paper we only consider the two-class case.
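As a minimal illustration of the evaluation criteria named above (not the paper's own code), the following sketch computes the points of a ROC curve and of a recall/precision graph for a two-class problem by sweeping a decision threshold over classifier scores. The function name `roc_and_pr_points` and the toy data are assumptions for illustration; it presumes distinct scores and at least one instance of each class.

```python
def roc_and_pr_points(labels, scores):
    """Return (roc, pr): roc is a list of (FPR, TPR) points and pr a list
    of (recall, precision) points, one per threshold position.

    labels: 0/1 class labels (1 = positive/minority class);
    scores: classifier scores, higher meaning more likely positive.
    Assumes at least one positive and one negative instance.
    """
    pos = sum(labels)            # number of positive instances
    neg = len(labels) - pos      # number of negative instances
    # Sort instances by decreasing score; each prefix of this order
    # corresponds to one threshold on the score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    roc, pr = [], []
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        roc.append((fp / neg, tp / pos))          # (FPR, TPR)
        pr.append((tp / pos, tp / (tp + fp)))     # (recall, precision)
    return roc, pr


# Toy imbalanced example: 2 positives among 6 instances.
labels = [1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.1]
roc, pr = roc_and_pr_points(labels, scores)
# The last ROC point is always (1.0, 1.0): every instance predicted positive.
```

On a strongly imbalanced dataset the two views differ in a useful way: the ROC curve is insensitive to the class ratio, while the recall/precision graph degrades visibly as false positives accumulate relative to the rare positive class, which is why the paper considers both.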