An accurate ranking of instances based on their class probabilities, as measured by AUC (the area under the Receiver Operating Characteristic curve), is desirable in many applications. Two obstacles prevent a traditional decision tree from yielding accurate rankings: the sample size at a leaf is small, and all instances falling into the same leaf are assigned the same class probability. In this paper, we propose two techniques to address these issues. First, we apply the statistical technique of shrinkage, which estimates the class probability of a test instance as a linear interpolation of the local class probabilities at each node along the path from leaf to root; we also present an efficient algorithm for learning the interpolation weights. Second, we introduce an instance-based method, weighted probability estimation (WPE), which generates distinct local probability estimates for test instances falling into the same leaf. The key idea is to weight training instances by their similarity to the test instance when estimating probabilities. Furthermore, we combine shrinkage and WPE to compensate for each other's weaknesses. Our experiments show that both shrinkage and WPE improve the ranking performance of decision trees, and that their combination works even better. The experiments also indicate that various decision tree algorithms augmented with the combination of shrinkage and WPE significantly outperform the original algorithms as well as other state-of-the-art techniques proposed to enhance the ranking performance of decision trees.
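The following is a minimal Python sketch of the two ideas as they are described in the abstract, not the authors' implementation: the Node structure, the Laplace smoothing, the uniform interpolation weights (the paper learns these), and the RBF similarity in the WPE function are all illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional
import math

@dataclass
class Node:
    counts: Counter                  # class-label counts of training instances reaching this node
    parent: Optional["Node"] = None  # None at the root

def local_estimate(node: Node, classes: List[str], alpha: float = 1.0) -> Dict[str, float]:
    # Laplace-smoothed class frequencies at a single node.
    total = sum(node.counts.values()) + alpha * len(classes)
    return {c: (node.counts[c] + alpha) / total for c in classes}

def shrinkage_estimate(leaf: Node, classes: List[str],
                       weights: Optional[List[float]] = None) -> Dict[str, float]:
    # Linear interpolation of the local estimates along the leaf-to-root
    # path. Uniform weights are a placeholder; the paper learns them.
    path, node = [], leaf
    while node is not None:
        path.append(node)
        node = node.parent
    if weights is None:
        weights = [1.0 / len(path)] * len(path)
    probs = {c: 0.0 for c in classes}
    for w, n in zip(weights, path):
        local = local_estimate(n, classes)
        for c in classes:
            probs[c] += w * local[c]
    return probs

def wpe_estimate(leaf_instances, test_x, classes: List[str],
                 gamma: float = 1.0) -> Dict[str, float]:
    # WPE-style estimate at a leaf: each training instance contributes to
    # its class in proportion to its similarity to the test instance, so
    # different test instances in the same leaf get different estimates.
    # The RBF similarity used here is an assumption for illustration.
    mass = {c: 0.0 for c in classes}
    total = 0.0
    for x, y in leaf_instances:
        sim = math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, test_x)))
        mass[y] += sim
        total += sim
    return {c: (mass[c] + 1.0) / (total + len(classes)) for c in classes}

if __name__ == "__main__":
    root = Node(Counter({"pos": 60, "neg": 40}))
    leaf = Node(Counter({"pos": 3, "neg": 1}), parent=root)
    print(shrinkage_estimate(leaf, ["pos", "neg"]))
    print(wpe_estimate([((0.1, 0.2), "pos"), ((0.9, 0.8), "neg")],
                       (0.2, 0.25), ["pos", "neg"]))
```

Under this reading of the abstract, combining the two techniques would amount to substituting the WPE estimate for the leaf's local frequency before interpolating along the path, so the leaf term varies per test instance while the ancestor terms supply the shrinkage.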