C4.5: Programs for Machine Learning.
Machine Learning, Neural and Statistical Classification.
MetaCost: a general method for making classifiers cost-sensitive. KDD '99: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Machine Learning.
Editorial: special issue on learning from imbalanced data sets. ACM SIGKDD Explorations Newsletter, special issue on learning from imbalanced data sets.
Decision trees with minimal costs. ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning.
The class imbalance problem: a systematic study. Intelligent Data Analysis.
On multi-class cost-sensitive learning. AAAI '06: Proceedings of the 21st National Conference on Artificial Intelligence, Volume 1.
Learning when training data are costly: the effect of class distribution on tree induction. Journal of Artificial Intelligence Research.
Multi-agent based classification using argumentation from experience. PAKDD '11: Proceedings of the 15th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Part II.
Large scale visual classification with many classes. MLDM '13: Proceedings of the 9th International Conference on Machine Learning and Data Mining in Pattern Recognition.
Editorial: parameter-free classification in multi-class imbalanced data sets. Data & Knowledge Engineering.
In data mining, large differences in prior class probabilities, known as the class imbalance problem, have been reported to hinder the performance of classifiers such as decision trees. Dealing with imbalanced and cost-sensitive data has been recognized as one of the ten most challenging problems in data mining research. In decision tree learning, many splitting measures are based on the concept of Shannon's entropy. A major characteristic of these entropies is that they take their maximal value when the distribution of the modalities of the class variable is uniform. To deal with the class imbalance problem, we proposed an off-centered entropy, which takes its maximal value for a distribution fixed by the user. This distribution can be the a priori distribution of the class modalities or a distribution that takes misclassification costs into account. Other authors have proposed an asymmetric entropy. In this paper we present the concepts behind the three entropies and compare their effectiveness on 20 imbalanced data sets. All our experiments are based on the C4.5 decision tree algorithm, in which only the entropy function is modified. The results are promising and show the value of off-centered entropies for dealing with the class imbalance problem.
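To make the off-centering idea concrete, here is a minimal sketch in Python for the two-class case. It remaps the class frequency p so that the entropy peaks at a user-chosen value theta (e.g. the minority-class prior) instead of 0.5. The piecewise-linear remapping used below is an illustrative assumption, not necessarily the paper's exact formula.

```python
import math

def shannon_entropy(p):
    """Binary Shannon entropy in bits; maximal (= 1) at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def off_centered_entropy(p, theta):
    """Off-centered binary entropy (illustrative sketch).

    Remaps p in [0, 1] onto pi in [0, 1] so that p = theta lands on
    pi = 0.5, then applies Shannon entropy. The result is 0 for pure
    nodes and maximal (= 1) when the class frequency equals theta.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    if p <= theta:
        pi = p / (2 * theta)
    else:
        pi = (p + 1 - 2 * theta) / (2 * (1 - theta))
    return shannon_entropy(pi)

# With theta = 0.1, a node whose minority-class frequency is 10%
# now scores as maximally impure, unlike standard Shannon entropy.
print(off_centered_entropy(0.1, theta=0.1))  # 1.0
print(shannon_entropy(0.1))                  # ~0.469
```

In a C4.5-style learner, one would substitute this function for Shannon's entropy inside the gain computation while leaving the rest of the tree-induction algorithm untouched, which is exactly the kind of single-point modification the experiments describe.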