Because decision trees are built to cover all training instances with minimal error, instances belonging to minor classes carry less weight during tree construction. As a result, classification accuracy for minor classes is usually poorer than for major classes, even though good accuracy on minor classes is also desirable. This paper proposes over-sampling the minor classes to generate trees that classify them more accurately, and using decision trees built with conventional sampling together with decision trees built with over-sampling for better classification. Experiments with a representative decision tree algorithm, C4.5, show very promising results.
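
The idea of over-sampling minor classes and pairing the resulting tree with a conventionally trained one can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: the abstract does not say how instances are over-sampled or how the two trees are combined, so `oversample_minority` (random duplication until classes are balanced) and `combine` (prefer the over-sampled tree only when it predicts a minor class) are hypothetical choices, and the two predictors are plain callables standing in for trained C4.5 trees.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate instances of the smaller classes until every class
    has as many instances as the largest one. Random duplication is an
    assumed scheme here, not necessarily the one used in the paper."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def combine(predict_plain, predict_oversampled, minor_classes):
    """Hypothetical combination rule: trust the over-sampled tree when it
    predicts a minor class, otherwise fall back to the conventional tree."""
    def predict(x):
        y = predict_oversampled(x)
        return y if y in minor_classes else predict_plain(x)
    return predict


# Toy data: six instances of class "a" (major) and two of class "b" (minor).
rows = [[0], [1], [2], [3], [4], [5], [9], [10]]
labels = ["a"] * 6 + ["b"] * 2
bal_rows, bal_labels = oversample_minority(rows, labels)
balanced_counts = Counter(bal_labels)

# Stand-ins for two trained trees: one that always predicts the major class,
# and one (trained on over-sampled data) that recognizes the minor class.
clf = combine(lambda x: "a", lambda x: "b" if x[0] > 8 else "a", {"b"})
```

In practice each callable would be the prediction function of a tree trained on the original and on the over-sampled training set respectively; other combination rules, such as voting weighted by per-class accuracy, would fit the same interface.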