A Combination Classification Algorithm Based on Outlier Detection and C4.5

Authors:
Shengyi Jiang;Wen Yu
Affiliations:
School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510006;School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510006
Venue:
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Year:
2009

Abstract

The performance of traditional classifiers is skewed toward the majority class on imbalanced data, resulting in a high misclassification rate for minority samples. To address this problem, we present a combination classification algorithm based on outlier detection and C4.5. The basic idea is to balance the data distribution by grouping the whole data set into rare clusters and major clusters according to an outlier factor. The C4.5 algorithm is then used to build separate decision trees on the rare clusters and on the major clusters. When classifying a new object, the decision tree used for evaluation is chosen according to the type of the cluster to which the new object is nearest. We run experiments on data sets from the UCI Machine Learning Repository and compare the results with those of other classification algorithms; the experiments demonstrate that our algorithm performs much better on extremely imbalanced data sets.
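
The cluster-then-route idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions: the one-pass clustering with a fixed `radius`, the use of a cluster's share of the data (`rare_share`) as a stand-in for the paper's outlier factor, and a 1-nearest-neighbour predictor (`fit_stand_in`) used in place of C4.5 so that the example stays self-contained. All function and parameter names are hypothetical.

```python
import math
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_pass_cluster(X, radius):
    """Assumed clustering step: join the nearest centroid within `radius`,
    otherwise open a new cluster. Centroids are running means."""
    clusters = []
    for i, p in enumerate(X):
        best = min(clusters, key=lambda c: dist(p, c["centroid"]), default=None)
        if best is not None and dist(p, best["centroid"]) <= radius:
            best["members"].append(i)
            n = len(best["members"])
            best["centroid"] = [(m * (n - 1) + x) / n
                                for m, x in zip(best["centroid"], p)]
        else:
            clusters.append({"members": [i], "centroid": list(p)})
    return clusters


def fit_stand_in(X, y):
    """Stand-in for C4.5 (NOT a decision tree): 1-nearest-neighbour lookup."""
    def predict(q):
        j = min(range(len(X)), key=lambda k: dist(q, X[k]))
        return y[j]
    return predict


def fit_combined(X, y, radius=1.5, rare_share=0.3):
    """Split clusters into 'rare' and 'major', then fit one model per group.
    Cluster share of the data is an assumed proxy for the outlier factor."""
    clusters = one_pass_cluster(X, radius)
    groups = defaultdict(list)
    for c in clusters:
        share = len(c["members"]) / len(X)
        c["kind"] = "rare" if share < rare_share else "major"
        groups[c["kind"]].extend(c["members"])
    models = {kind: fit_stand_in([X[i] for i in idx], [y[i] for i in idx])
              for kind, idx in groups.items()}
    return clusters, models


def classify(q, clusters, models):
    """Route the query to the model of the nearest cluster's type."""
    nearest = min(clusters, key=lambda c: dist(q, c["centroid"]))
    return models[nearest["kind"]](q)


# Toy imbalanced data: eight majority points near the origin, two minority
# points near (10, 10). The values are illustrative only.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
     [0.2, 0.8], [0.8, 0.2], [0.4, 0.6], [10, 10], [10.5, 10]]
y = ["neg"] * 8 + ["pos"] * 2

clusters, models = fit_combined(X, y)
label_near_minority = classify([9.5, 9.8], clusters, models)
label_near_majority = classify([0.3, 0.3], clusters, models)
```

In this sketch a query is first matched to its nearest cluster centroid, and only the model trained on that cluster type is consulted, so the minority region is not outvoted by the majority data. Replacing `fit_stand_in` with a real C4.5 learner would bring the sketch closer to the method the abstract describes.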