C4.5: programs for machine learning
C4.5: programs for machine learning
The relationship between recall and precision
Journal of the American Society for Information Science
Wrappers for feature subset selection
Artificial Intelligence - Special issue on relevance
Machine Learning for the Detection of Oil Spills in Satellite Radar Images
Machine Learning - Special issue on applications of machine learning and the knowledge discovery process
Explicitly representing expected cost: an alternative to ROC representation
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Learning and making decisions when costs and probabilities are both unknown
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Distributed Data Mining in Credit Card Fraud Detection
IEEE Intelligent Systems
Toward a Query Language on Simulation Mesh Data: An Object-oriented Approach
DASFAA '01 Proceedings of the 7th International Conference on Database Systems for Advanced Applications
Naive Bayes vs decision trees in intrusion detection systems
Proceedings of the 2004 ACM symposium on Applied computing
Editorial: special issue on learning from imbalanced data sets
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
A study of the behavior of several methods for balancing machine learning training data
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
The class imbalance problem: A systematic study
Intelligent Data Analysis
SMOTE: synthetic minority over-sampling technique
Journal of Artificial Intelligence Research
Learning when training data are costly: the effect of class distribution on tree induction
Journal of Artificial Intelligence Research
Ensembles of classifiers from spatially disjoint data
MCS'05 Proceedings of the 6th international conference on Multiple Classifier Systems
Automatically countering imbalance and its empirical relationship to cost
Data Mining and Knowledge Discovery
Cost-sensitive classifier evaluation using cost curves
PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
A new probabilistic active sample selection algorithm for class imbalance problem
International Journal of Knowledge Engineering and Soft Data Paradigms
Hi-index | 0.00 |
Learning from imbalanced datasets presents an interesting problem both from modeling and economy standpoints. When the imbalance is large, classification accuracy on the smaller class(es) tends to be lower. In particular, when a class is of great interest but occurs relatively rarely such as cases of fraud, instances of disease, and regions of interest in largescale simulations, it is important to accurately identify it. It then becomes more costly to misclassify the interesting class. In this paper, we implement a wrapper approach that computes the amount of under-sampling and synthetic generation of the minority class examples (SMOTE) to improve minority class accuracy. The f-value serves as the evaluation function. Experimental results show the wrapper approach is effective in optimization of the composite f-value, and reduces the average cost per test example for the datasets considered. We report both average cost per test example and the cost curves in the paper. The true positive rate of the minority class increases significantly without causing a significant change in the f-value. We also obtain the lowest cost per test example, compared to any result we are aware of for the KDD Cup-99 intrusion detection data set.