On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
Machine Learning - Special issue on learning with probabilistic representations
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Chi2: Feature Selection and Discretization of Numeric Attributes
TAI '95 Proceedings of the Seventh International Conference on Tools with Artificial Intelligence
Benchmarking Attribute Selection Techniques for Discrete Class Data Mining
IEEE Transactions on Knowledge and Data Engineering
Microarray data mining: facing the challenges
ACM SIGKDD Explorations Newsletter
Class Noise vs. Attribute Noise: A Quantitative Study
Artificial Intelligence Review
Toward Integrating Feature Selection Algorithms for Classification and Clustering
IEEE Transactions on Knowledge and Data Engineering
Bias Analysis in Text Classification for Highly Skewed Data
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Experimental perspectives on learning from imbalanced data
Proceedings of the 24th international conference on Machine learning
On the Class Imbalance Problem
ICNC '08 Proceedings of the 2008 Fourth International Conference on Natural Computation - Volume 04
IEEE Transactions on Knowledge and Data Engineering
SMOTE: synthetic minority over-sampling technique
Journal of Artificial Intelligence Research
The foundations of cost-sensitive learning
IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
A study of cross-validation and bootstrap for accuracy estimation and model selection
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2
Knowledge discovery from imbalanced and noisy data
Data & Knowledge Engineering
The WEKA data mining software: an update
ACM SIGKDD Explorations Newsletter
A Comparative Study of Threshold-Based Feature Selection Techniques
GRC '10 Proceedings of the 2010 IEEE International Conference on Granular Computing
Attribute Selection and Imbalanced Data: Problems in Software Defect Prediction
ICTAI '10 Proceedings of the 2010 22nd IEEE International Conference on Tools with Artificial Intelligence - Volume 01
Hi-index | 0.00 |
Two problems often encountered in machine learning are class imbalance and high dimensionality. In this paper we compare three different approaches for addressing both problems simultaneously, by applying both data sampling and feature selection. With the first two approaches, sampling is followed by feature selection. In the first approach, the features are selected based on the sampled data, and then the unsampled data is used with just the selected features. The second approach is similar, but the sampled data is used. Finally, with the third approach, feature selection is performed prior to sampling. To compare the approaches, we use seven datasets from different domains, employ nine feature rankers from three different families, apply three sampling techniques, and inject class noise to better simulate real-world datasets. The results show that the second and third approaches are both very good, with the third approach showing a slight (but not statistically significant) lead.