The class imbalance problem arises in many practical applications of machine learning and data mining, such as information retrieval and filtering and the detection of credit card fraud. It is widely recognized that this imbalance raises issues that are either absent or less severe in the balanced case, and it often leads to suboptimal classifier performance. The problem is compounded when the imbalanced data are also high dimensional; in such cases, feature selection is critical to achieving good performance. In this paper, we propose a new feature selection method, Feature Assessment by Sliding Thresholds (FAST), which scores each feature by the area under the ROC curve generated by moving the decision boundary of a single-feature classifier, with thresholds placed using an even-bin distribution. FAST is compared with two commonly used feature selection methods, the correlation coefficient and RELevance In Estimating Features (RELIEF), on imbalanced data classification. Experimental results on text mining, mass spectrometry, and microarray data sets showed that the proposed method outperformed both RELIEF and the correlation coefficient on skewed data sets and was comparable on balanced ones; when a small number of features is preferred, the classification performance of the proposed method was significantly better than that of the correlation- and RELIEF-based methods.
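The per-feature scoring idea described in the abstract — treat each feature alone as a classifier, sweep a decision threshold over it, and rank features by the resulting area under the ROC curve — can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' reference implementation: the function name `fast_score`, the `n_bins` parameter, and the final orientation-free step are choices made for the sketch.

```python
def fast_score(feature_values, labels, n_bins=10):
    """Approximate the AUC of a single-feature threshold classifier.

    `labels` are assumed to be 0/1 with 1 the (minority) positive class.
    Thresholds are placed with an even-bin distribution: each bin holds
    roughly the same number of samples, so a skewed feature distribution
    still yields informative thresholds.
    """
    pairs = sorted(zip(feature_values, labels))
    n = len(pairs)
    # Even-bin thresholds: one threshold every n / n_bins samples.
    thresholds = [pairs[(i * n) // n_bins][0] for i in range(1, n_bins)]
    n_pos = sum(labels)
    n_neg = n - n_pos
    points = []
    for t in thresholds:
        tp = sum(1 for v, y in pairs if v > t and y == 1)
        fp = sum(1 for v, y in pairs if v > t and y == 0)
        points.append((fp / n_neg, tp / n_pos))  # (FPR, TPR) at this threshold
    # Close the ROC curve at its endpoints and integrate (trapezoidal rule).
    points = sorted(points + [(0.0, 0.0), (1.0, 1.0)])
    auc = sum((x2 - x1) * (y1 + y2) / 2
              for (x1, y1), (x2, y2) in zip(points, points[1:]))
    # Orientation-free: a feature where low values indicate the positive
    # class is just as useful as one where high values do.
    return max(auc, 1.0 - auc)
```

Feature selection then amounts to computing this score for every column and keeping the top-k features, e.g. `sorted(range(d), key=lambda j: fast_score(X_col[j], y), reverse=True)[:k]`.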