Imbalanced data sets are pervasive in real-world applications and have therefore become an active research area within the machine learning community. Class imbalance causes a significant reduction in the performance of standard classifiers when they are trained to learn the concepts underlying the data. The problem becomes even more severe when imbalanced data sets are also high-dimensional. This paper presents a novel feature ranking approach based on probability density estimation to cope with these issues. The idea behind our approach, named Density-Based Feature Selection (DBFS), is that the distributions of features over classes can bring significant benefits to feature selection algorithms. In other words, to assess the contribution of each attribute and assign it an appropriate rank, DBFS takes into account each feature's class-conditional distributions along with the correlations among features. To show the effectiveness of the presented approach, well-known feature ranking methods are implemented and compared with our approach across a variety of small-sample-size, high-dimensional data sets from the microarray, mass spectrometry, and text mining domains. Our theoretical analysis and experimental observations show that DBFS is a simple yet effective feature ranking method built on well-known statistical evaluation measures.
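The abstract does not spell out the DBFS scoring rule, but the core idea — ranking a feature by how well its class-conditional densities separate the classes — can be sketched with simple histogram-based density estimates. The sketch below is an illustrative assumption, not the paper's exact algorithm: it scores each feature by one minus the overlap of its per-class histograms, so features whose class-conditional distributions barely overlap rank highest.

```python
import numpy as np

def density_score(X, y, bins=20):
    """Illustrative density-based feature score (NOT the paper's exact DBFS):
    for each feature, estimate a per-class density with a shared-bin histogram
    and score the feature by 1 - (overlap of the class densities).
    Higher score = the classes are better separated along that feature."""
    classes = np.unique(y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        # shared bin edges spanning the feature's full range
        edges = np.linspace(X[:, j].min(), X[:, j].max(), bins + 1)
        # probability mass per bin for each class (counts normalized by class size)
        hists = [np.histogram(X[y == c, j], bins=edges)[0] / max((y == c).sum(), 1)
                 for c in classes]
        # overlap = sum over bins of the minimum mass across classes, in [0, 1]
        overlap = np.minimum.reduce(hists).sum()
        scores[j] = 1.0 - overlap
    return scores

def rank_features(X, y, bins=20):
    """Feature indices ordered from most to least class-discriminative."""
    return np.argsort(density_score(X, y, bins))[::-1]

# Toy usage: feature 0 separates the two classes, feature 1 is pure noise.
rng = np.random.default_rng(0)
X = np.vstack([np.column_stack([rng.normal(0, 1, 100), rng.normal(0, 1, 100)]),
               np.column_stack([rng.normal(5, 1, 100), rng.normal(0, 1, 100)])])
y = np.array([0] * 100 + [1] * 100)
print(rank_features(X, y))  # feature 0 should rank first
```

Note that this sketch scores features independently; the paper's DBFS additionally accounts for correlations among features, which a full implementation would need to incorporate.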