Class imbalance is among the most persistent complications that confront the traditional supervised learning task in real-world applications. In the binary case, the problem occurs when the instances of one class significantly outnumber those of the other. This situation is a handicap when trying to identify the minority class, because learning algorithms are not usually adapted to such characteristics. Approaches to the problem of imbalanced datasets fall into two major categories: data sampling and algorithmic modification. Cost-sensitive learning solutions, which incorporate both data-level and algorithm-level approaches, assume higher misclassification costs for samples in the minority class and seek to minimize high-cost errors. Nevertheless, there is no exhaustive comparison between these models that can help determine the most appropriate one under different scenarios. The main objective of this work is to analyze the performance of data-level proposals against algorithm-level proposals, focusing on cost-sensitive models, and against a hybrid procedure that combines the two approaches. We show, by means of a statistical comparative analysis, that no single approach stands out from the rest. This leads to a discussion of the intrinsic data characteristics of the imbalanced classification problem, which points to new paths for improving current models, mainly focusing on class overlap and dataset shift in imbalanced classification.
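To make the two families concrete, the following is a minimal sketch rather than the procedure evaluated in this work. It contrasts a data-level remedy (random oversampling of the minority class) with an algorithm-level, cost-sensitive decision rule (choosing the label with the lower expected misclassification cost). The function names `random_oversample` and `min_cost_label`, the toy data, and the cost values are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Data-level sketch: duplicate minority examples until classes are balanced.

    Hypothetical helper for illustration; not the sampling method of the paper.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the largest class
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):  # zero iterations for the majority class
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def min_cost_label(p_minority, cost_fn, cost_fp):
    """Algorithm-level sketch: return the label with the lower expected cost.

    p_minority: estimated probability that the example is in the minority class
    cost_fn: assumed cost of missing a minority example (false negative)
    cost_fp: assumed cost of a false alarm on a majority example (false positive)
    """
    expected_cost_if_majority = p_minority * cost_fn
    expected_cost_if_minority = (1.0 - p_minority) * cost_fp
    return 1 if expected_cost_if_minority < expected_cost_if_majority else 0


# Toy data (assumed): four majority examples (label 0), one minority (label 1).
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have four examples

# With a 4:1 cost ratio, a 0.3 minority probability is enough to predict class 1.
print(min_cost_label(0.3, cost_fn=4.0, cost_fp=1.0))
# With equal costs, the same example is assigned to the majority class.
print(min_cost_label(0.3, cost_fn=1.0, cost_fp=1.0))
```

Setting the false-negative cost to the imbalance ratio (4.0 here, matching the 4:1 class ratio of the toy data) is a common heuristic when true costs are unknown; it is an assumption of this sketch, not a claim about the experimental setup.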