Instance-Based Learning Algorithms
Machine Learning
Machine Learning
Learning in the presence of concept drift and hidden contexts
Machine Learning
MetaCost: a general method for making classifiers cost-sensitive
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Learning and making decisions when costs and probabilities are both unknown
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Fuzzy Algorithms: With Applications to Image Processing and Pattern Recognition
Fuzzy Algorithms: With Applications to Image Processing and Pattern Recognition
An Instance-Weighting Method to Induce Cost-Sensitive Trees
IEEE Transactions on Knowledge and Data Engineering
Cost-Sensitive Learning by Cost-Proportionate Example Weighting
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Editorial: special issue on learning from imbalanced data sets
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Mining with rarity: a unifying framework
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
A study of the behavior of several methods for balancing machine learning training data
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Classification and Modeling with Linguistic Information Granules: Advanced Approaches to Linguistic Data Mining (Advanced Information Processing)
Using AUC and Accuracy in Evaluating Learning Algorithms
IEEE Transactions on Knowledge and Data Engineering
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Statistical Comparisons of Classifiers over Multiple Data Sets
The Journal of Machine Learning Research
The class imbalance problem: A systematic study
Intelligent Data Analysis
Maximizing classifier utility when there are data acquisition and modeling costs
Data Mining and Knowledge Discovery
Evolutionary rule-based systems for imbalanced data sets
Soft Computing - A Fusion of Foundations, Methodologies and Applications - Special Issue on Evolutionary and Metaheuristics based Data Mining (EMBDM); Guest Editors: José A. Gámez, María J. del Jesús, José M. Puerta
KEEL: a software tool to assess evolutionary algorithms for data mining problems
Soft Computing - A Fusion of Foundations, Methodologies and Applications - Special Issue on Evolutionary and Metaheuristics based Data Mining (EMBDM); Guest Editors: José A. Gámez, María J. del Jesús, José M. Puerta
Dataset Shift in Machine Learning
Dataset Shift in Machine Learning
Handbook of Parametric and Nonparametric Statistical Procedures
Handbook of Parametric and Nonparametric Statistical Procedures
Soft Computing - A Fusion of Foundations, Methodologies and Applications
IEEE Transactions on Knowledge and Data Engineering
SMOTE: synthetic minority over-sampling technique
Journal of Artificial Intelligence Research
Learning when training data are costly: the effect of class distribution on tree induction
Journal of Artificial Intelligence Research
SVMs modeling for highly imbalanced classification
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics - Special issue on human computing
A proposal on reasoning methods in fuzzy rule-based classification systems
International Journal of Approximate Reasoning
Facetwise analysis of XCS for problems with class imbalances
IEEE Transactions on Evolutionary Computation
Assessing the impact of changing environments on classifier performance
Canadian AI'08 Proceedings of the Canadian Society for computational studies of intelligence, 21st conference on Advances in artificial intelligence
Combating the Small Sample Class Imbalance Problem Using Feature Selection
IEEE Transactions on Knowledge and Data Engineering
A unifying view on dataset shift in classification
Pattern Recognition
Expert Systems with Applications: An International Journal
Support vector learning for fuzzy rule-based classification systems
IEEE Transactions on Fuzzy Systems
Rule Weight Specification in Fuzzy Rule-Based Classification Systems
IEEE Transactions on Fuzzy Systems
Nearest neighbor pattern classification
IEEE Transactions on Information Theory
Hi-index | 0.07 |
In the field of Data Mining, the estimation of the quality of the learned models is a key step in order to select the most appropriate tool for the problem to be solved. Traditionally, a k-fold validation technique has been carried out so that there is a certain degree of independency among the results for the different partitions. In this way, the highest average performance will be obtained by the most robust approach. However, applying a ''random'' division of the instances over the folds may result in a problem known as dataset shift, which consists in having a different data distribution between the training and test folds. In classification with imbalanced datasets, in which the number of instances of one class is much lower than the other class, this problem is more severe. The misclassification of minority class instances due to an incorrect learning of the real boundaries caused by a not well fitted data distribution, truly affects the measures of performance in this scenario. Regarding this fact, we propose the use of a specific validation technique for the partitioning of the data, known as ''Distribution optimally balanced stratified cross-validation'' to avoid this harmful situation in the presence of imbalance. This methodology makes the decision of placing close-by samples on different folds, so that each partition will end up with enough representatives of every region. We have selected a wide number of imbalanced datasets from KEEL dataset repository for our study, using several learning techniques from different paradigms, thus making the conclusions extracted to be independent of the underlying classifier. The analysis of the results has been carried out by means of the proper statistical study, which shows the goodness of this approach for dealing with imbalanced data.