C4.5: programs for machine learning
Software metrics (2nd ed.): a rigorous and practical approach
Lazy learning
Robust Classification for Imprecise Environments
Machine Learning
Pattern Recognition and Neural Networks
Experiments with Noise Filtering in a Medical Domain
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Mining with rarity: a unifying framework
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Training Cost-Sensitive Neural Networks with Methods Addressing the Class Imbalance Problem
IEEE Transactions on Knowledge and Data Engineering
Class noise vs. attribute noise: a quantitative study of their impacts
Artificial Intelligence Review
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Detecting noisy instances with the rule-based classification model
Intelligent Data Analysis
Experimental perspectives on learning from imbalanced data
Proceedings of the 24th international conference on Machine learning
The class imbalance problem: A systematic study
Intelligent Data Analysis
Skewed Class Distributions and Mislabeled Examples
ICDMW '07 Proceedings of the Seventh IEEE International Conference on Data Mining Workshops
Learning with Limited Minority Class Data
ICMLA '07 Proceedings of the Sixth International Conference on Machine Learning and Applications
Fast learning in networks of locally-tuned processing units
Neural Computation
Automatically countering imbalance and its empirical relationship to cost
Data Mining and Knowledge Discovery
Class noise detection using frequent itemsets
Intelligent Data Analysis
SMOTE: synthetic minority over-sampling technique
Journal of Artificial Intelligence Research
Learning when training data are costly: the effect of class distribution on tree induction
Journal of Artificial Intelligence Research
Evolutionary sampling and software quality modeling of high-assurance systems
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
A Study on the Relationships of Classifier Performance Metrics
ICTAI '09 Proceedings of the 2009 21st IEEE International Conference on Tools with Artificial Intelligence
Improving the performance of the RBF neural networks trained with imbalanced samples
IWANN'07 Proceedings of the 9th international work conference on Artificial neural networks
Improving the classification accuracy of RBF and MLP neural networks trained with imbalanced samples
IDEAL'06 Proceedings of the 7th international conference on Intelligent Data Engineering and Automated Learning
Improving ANNs performance on unbalanced data with an AUC-Based learning algorithm
ICANN'12 Proceedings of the 22nd international conference on Artificial Neural Networks and Machine Learning - Volume Part II
The design of polynomial function-based neural network predictors for detection of software defects
Information Sciences: an International Journal
Influence of confirmation biases of developers on software quality: an empirical study
Software Quality Control
Integrated Fisher linear discriminants: An empirical study
Pattern Recognition
Neural network algorithms such as multilayer perceptrons (MLPs) and radial basis function networks (RBFNets) have been used to construct learners that exhibit strong predictive performance. Two data-related issues that can have a detrimental impact on supervised learning initiatives are class imbalance and labeling errors (also known as class noise). Imbalanced data can make it more difficult for neural network learning algorithms to distinguish between examples of the various classes, and class noise can lead to the formulation of incorrect hypotheses. Both class imbalance and labeling errors are pervasive problems encountered in a wide variety of application domains. Many studies have investigated these problems in isolation, but few have focused on their combined effects. This study presents a comprehensive empirical investigation using neural network algorithms to learn from imbalanced data with labeling errors. The first component of our study investigates the impact of class noise and class imbalance on two common neural network learning algorithms, while the second considers the ability of data sampling (commonly used to address class imbalance) to improve their performance. Our results, for which over two million models were trained and evaluated, show that conclusions drawn using the more commonly studied C4.5 classifier may not apply when using neural networks.
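As a minimal illustration of the data sampling idea the abstract refers to (and not the paper's own implementation, which is not specified here), random oversampling duplicates randomly chosen minority-class examples until the class distribution is balanced. The sketch below uses only the Python standard library; the function name and signature are illustrative assumptions.

```python
import random

def random_oversample(X, y, minority_label, rng=None):
    """Balance a two-class dataset by duplicating randomly chosen
    minority-class examples (random oversampling with replacement).

    Illustrative sketch only; `minority_label` marks the rare class.
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    minority = [(x, lbl) for x, lbl in zip(X, y) if lbl == minority_label]
    majority = [(x, lbl) for x, lbl in zip(X, y) if lbl != minority_label]
    # Draw minority examples with replacement until the classes match.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    combined = majority + minority + extra
    rng.shuffle(combined)
    X_bal = [x for x, _ in combined]
    y_bal = [lbl for _, lbl in combined]
    return X_bal, y_bal
```

Random undersampling works in the opposite direction, discarding majority-class examples; more elaborate methods such as SMOTE (cited above) synthesize new minority examples rather than duplicating existing ones.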