Supervised neural network modeling: an empirical investigation into learning from imbalanced data with labeling errors

  • Authors:
  • Taghi M. Khoshgoftaar; Jason Van Hulse; Amri Napolitano

  • Affiliations:
  • Data Mining and Machine Learning Laboratory, Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL (all authors)

  • Venue:
  • IEEE Transactions on Neural Networks
  • Year:
  • 2010

Abstract

Neural network algorithms such as multilayer perceptrons (MLPs) and radial basis function networks (RBFNets) have been used to construct learners that exhibit strong predictive performance. Two data-related issues that can have a detrimental impact on supervised learning initiatives are class imbalance and labeling errors (or class noise). Imbalanced data can make it more difficult for neural network learning algorithms to distinguish between examples of the different classes, and class noise can lead to the formulation of incorrect hypotheses. Both class imbalance and labeling errors are pervasive problems encountered in a wide variety of application domains. Many studies have investigated these problems in isolation, but few have focused on their combined effects. This study presents a comprehensive empirical investigation of learning from imbalanced data with labeling errors using neural network algorithms. In particular, the first component of our study investigates the impact of class noise and class imbalance on two common neural network learning algorithms, while the second component considers the ability of data sampling (which is commonly used to address class imbalance) to improve their performance. Our results, based on over two million trained and evaluated models, show that conclusions drawn using the more commonly studied C4.5 classifier may not apply when using neural networks.
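
The sketch below is not the authors' experimental setup; it is a minimal illustration, using scikit-learn, of the kind of scenario the abstract describes: an imbalanced dataset with injected labeling errors, random undersampling of the majority class as the data-sampling step, and an MLP trained on the result. The dataset, noise rate, sampling ratio, and network size are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch only: imbalance + class noise + undersampling + MLP.
# All specific values (5% minority class, 10% noise, 1:1 sampling) are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Imbalanced two-class data (assumed 5% minority class).
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Inject labeling errors: flip an assumed 10% of training labels at random.
flip = rng.random(len(y_train)) < 0.10
y_noisy = np.where(flip, 1 - y_train, y_train)

# Random undersampling of the majority class to a 1:1 ratio,
# one common data-sampling strategy for class imbalance.
minority_idx = np.flatnonzero(y_noisy == 1)
majority_idx = np.flatnonzero(y_noisy == 0)
keep_majority = rng.choice(majority_idx, size=len(minority_idx), replace=False)
idx = np.concatenate([minority_idx, keep_majority])

# Train a multilayer perceptron on the sampled, noisy training data.
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=0)
mlp.fit(X_train[idx], y_noisy[idx])

# Evaluate on the clean test set with AUC, a threshold-independent
# metric often preferred for imbalanced data.
scores = mlp.predict_proba(X_test)[:, 1]
print("Test AUC:", roc_auc_score(y_test, scores))
```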