Noise in data is a significant concern for many machine learning techniques used to model data. Researchers have studied the impact of noise on particular learning algorithms, but few have analyzed its effect across different ones. In this work, we study the noise sensitivity of four learning algorithms under different intensities of noise. Specifically, we compare the noise sensitivity of decision trees, naïve Bayes, support vector machines, and logistic regression. The algorithms are tested on several datasets that are artificially injected with varying degrees of noise. The study helps us understand the impact of different noise levels on the learning algorithms mentioned above, and it also guides the choice among them. In general, naïve Bayes is the most resistant to noise; however, it also performs the worst. The other algorithms perform much better than naïve Bayes, especially once the noise level is below 40%. When approaches are available to improve data quality (i.e., reduce the noise level), the decision tree is the preferred choice, followed by the support vector machine and logistic regression, rather than naïve Bayes.
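The experimental setup described above can be sketched with scikit-learn. The snippet below is a minimal illustration, not the authors' exact protocol: the dataset, the noise model (uniform label flipping on the training set), and all hyperparameters are assumptions made for the sake of a runnable example.

```python
# Hypothetical sketch of the study's setup: inject label noise at several
# rates into the training labels and compare four classifiers on a clean
# test set. Dataset and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def inject_label_noise(y, rate, rng):
    """Flip a `rate` fraction of binary labels chosen uniformly at random."""
    y_noisy = y.copy()
    n_flip = int(rate * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]
    return y_noisy

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "logistic regression": LogisticRegression(max_iter=5000),
}

results = {}
for rate in (0.0, 0.2, 0.4):
    y_noisy = inject_label_noise(y_tr, rate, rng)
    for name, model in models.items():
        model.fit(X_tr, y_noisy)
        # Evaluate on the clean test set to isolate the effect of
        # training-label noise on each learned model.
        results[(name, rate)] = accuracy_score(y_te, model.predict(X_te))

for (name, rate), acc in sorted(results.items()):
    print(f"{name:20s} noise={rate:.1f} acc={acc:.3f}")
```

Flipping labels only in the training split keeps the test set clean, so any accuracy drop is attributable to the injected noise rather than to a corrupted evaluation.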