Types of noise in data for concept learning. COLT '88: Proceedings of the First Annual Workshop on Computational Learning Theory.
Instance-Based Learning Algorithms. Machine Learning.
Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms. International Journal of Man-Machine Studies, special issue on symbolic problem solving in noisy and novel task environments.
C4.5: Programs for Machine Learning.
Four types of noise in data for PAC learning. Information Processing Letters.
The Nature of Statistical Learning Theory.
Efficient noise-tolerant learning from statistical queries. Journal of the ACM (JACM).
Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods.
Machine Learning.
Class Noise vs. Attribute Noise: A Quantitative Study. Artificial Intelligence Review.
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems).
Top 10 algorithms in data mining. Knowledge and Information Systems.
IJCAI'97: Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, Volume 2.
The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter.
Estimating continuous distributions in Bayesian classifiers. UAI'95: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence.
Optimising Flash non-volatile memory using machine learning: a project overview. Proceedings of the Fifth Balkan Conference in Informatics.
Investigation of random subspace and random forest regression models using data with injected noise. KES'12: Proceedings of the 16th International Conference on Knowledge Engineering, Machine Learning and Lattice Computing with Applications.
Information Sciences: An International Journal.
IJCAI'13: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence.
Analysis and extension of decision trees based on imprecise probabilities: application on noisy data. Expert Systems with Applications: An International Journal.
Machine learning techniques often have to deal with noisy data, which can degrade the accuracy of the resulting data models. Effectively dealing with noise is therefore a key aspect of supervised learning if reliable models are to be obtained from data. Although several authors have studied the effect of noise on particular learners, comparisons of its effect across different learners are lacking. In this paper, we address this issue by systematically comparing how different degrees of noise affect four supervised learners belonging to different paradigms. Specifically, we consider the Naïve Bayes probabilistic classifier, the C4.5 decision tree, the IBk instance-based learner and the SMO support vector machine. These four methods allow us to contrast different learning paradigms, and they are considered to be four of the top ten algorithms in data mining (Yu et al. 2007). We test them on a collection of data sets perturbed with noise in the input attributes and noise in the output class. As an initial hypothesis, we assign the techniques to two groups, Naïve Bayes with C4.5 and IBk with SMO, based on their expected sensitivity to noise, the first group being the less sensitive. The analysis enables us to extract key observations about the effect of different types and degrees of noise on these learning techniques. In general, we find that Naïve Bayes appears to be the most robust algorithm, and SMO the least, relative to the other two techniques. However, the underlying empirical behavior of the techniques is more complex, and varies depending on the noise type and the specific data set being processed. Overall, noise in the training data set is found to cause the learners the most difficulty.
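The experimental setup described in the abstract, injecting controlled degrees of class noise and attribute noise and comparing the four learners, can be sketched as follows. This is a minimal illustration, not the paper's actual protocol: it uses scikit-learn analogues of the four algorithms (GaussianNB for Naïve Bayes, DecisionTreeClassifier in place of C4.5, KNeighborsClassifier for IBk, and SVC in place of SMO), a single example data set, and hypothetical noise-injection helpers.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def add_class_noise(y, rate, rng):
    """Flip a fraction `rate` of labels to a different, randomly chosen class."""
    y = y.copy()
    n_classes = len(np.unique(y))
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    # Shifting by 1..n_classes-1 (mod n_classes) guarantees the label changes.
    shift = rng.integers(1, n_classes, size=len(idx))
    y[idx] = (y[idx] + shift) % n_classes
    return y

def add_attribute_noise(X, rate, rng):
    """Replace a fraction `rate` of attribute values with values drawn
    uniformly from each attribute's observed range."""
    X = X.copy()
    flat = rng.choice(X.size, size=int(rate * X.size), replace=False)
    rows, cols = np.unravel_index(flat, X.shape)
    lo, hi = X.min(axis=0), X.max(axis=0)
    X[rows, cols] = rng.uniform(lo[cols], hi[cols])
    return X

X, y = load_iris(return_X_y=True)
learners = {
    "NB": GaussianNB(),
    "C4.5-like": DecisionTreeClassifier(random_state=0),
    "IBk": KNeighborsClassifier(n_neighbors=3),
    "SMO-like": SVC(kernel="linear"),
}
rng = np.random.default_rng(0)
for rate in (0.0, 0.1, 0.2):
    y_noisy = add_class_noise(y, rate, rng)
    for name, clf in learners.items():
        acc = cross_val_score(clf, X, y_noisy, cv=5).mean()
        print(f"class-noise {rate:.0%} {name}: {acc:.3f}")
```

Attribute noise would be applied analogously via `add_attribute_noise(X, rate, rng)` while leaving `y` intact; comparing accuracy curves across noise levels then mirrors the kind of robustness comparison the paper performs at scale.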