We present an empirical comparison of classification algorithms whose training data contain attribute-noise levels not representative of field data. To study algorithm sensitivity, we develop an innovative experimental design with four factors: noise situation, algorithm, noise level, and training set size. Our results contradict conventional wisdom, indicating that investments to achieve representative noise levels may not be worthwhile. In general, over-representative training noise should be avoided, while under-representative training noise is less of a concern. However, interactions among algorithm, noise level, and training set size indicate that these general results may not apply to particular practice situations.
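The factorial design described above can be illustrated with a short sketch. This is not the authors' actual protocol: it assumes scikit-learn classifiers, a synthetic dataset, and a simple uniform attribute-noise model, and every name and parameter value here is hypothetical. It varies training noise relative to a fixed "field" noise level, crossing noise situation, algorithm, and training set size.

```python
# Minimal sketch of a factorial attribute-noise experiment (illustrative only).
# Assumptions: scikit-learn classifiers, synthetic data, and a noise model
# that replaces each feature value with uniform noise with probability `level`.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def add_attribute_noise(X, level):
    """Corrupt a fraction `level` of feature values with uniform noise."""
    X = X.copy()
    mask = rng.random(X.shape) < level
    lo, hi = X.min(axis=0), X.max(axis=0)
    noise = rng.uniform(lo, hi, size=X.shape)
    X[mask] = noise[mask]
    return X

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_field, y_field = X[2000:], y[2000:]   # held-out "field" data
field_noise = 0.20                       # assumed field noise level
X_test = add_attribute_noise(X_field, field_noise)

algorithms = {
    "tree": DecisionTreeClassifier(random_state=0),
    "logistic": LogisticRegression(max_iter=1000),
    "naive_bayes": GaussianNB(),
}

# Factors: noise situation (train noise vs. field noise), algorithm,
# noise level, and training set size.
for train_noise in (0.0, 0.20, 0.40):   # under- / representative / over-
    for size in (250, 500, 1000, 2000):
        X_tr = add_attribute_noise(X[:size], train_noise)
        for name, clf in algorithms.items():
            acc = clf.fit(X_tr, y[:size]).score(X_test, y_field)
            print(f"train_noise={train_noise:.2f} size={size:4d} "
                  f"{name:12s} acc={acc:.3f}")
```

Comparing accuracy across the three training-noise settings against the fixed field level is one simple way to probe the abstract's central contrast between under- and over-representative training noise.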