We present an empirical comparison of classification algorithms whose training data contain attribute-noise levels not representative of field data. To study algorithm sensitivity, we develop an innovative experimental design with four factors: noise situation, algorithm, noise level, and training set size. Our results contradict conventional wisdom, indicating that investments to achieve representative noise levels may not be worthwhile. In general, over-representative training noise should be avoided, while under-representative training noise is less of a concern. However, interactions among algorithm, noise level, and training set size indicate that these general results may not apply to particular practice situations.
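The factorial design described above can be illustrated with a short sketch. This is not the authors' actual protocol: it assumes scikit-learn classifiers, a synthetic dataset, and a simple uniform attribute-noise model, and every name and parameter value here is hypothetical. It varies training noise relative to a fixed "field" noise level, crossing noise situation, algorithm, and training set size.

```python
# Minimal sketch of a factorial attribute-noise experiment (illustrative only).
# Assumptions: scikit-learn classifiers, synthetic data, and a noise model
# that replaces each feature value with uniform noise with probability `level`.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def add_attribute_noise(X, level):
    """Corrupt a fraction `level` of feature values with uniform noise."""
    X = X.copy()
    mask = rng.random(X.shape) < level
    lo, hi = X.min(axis=0), X.max(axis=0)
    noise = rng.uniform(lo, hi, size=X.shape)
    X[mask] = noise[mask]
    return X

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_field, y_field = X[2000:], y[2000:]   # held-out "field" data
field_noise = 0.20                       # assumed field noise level
X_test = add_attribute_noise(X_field, field_noise)

algorithms = {
    "tree": DecisionTreeClassifier(random_state=0),
    "logistic": LogisticRegression(max_iter=1000),
    "naive_bayes": GaussianNB(),
}

# Factors: noise situation (train noise vs. field noise), algorithm,
# noise level, and training set size.
for train_noise in (0.0, 0.20, 0.40):   # under- / representative / over-
    for size in (250, 500, 1000, 2000):
        X_tr = add_attribute_noise(X[:size], train_noise)
        for name, clf in algorithms.items():
            acc = clf.fit(X_tr, y[:size]).score(X_test, y_field)
            print(f"train_noise={train_noise:.2f} size={size:4d} "
                  f"{name:12s} acc={acc:.3f}")
```

Comparing accuracy across the three training-noise settings against the fixed field level is one simple way to probe the abstract's central contrast between under- and over-representative training noise.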