Classification algorithm sensitivity to training data with non representative attribute noise

  • Authors: Michael Mannino; Yanjuan Yang; Young Ryu

  • Affiliations: The Business School, University of Colorado Denver, Denver, CO 80217, USA; The Business School, University of Colorado Denver, Denver, CO 80217, USA; School of Management, University of Texas at Dallas, Richardson, Texas 75083-0688, USA

  • Venue: Decision Support Systems
  • Year: 2009

Abstract

We present an empirical comparison of classification algorithms when training data contains attribute noise levels not representative of field data. To study algorithm sensitivity, we develop an innovative experimental design using noise situation, algorithm, noise level, and training set size as factors. Our results contradict conventional wisdom, indicating that investments to achieve representative noise levels may not be worthwhile. In general, over-representative training noise should be avoided, while under-representative training noise is less of a concern. However, interactions among algorithm, noise level, and training set size indicate that these general results may not apply to particular practice situations.
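
The four-factor design named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not give the noise model, the algorithms, or the factor levels, so every name and value below (`inject_attribute_noise`, `full_factorial`, the entries of `FACTORS`) is a hypothetical placeholder. The noise model assumed here replaces each attribute value, with a given probability, by a value drawn from that attribute's observed values. The sketch also assumes that "over-representative" means training noise above the field level and "under-representative" means training noise below it.

```python
import itertools
import random


def inject_attribute_noise(rows, noise_level, rng):
    """Return a copy of rows in which each attribute value is replaced,
    with probability noise_level, by a value drawn uniformly from the
    values observed for that attribute (the draw may equal the original).
    This noise model is an assumption, not taken from the paper."""
    columns = list(zip(*rows))  # observed values for each attribute
    noisy = []
    for row in rows:
        new_row = [
            rng.choice(columns[j]) if rng.random() < noise_level else value
            for j, value in enumerate(row)
        ]
        noisy.append(new_row)
    return noisy


# Hypothetical factor levels, for illustration only.
FACTORS = {
    "noise_situation": ["under_representative", "over_representative"],
    "algorithm": ["algorithm_a", "algorithm_b"],
    "noise_level": [0.05, 0.10, 0.20],
    "training_set_size": [100, 500],
}


def full_factorial(factors):
    """Enumerate every combination of factor levels as a dict per cell."""
    names = list(factors)
    return [
        dict(zip(names, combo))
        for combo in itertools.product(*factors.values())
    ]


cells = full_factorial(FACTORS)
```

In a full experiment, each cell in `cells` would correspond to training one classifier on data noised at the cell's level and scoring it on data noised at the field level. Crossing all factors is what makes it possible to detect the interactions among algorithm, noise level, and training set size that the abstract mentions.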