Improving Classification by Removing or Relabeling Mislabeled Instances

  • Authors:
  • Stéphane Lallich; Fabrice Muhlenbach; Djamel A. Zighed

  • Venue:
  • ISMIS '02 Proceedings of the 13th International Symposium on Foundations of Intelligent Systems
  • Year:
  • 2002

Abstract

Databases commonly contain noisy data, and an important source of noise is mislabeled training instances. We present a new approach that improves classification accuracy in this case by applying a preliminary filtering procedure. An example is considered suspect when, in its neighborhood as defined by a geometrical graph, the proportion of examples belonging to its class is not significantly greater than in the whole database. Such suspect examples in the training data can be either removed or relabeled, and the filtered training set is then provided as input to the learning algorithm. Our experiments on ten benchmark datasets from the UCI Machine Learning Repository, using 1-NN as the final algorithm, show that removing gives better results than relabeling. Removing maintains the generalization error rate when 0 to 20% noise is introduced on the class labels, especially when the classes are well separable.
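The filtering procedure described in the abstract can be sketched in Python as follows. The abstract does not specify which geometrical graph defines the neighborhood, which significance test is applied, or which class a suspect example is relabeled to, so all three are assumptions here: the neighborhood of an example is taken to be its k nearest neighbors (k = 6; a relative neighborhood graph or Gabriel graph could be substituted), the test is a one-sided exact binomial test at level alpha = 0.05 against the class's proportion in the whole training set, and relabeling assigns the most frequent class among the neighbors. All function names are hypothetical. The filtered set then feeds a 1-NN classifier, as in the experiments.

```python
import math
from collections import Counter


def knn_neighbors(points, k):
    """Assumed neighborhood graph: each point is linked to its k nearest other points."""
    nbrs = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        nbrs.append(others[:k])
    return nbrs


def binom_sf(s, m, p):
    """P(X >= s) for X ~ Binomial(m, p)."""
    return sum(math.comb(m, t) * p**t * (1 - p) ** (m - t) for t in range(s, m + 1))


def find_suspects(points, labels, k=6, alpha=0.05):
    """Flag examples whose same-class neighbor share is not significantly above the global share."""
    nbrs = knn_neighbors(points, k)
    freq = Counter(labels)
    n = len(labels)
    suspects = []
    for i, y in enumerate(labels):
        m = len(nbrs[i])
        if m == 0:
            continue  # isolated example: nothing to test
        s = sum(labels[j] == y for j in nbrs[i])
        # Assumption: one-sided exact binomial test of H0 "neighbors have class y
        # with the global probability freq[y] / n"; suspect unless H0 is rejected.
        if binom_sf(s, m, freq[y] / n) > alpha:
            suspects.append(i)
    return suspects, nbrs


def remove_suspects(points, labels, suspects):
    """Filtering by removal: drop every suspect example."""
    drop = set(suspects)
    keep = [i for i in range(len(points)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]


def relabel_suspects(labels, suspects, nbrs):
    """Filtering by relabeling (assumed rule): majority class among the original neighbor labels."""
    new = list(labels)
    for i in suspects:
        counts = Counter(labels[j] for j in nbrs[i])
        top = max(counts.values())
        if counts[labels[i]] < top:  # keep the current label when it ties for the majority
            new[i] = min(c for c, v in counts.items() if v == top)
    return new


def predict_1nn(train_x, train_y, query):
    """Final classifier: label of the nearest training example (train_x must be non-empty)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


if __name__ == "__main__":
    # Two well-separated classes on a line; the example at x = 15 is mislabeled "b".
    pts = [(float(x), 0.0) for x in range(30)] + [(float(x), 0.0) for x in range(100, 130)]
    labs = ["a"] * 30 + ["b"] * 30
    labs[15] = "b"
    sus, graph = find_suspects(pts, labs)
    print(sus)                                         # [12, 13, 14, 15, 16, 17, 18]
    print(predict_1nn(pts, labs, (15.2, 0.0)))         # b  (noisy label wins)
    kept_x, kept_y = remove_suspects(pts, labs, sus)
    print(predict_1nn(kept_x, kept_y, (15.2, 0.0)))    # a  (after removal)
    relabeled = relabel_suspects(labs, sus, graph)
    print(predict_1nn(pts, relabeled, (15.2, 0.0)))    # a  (after relabeling)
```

With only six neighbors, a 5-of-6 same-class share is not significantly above the global share of about 0.48, so the correctly labeled examples next to the noisy one (indices 12 to 14 and 16 to 18) are flagged as well. Removing them still leaves enough clean examples for 1-NN to recover the true class, while the relabeling rule leaves those six labels unchanged and only corrects index 15.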