Cost-Guided Class Noise Handling for Effective Cost-Sensitive Learning

  • Authors: Xingquan Zhu; Xindong Wu
  • Affiliations: University of Vermont, Burlington, VT (both authors)
  • Venue: ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
  • Year: 2004

Abstract

Recent research in machine learning, data mining, and related areas has produced a wide variety of algorithms for cost-sensitive (CS) classification, where the objective is to minimize misclassification cost rather than to maximize classification accuracy. However, these methods assume that training sets do not contain significant noise, which is rarely the case in real-world environments. In this paper, we systematically study the impact of class noise on CS learning and propose a cost-guided class noise handling algorithm that identifies noise for effective CS learning. We call it the Cost-guided Iterative Classification Filter (CICF) because it seamlessly integrates costs with an existing Classification Filter for noise identification. Whereas existing efforts give equal weight to handling noise in all classes, CICF puts more emphasis on expensive classes, which makes it especially successful on datasets with a large cost ratio. Experimental results and comparative studies on real-world datasets indicate that noise may seriously degrade the performance of CS classifiers, and that adopting the proposed CICF algorithm significantly reduces the misclassification cost of a CS classifier in noisy environments.
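
The difference between maximizing accuracy and minimizing misclassification cost can be illustrated with a minimal sketch of the standard minimum-expected-cost decision rule. This is not the CICF algorithm, whose details the abstract does not give; the function names, the cost matrix, and the probabilities below are all hypothetical values chosen for illustration.

```python
from typing import Sequence


def max_probability_class(probs: Sequence[float]) -> int:
    """Accuracy-oriented decision: pick the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


def min_expected_cost_class(
    probs: Sequence[float], cost: Sequence[Sequence[float]]
) -> int:
    """Cost-sensitive decision: pick the class with the lowest expected cost.

    probs[j]   -- estimated probability that the true class is j
    cost[i][j] -- cost of predicting class i when the true class is j
    """
    n = len(probs)
    expected = [sum(cost[i][j] * probs[j] for j in range(n)) for i in range(n)]
    return min(range(n), key=expected.__getitem__)


# Hypothetical two-class problem with a cost ratio of 10:1.
# Class 1 is the "expensive" class: missing it costs 10, a false alarm costs 1.
cost = [
    [0.0, 10.0],  # predict 0: free if true class is 0, costs 10 if it is 1
    [1.0, 0.0],   # predict 1: costs 1 if true class is 0, free if it is 1
]
probs = [0.8, 0.2]  # assumed class-probability estimates for one instance

accuracy_choice = max_probability_class(probs)
cost_choice = min_expected_cost_class(probs, cost)
```

With these assumed numbers, predicting class 0 has an expected cost of 10 × 0.2 = 2.0 and predicting class 1 has an expected cost of 1 × 0.8 = 0.8, so the cost-sensitive rule chooses class 1 even though class 0 is more probable. Because the rule depends on probability estimates for the expensive class, mislabeled training instances in that class can shift its decisions, which is consistent with the abstract's emphasis on expensive classes.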