Identifying mislabeled training data with the aid of unlabeled data

  • Authors:
  • Donghai Guan;Weiwei Yuan;Young-Koo Lee;Sungyoung Lee

  • Affiliations:
  • College of Automation, Harbin Engineering University, Harbin, China 150001;Dept. of Computer Engineering, Kyung Hee University, Yongin-si, Korea 446-701;Dept. of Computer Engineering, Kyung Hee University, Yongin-si, Korea 446-701;Dept. of Computer Engineering, Kyung Hee University, Yongin-si, Korea 446-701

  • Venue:
  • Applied Intelligence
  • Year:
  • 2011

Abstract

This paper presents a new approach for identifying and eliminating mislabeled training instances for supervised learning algorithms. The novelty of this approach lies in using unlabeled instances to aid the detection of mislabeled training instances, in contrast with existing methods, which rely only on the labeled training instances. Our approach is straightforward and can be applied to many existing noise detection methods with only marginal modifications. To assess its benefit, we choose two popular noise detection methods: majority filtering (MF) and consensus filtering (CF). The proposed variants, MFAUD and CFAUD (majority and consensus filtering with the aid of unlabeled data), apply our approach to MF and CF, respectively. An empirical study validates the superiority of our approach and shows that MFAUD and CFAUD significantly improve on the performance of MF and CF under different noise ratios and labeled ratios. The improvement is more pronounced when the noise ratio is greater.
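
For readers unfamiliar with the two baselines, the following is a minimal sketch of majority and consensus filtering as they are commonly described: several classifiers are trained on cross-validation folds, and a held-out instance is flagged as mislabeled when more than half of them (MF) or all of them (CF) disagree with its given label. It does not implement MFAUD or CFAUD, since the abstract does not say how the unlabeled instances are used. The function names, the toy learners, the fold assignment, and the data are all assumptions made for illustration.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_centroid(train_X, train_y):
    """Fit a toy learner that predicts the label of the closest class mean."""
    sums, counts = {}, Counter(train_y)
    for x, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    return lambda x: min(centroids, key=lambda lab: sq_dist(x, centroids[lab]))


def knn(k):
    """Return a fit function for a toy k-nearest-neighbour learner."""
    def fit(train_X, train_y):
        def predict(x):
            pairs = sorted(zip(train_X, train_y), key=lambda p: sq_dist(x, p[0]))
            return Counter(lab for _, lab in pairs[:k]).most_common(1)[0][0]
        return predict
    return fit


def filter_noise(X, y, learners, n_folds=3, scheme="majority"):
    """Return sorted indices of instances flagged as mislabeled.

    scheme="majority":  flag if more than half of the learners disagree
                        with the given label (MF).
    scheme="consensus": flag only if every learner disagrees (CF).
    """
    n = len(X)
    flagged = []
    for fold in range(n_folds):
        # Assumption: a simple interleaved fold assignment, not a random split.
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        models = [fit(train_X, train_y) for fit in learners]
        for i in test_idx:
            errors = sum(model(X[i]) != y[i] for model in models)
            if scheme == "majority":
                noisy = errors > len(models) / 2
            else:
                noisy = errors == len(models)
            if noisy:
                flagged.append(i)
    return sorted(flagged)


# Toy data: two well-separated 1-D clusters; index 3 carries the wrong label.
X = [[0.0], [0.4], [1.0], [1.7], [2.1], [2.6],
     [10.0], [10.4], [11.0], [11.7], [12.1], [12.6]]
y = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1]
learners = [nearest_centroid, knn(1), knn(3)]

mf_flags = filter_noise(X, y, learners, scheme="majority")
cf_flags = filter_noise(X, y, learners, scheme="consensus")
print("MF flags:", mf_flags)
print("CF flags:", cf_flags)
```

Because consensus filtering requires every learner to disagree with the label, it flags a subset of what majority filtering flags: it discards fewer clean instances but may retain more noisy ones.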