SETRED: self-training with editing

  • Authors:
  • Ming Li;Zhi-Hua Zhou

  • Affiliations:
  • National Laboratory for Novel Software Technology, Nanjing University, Nanjing, China;National Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

  • Venue:
  • PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Self-training is a semi-supervised learning algorithm in which a learner keeps on labeling unlabeled examples and retraining itself on an enlarged labeled training set. Since the self-training process may erroneously label some unlabeled examples, sometimes the learned hypothesis does not perform well. In this paper, a new algorithm named Setred is proposed, which utilizes a specific data editing method to identify and remove the mislabeled examples from the self-labeled data. In detail, in each iteration of the self-training process, the local cut edge weight statistic is used to help estimate whether a newly labeled example is reliable or not, and only the reliable self-labeled examples are used to enlarge the labeled training set. Experiments show that the introduction of data editing is beneficial, and the learned hypotheses of Setred outperform those learned by the standard self-training algorithm.