Training pool selection for semi-supervised learning

  • Authors:
  • Jian Ge;Tinghuai Ma;Qiaoqiao Yan;Yonggang Yan;Wei Tian

  • Affiliations:
  • Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing, China, College of Computer & Software, Nanjing University of Information S ...

  • Venue:
  • ISNN'12 Proceedings of the 9th international conference on Advances in Neural Networks - Volume Part I
  • Year:
  • 2012

Abstract

Semi-supervised learning deals with methods for automatically exploiting unlabeled samples in addition to a labeled set. Data selection is an important topic in active learning. It addresses the selection of valuable unlabeled data to label, given that labeling data is costly. In this paper, we discuss in detail three aspects of data selection: how to select unlabeled samples, how many unlabeled samples should be selected, and how to define the capacity of the training pool. Experiments using self-training based on C4.5 show that as the labeled ratio L increases, the initial error value becomes smaller. In addition, when the labeled ratio L is less than 10%, the selection ratio should be set below 0.8. The error value shows no significant change once the selection ratio exceeds 1.0.
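
The three questions raised in the abstract (which unlabeled samples to select, how many to select, and how large the training pool may grow) can be illustrated with a minimal self-training sketch. It is not the authors' implementation. The base learner is a simple nearest-centroid classifier used as a stand-in for C4.5, and the names `self_train`, `selection_ratio`, and `pool_capacity` are hypothetical. The abstract does not define the selection ratio, so the sketch assumes it is the number of samples added per round divided by the size of the initial labeled set.

```python
import math


class CentroidClassifier:
    """Stand-in base learner (NOT C4.5): nearest class centroid."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                acc[j] += v
            counts[yi] = counts.get(yi, 0) + 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_with_confidence(self, x):
        # Distance to every class centroid; the closest class wins.
        dists = {c: math.dist(x, m) for c, m in self.centroids.items()}
        d_min = min(dists.values())
        # Shift by d_min so the largest weight is exactly 1 (no underflow to 0).
        weights = {c: math.exp(-(d - d_min)) for c, d in dists.items()}
        label = max(weights, key=weights.get)
        return label, weights[label] / sum(weights.values())


def self_train(X_lab, y_lab, X_unlab, selection_ratio=0.5,
               pool_capacity=None, max_rounds=10):
    """Self-training loop under the assumptions stated in the lead-in."""
    X_pool, y_pool = list(X_lab), list(y_lab)
    remaining = list(X_unlab)
    # Assumed meaning: samples added per round = selection_ratio * |labeled set|.
    per_round = max(1, int(selection_ratio * len(X_lab)))
    model = CentroidClassifier().fit(X_pool, y_pool)

    for _ in range(max_rounds):
        if not remaining:
            break
        if pool_capacity is not None and len(X_pool) >= pool_capacity:
            break
        # How to select: rank unlabeled samples by prediction confidence.
        scored = [(model.predict_with_confidence(x), x) for x in remaining]
        scored.sort(key=lambda item: item[0][1], reverse=True)
        # How many to select: bounded by the ratio and the pool capacity.
        budget = per_round
        if pool_capacity is not None:
            budget = min(budget, pool_capacity - len(X_pool))
        for (label, _conf), x in scored[:budget]:
            X_pool.append(x)
            y_pool.append(label)  # pseudo-label from the current model
        remaining = [x for _, x in scored[budget:]]
        model = CentroidClassifier().fit(X_pool, y_pool)

    return model, X_pool, y_pool


if __name__ == "__main__":
    X_lab, y_lab = [(0.0, 0.0), (5.0, 5.0)], ["a", "b"]
    X_unlab = [(0.2, 0.1), (4.8, 5.1), (0.1, 0.3), (5.2, 4.9)]
    model, X_pool, y_pool = self_train(X_lab, y_lab, X_unlab, selection_ratio=1.0)
    print(len(X_pool), model.predict_with_confidence((0.1, 0.1))[0])
```

Under this reading, a smaller `selection_ratio` adds fewer pseudo-labeled samples per round, which limits how quickly early labeling mistakes can accumulate when the labeled set is small. Replacing `CentroidClassifier` with a decision-tree learner that exposes a confidence estimate would bring the sketch closer to the setup described in the abstract.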