Combining active learning and semi-supervised learning to construct SVM classifier

  • Authors:
  • Yan Leng; Xinyan Xu; Guanghui Qi

  • Affiliations:
  • College of Physics and Electronics, Shandong Normal University, Ji'nan 250014, China; Department of Computer Science and Technology, Shandong College of Electronic Technology, Ji'nan 250200, China; Department of Mechanical Engineering, Shandong Jiaotong University, Ji'nan 250023, China

  • Venue:
  • Knowledge-Based Systems
  • Year:
  • 2013

Abstract

A key issue for most classification algorithms is that they need large amounts of labeled samples to train the classifier. Because manual labeling is time-consuming, researchers have proposed active learning and semi-supervised learning to reduce the manual labeling workload. The two techniques are complementary to a certain degree, so some studies combine them to reduce the workload further. However, studies that combine active learning and semi-supervised learning for the SVM classifier are rare. Of the numerous SVM active learning algorithms, the most popular is the one that queries the sample closest to the current classification hyperplane in each iteration, denoted SVM_AL in this paper. SVM_AL is interested only in samples that are likely to lie on the class boundary and ignores the large number of remaining unlabeled samples. This paper therefore designs a semi-supervised learning algorithm that makes full use of the remaining non-queried samples, and combines it with SVM_AL to form a new active semi-supervised SVM algorithm. The proposed algorithm uses active learning to select class boundary samples and semi-supervised learning to select class central samples, because class central samples are believed to describe the class distribution better and to help SVM_AL find the boundary samples more precisely. To avoid introducing too many labeling errors when exploring class central samples, the label changing rate is used to ensure the reliability of the predicted labels. Experimental results show that the proposed active semi-supervised SVM algorithm performs much better than the pure SVM active learning algorithm and can thus further reduce the manual labeling workload.
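
The two selection ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The function names are hypothetical, and the abstract does not give a formula for the label changing rate, so the sketch assumes it is the fraction of consecutive iterations in which a sample's predicted label flipped. The decision values are assumed to be signed outputs of the current SVM for the unlabeled samples.

```python
from typing import List, Sequence


def query_closest_to_hyperplane(decision_values: Sequence[float]) -> int:
    """Active learning step: return the index of the unlabeled sample
    whose decision value has the smallest magnitude |f(x)|."""
    if not decision_values:
        raise ValueError("no unlabeled samples to query")
    return min(range(len(decision_values)),
               key=lambda i: abs(decision_values[i]))


def label_changing_rate(history: Sequence[int]) -> float:
    """Fraction of consecutive iterations in which the predicted label
    changed. Assumed definition, for illustration only."""
    if len(history) < 2:
        return 1.0  # too little evidence: treat the label as unreliable
    changes = sum(1 for a, b in zip(history, history[1:]) if a != b)
    return changes / (len(history) - 1)


def stable_samples(histories: Sequence[Sequence[int]],
                   max_rate: float = 0.0) -> List[int]:
    """Semi-supervised step: indices of samples whose predicted label
    changed at a rate no higher than max_rate (hypothetical threshold)."""
    return [i for i, h in enumerate(histories)
            if label_changing_rate(h) <= max_rate]
```

For decision values `[1.7, -0.2, 0.9, -2.4]`, the query step picks index 1, the sample nearest the hyperplane. For predicted-label histories `[[1, 1, 1], [1, -1, 1], [1, 1, -1], [-1, -1, -1]]`, `stable_samples` with the default threshold keeps indices 0 and 3, whose labels never changed. Whether such stable samples are also class central is a separate criterion that the abstract does not specify.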