New Labeling Strategy for Semi-supervised Document Categorization

  • Authors:
  • Yan Zhu;Liping Jing;Jian Yu

  • Affiliations:
  • School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China;School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China;School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China

  • Venue:
  • KSEM '09 Proceedings of the 3rd International Conference on Knowledge Science, Engineering and Management
  • Year:
  • 2009

Abstract

Semi-supervised learning usually requires some prior knowledge to supervise the learning process, such as seeds in Seeded-Kmeans, pairwise constraints in COP-Kmeans, and labeled data for training a useful initial classifier in S3VM. Such prior knowledge is generally provided by a domain expert and is therefore expensive to obtain. In this paper, we propose a new automatic document labeling strategy that derives much more prior knowledge from the very limited labeled data and the whole data set. Experimental results on the 20-Newsgroup text data show that the new strategy is helpful for semi-supervised document categorization and improves learning performance.
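
To make the role of seeds concrete, the following is a minimal sketch of seeded k-means initialization, assuming documents are already represented as numeric feature vectors. It is not the labeling strategy proposed in the paper, which the abstract does not specify. The function name `seeded_kmeans`, the helper `mean`, and the toy inputs are hypothetical and chosen only for illustration.

```python
from math import dist


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def seeded_kmeans(points, seeds, n_iter=20):
    """Cluster `points`, using labeled seeds only to initialise the centroids.

    points: list of feature vectors (lists of floats).
    seeds:  dict mapping a point index to its known class label (assumed input).
    Returns one label per point.
    """
    labels = sorted(set(seeds.values()))
    # One initial centroid per class: the mean of that class's seed points.
    centroids = {
        c: mean([points[i] for i, lab in seeds.items() if lab == c])
        for c in labels
    }
    assignment = []
    for _ in range(n_iter):
        # Assign every point (seeds included) to its nearest centroid.
        new_assignment = [
            min(labels, key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Recompute each centroid from its current members.
        for c in labels:
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = mean(members)
    return assignment
```

With more seeds per class, the initial centroids tend to sit closer to the true class centers, which is why a strategy that derives additional labeled documents automatically could help such methods.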