Building a Text Classifier by a Keyword and Unlabeled Documents

  • Authors:
  • Qiang Qiu; Yang Zhang; Junping Zhu

  • Affiliations:
  • College of Information Engineering, Northwest A&F University, Yangling, P.R. China 712100 (all authors)

  • Venue:
  • PAKDD '09: Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2009

Abstract

Traditional approaches to building text classifiers usually require many labeled documents, which are expensive to obtain. In this paper, we study the problem of building a text classifier from a keyword and a set of unlabeled documents, so as to avoid labeling documents manually. First, we expand the keyword into a set of query terms and use them to retrieve documents from the unlabeled set. Second, we mine a set of positive documents from the retrieved documents. Third, with the help of these positive documents, we extract more positive documents from the unlabeled documents. Finally, we train a PU (positive and unlabeled) text classifier on these positive documents and the unlabeled documents. Our experimental results on the 20 Newsgroups dataset show that the proposed approach can help build excellent text classifiers.
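
The four steps named in the abstract can be illustrated with a small, self-contained sketch. The abstract does not say how each step is carried out, so everything below is an assumption made for illustration only: the co-occurrence-based expansion, the hit-count and overlap thresholds, the naive Bayes model that treats unlabeled documents as noisy negatives, the toy corpus, and all function names are hypothetical and are not the authors' method.

```python
import math
from collections import Counter

STOPWORDS = {"the", "a", "on"}  # hypothetical, deliberately tiny stopword list


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def expand_keyword(keyword, docs, k=3):
    """Step 1 (assumed): add the k terms that co-occur most often with the keyword."""
    co = Counter()
    for doc in docs:
        terms = set(tokenize(doc))
        if keyword in terms:
            co.update(terms - {keyword})
    ranked = sorted(co, key=lambda t: (-co[t], t))  # ties broken alphabetically
    return [keyword] + ranked[:k]


def mine_seed_positives(query, docs, min_hits=3):
    """Step 2 (assumed): keep documents containing at least min_hits query terms."""
    q = set(query)
    return [i for i, doc in enumerate(docs) if len(q & set(tokenize(doc))) >= min_hits]


def extend_positives(seeds, docs, min_overlap=0.4):
    """Step 3 (assumed): add documents whose terms overlap the seed vocabulary."""
    profile = {t for i in seeds for t in tokenize(docs[i])}
    extra = []
    for i, doc in enumerate(docs):
        terms = tokenize(doc)
        if i in seeds or not terms:
            continue
        if sum(t in profile for t in terms) / len(terms) >= min_overlap:
            extra.append(i)
    return sorted(set(seeds) | set(extra))


def train_pu_naive_bayes(pos_docs, unl_docs):
    """Step 4 (assumed): multinomial naive Bayes, unlabeled docs as noisy negatives."""
    counts = [Counter(t for d in group for t in tokenize(d))
              for group in (pos_docs, unl_docs)]
    vocab = set(counts[0]) | set(counts[1])
    n = len(pos_docs) + len(unl_docs)
    priors = [math.log(len(pos_docs) / n), math.log(len(unl_docs) / n)]
    totals = [sum(c.values()) + len(vocab) for c in counts]  # Laplace smoothing

    def predict(text):
        scores = [priors[c] + sum(math.log((counts[c][t] + 1) / totals[c])
                                  for t in tokenize(text))
                  for c in (0, 1)]
        return scores[0] > scores[1]

    return predict


# Toy unlabeled corpus (invented for this example).
docs = [
    "hockey team wins the game on ice",
    "the hockey player scored a goal on ice",
    "ice rink hosts team game tonight",
    "new graphics card renders image fast",
    "the image file format uses compression",
    "graphics driver update fixes card bug",
]

query = expand_keyword("hockey", docs)
seeds = mine_seed_positives(query, docs)
positives = extend_positives(seeds, docs)
unlabeled = [i for i in range(len(docs)) if i not in positives]
is_positive = train_pu_naive_bayes([docs[i] for i in positives],
                                   [docs[i] for i in unlabeled])
```

Calling `is_positive("ice hockey game tonight")` then classifies a new document against the keyword's class. Treating the remaining unlabeled documents as negatives is only one simple baseline for PU learning; the paper's actual classifier may differ.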