Building a Text Classifier by a Keyword and Wikipedia Knowledge

  • Authors:
  • Qiang Qiu; Yang Zhang; Junping Zhu; Wei Qu

  • Affiliations:
  • College of Information Engineering, Northwest A&F University, Yangling 712100 (all four authors)

  • Venue:
  • ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
  • Year:
  • 2009

Abstract

Traditional approaches to building text classifiers usually require a large number of labeled documents, which are expensive to obtain. In this paper, we propose a new text classification approach based on a keyword and Wikipedia knowledge, which avoids labeling documents manually. First, we retrieve a set of documents related to the keyword from Wikipedia. Then, with the help of these related Wikipedia pages, more positive documents are extracted from the unlabeled documents. Finally, we train a text classifier with these positive documents and the unlabeled documents. Experimental results on the 20 Newsgroups dataset show that the proposed approach performs very competitively compared with NB-SVM, a PU (positive-unlabeled) learner, and NB (naive Bayes), a supervised learner.
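The three-step pipeline in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' method, because the abstract does not specify how positives are extracted or which classifier is trained. Every choice below is an assumption: the Wikipedia pages for the keyword are passed in as plain strings (retrieval is out of scope), extra positives are unlabeled documents whose cosine similarity to the centroid of those pages reaches a hypothetical `threshold`, and the remaining unlabeled documents are simply treated as negatives when training a multinomial naive Bayes model. Real PU learners such as NB-SVM instead try to identify reliable negatives. The helper names (`expand_positives`, `build_classifier`) and the stopword list are hypothetical.

```python
import math
from collections import Counter

# Hypothetical, tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "is", "of", "over", "the", "to"}

def tokenize(text):
    # Lowercase whitespace tokenization, keeping alphabetic non-stopword tokens.
    return [t for t in text.lower().split() if t.isalpha() and t not in STOPWORDS]

def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def expand_positives(wiki_pages, unlabeled, threshold):
    # Assumption: score each unlabeled document against the centroid
    # (summed term counts) of the keyword's related Wikipedia pages.
    centroid = Counter()
    for page in wiki_pages:
        centroid.update(tokenize(page))
    scored = [(cosine(Counter(tokenize(d)), centroid), d) for d in unlabeled]
    positives = [d for s, d in scored if s >= threshold]
    rest = [d for s, d in scored if s < threshold]
    return positives, rest

class MultinomialNB:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        class_counts = Counter(labels)
        for doc, y in zip(docs, labels):
            toks = tokenize(doc)
            self.vocab.update(toks)
            self.word_counts[y].update(toks)
        n = len(docs)
        self.log_prior = {c: math.log(class_counts[c] / n) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for t in tokenize(doc):
                if t in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][t] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best

def build_classifier(wiki_pages, unlabeled, threshold=0.2):
    # Positives = Wikipedia pages + extracted documents (label 1);
    # simplification: all other unlabeled documents are labeled 0.
    extra_pos, rest = expand_positives(wiki_pages, unlabeled, threshold)
    docs = list(wiki_pages) + extra_pos + rest
    labels = [1] * (len(wiki_pages) + len(extra_pos)) + [0] * len(rest)
    return MultinomialNB().fit(docs, labels), extra_pos

if __name__ == "__main__":
    wiki = ["baseball is a bat and ball game played between two teams",
            "a pitcher throws the baseball to the batter"]
    unlabeled = ["the batter hit the ball over the fence",
                 "the pitcher threw a fast ball",
                 "the new graphics card renders games quickly",
                 "install the driver for the graphics card"]
    clf, extra = build_classifier(wiki, unlabeled)
    print(extra)  # the two baseball sentences
    print(clf.predict("the batter swung at the ball"))      # 1
    print(clf.predict("update the graphics card driver"))   # 0
```

In this toy run, only the two baseball sentences share vocabulary with the Wikipedia pages, so they become extra positives, and the trained model separates new baseball and hardware sentences. Treating every non-extracted document as negative is the weakest assumption here; in a realistic PU setting, many of those documents may still be positive.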