Extracting initial and reliable negative documents to enhance classification performance

  • Authors:
  • Hui Wang;Wanli Zuo

  • Affiliations:
  • College of Computer Science and Technology, Jilin University, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Changchun, China;College of Computer Science and Technology, Jilin University, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Changchun, China

  • Venue:
  • KDLL'06 Proceedings of the 2006 international conference on Knowledge Discovery in Life Science Literature
  • Year:
  • 2006

Abstract

Most existing work on text classification assumes that the training data are completely labeled. In practice, some information retrieval problems can only be described as learning a binary classifier from incompletely labeled examples, where a small set of labeled positive examples and a very large set of unlabeled examples are provided. In this setting, traditional text classification methods do not work properly. In this paper, we propose a method called Weighted Voting Classifier, which is an improved 1-DNF algorithm. Experimental results on the Reuters-21578 collection show that Weighted Voting Classifier achieves a higher F measure than both PEBL and one-class SVM. Furthermore, the reduction in iterations relative to PEBL is 2.26.
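
For orientation, the baseline 1-DNF step that the proposed method builds on can be sketched as follows. This is a minimal illustration of the commonly described 1-DNF idea (a term counts as a positive feature when it occurs in a larger fraction of the positive documents than of the unlabeled ones, and an unlabeled document containing no positive feature is taken as a reliable negative). It is not the authors' Weighted Voting Classifier, whose details the abstract does not give. The function name, the set-of-terms document representation, and the toy documents are all assumptions made for the example.

```python
from typing import List, Set, Tuple


def one_dnf_reliable_negatives(
    positive_docs: List[Set[str]],
    unlabeled_docs: List[Set[str]],
) -> Tuple[Set[str], List[int]]:
    """Baseline 1-DNF sketch (hypothetical helper, not the paper's method).

    Returns the positive feature set and the indices of unlabeled
    documents that contain none of those features.
    """
    if not positive_docs or not unlabeled_docs:
        raise ValueError("both document sets must be non-empty")

    vocabulary = set().union(*positive_docs, *unlabeled_docs)

    positive_features = set()
    for term in vocabulary:
        # Fraction of documents in each set that contain the term.
        freq_pos = sum(term in d for d in positive_docs) / len(positive_docs)
        freq_unl = sum(term in d for d in unlabeled_docs) / len(unlabeled_docs)
        if freq_pos > freq_unl:
            positive_features.add(term)

    # An unlabeled document sharing no term with the positive feature set
    # is treated as a reliable negative.
    reliable_negatives = [
        i for i, d in enumerate(unlabeled_docs) if not (d & positive_features)
    ]
    return positive_features, reliable_negatives


# Toy data (invented for illustration): each document is a set of terms.
positive = [{"wheat", "grain", "export"}, {"wheat", "harvest"}]
unlabeled = [
    {"wheat", "grain"},
    {"stock", "market"},
    {"oil", "price", "export"},
    {"bank", "stock"},
]

features, negatives = one_dnf_reliable_negatives(positive, unlabeled)
print(sorted(features))  # ['export', 'grain', 'harvest', 'wheat']
print(negatives)         # [1, 3]
```

In a two-step scheme such as PEBL, the reliable negatives extracted this way seed an iterative classifier. The quality of this initial set is what an improved extraction step would aim to raise.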