Text classification with relatively small positive documents and unlabeled data

  • Authors:
  • Fumiyo Fukumoto;Takeshi Yamamoto;Suguru Matsuyoshi;Yoshimi Suzuki

  • Affiliations:
  • Univ. of Yamanashi, Kofu, Japan;Univ. of Yamanashi, Kofu, Japan;Univ. of Yamanashi, Kofu, Japan;Interdisciplinary Graduate School of Medicine and Engineering, Kofu, Japan

  • Venue:
  • Proceedings of the 21st ACM international conference on Information and knowledge management
  • Year:
  • 2012

Abstract

This paper addresses the problem of collecting negative training documents suited to a relatively small number of positive documents, and presents a method, based on supervised machine learning techniques, that eliminates the need to collect negative training documents manually. We applied an error correction technique to the negative training data obtained by Positive Example Based Learning (PEBL). Moreover, we used a boosting technique to learn a set of negative data for training classifiers. Results on Japanese newspaper documents showed that the method contributes to reducing the cost of manually collecting negative training documents.
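
As a rough illustration of the starting point, PEBL is usually described as first picking out unlabeled documents that contain none of the "strong positive" features, and treating those as initial negative examples. The sketch below shows only that first step, under stated assumptions: documents are already tokenized into word lists, a feature counts as strongly positive when its document-frequency ratio is higher in the positive set than in the unlabeled set, and the function names are hypothetical. The abstract does not specify the error correction and boosting steps, so they are not shown.

```python
from collections import Counter


def strong_positive_features(positive_docs, unlabeled_docs):
    """Features that occur in a larger share of positive docs than unlabeled docs."""
    if not positive_docs or not unlabeled_docs:
        raise ValueError("both document collections must be non-empty")
    # Document frequency: count each feature at most once per document.
    pos_df = Counter(f for doc in positive_docs for f in set(doc))
    unl_df = Counter(f for doc in unlabeled_docs for f in set(doc))
    n_pos, n_unl = len(positive_docs), len(unlabeled_docs)
    return {
        f for f, count in pos_df.items()
        if count / n_pos > unl_df.get(f, 0) / n_unl
    }


def initial_negatives(positive_docs, unlabeled_docs):
    """Unlabeled docs with no strong positive feature (assumed initial negatives)."""
    strong = strong_positive_features(positive_docs, unlabeled_docs)
    return [doc for doc in unlabeled_docs if not strong & set(doc)]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    positives = [["stock", "market"], ["market", "bank"]]
    unlabeled = [["market", "rally"], ["soccer", "goal"], ["rain", "goal"]]
    print(initial_negatives(positives, unlabeled))
```

In a full pipeline, a classifier would then be trained on the positive documents and these candidate negatives and applied again to the remaining unlabeled documents. Candidates selected this way can still include mislabeled positives, which is the kind of noise an error correction step would target.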