Text classification with relatively small positive documents and unlabeled data

Authors:
Fumiyo Fukumoto;Takeshi Yamamoto;Suguru Matsuyoshi;Yoshimi Suzuki
Affiliations:
Univ. of Yamanashi, Kofu, Japan;Univ. of Yamanashi, Kofu, Japan;Univ. of Yamanashi, Kofu, Japan;Interdisciplinary Graduate School of Medicine and Engineering, Kofu, Japan
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Citing 4
Cited 0

BoosTexter: A Boosting-based System for Text Categorization
Machine Learning - Special issue on information retrieval

Toward Optimal Active Learning through Sampling Estimation of Error Reduction
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning

PEBL: positive example based learning for Web page classification using SVM
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

Multilabel classification with meta-level features
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval

Abstract

This paper addresses the problem of collecting negative training documents suited to a relatively small number of positive documents, and presents a method, based on supervised machine learning techniques, that eliminates the need to collect negative training documents manually. We applied an error correction technique to the negative training data obtained by Positive Example Based Learning (PEBL). Moreover, we used a boosting technique to learn a set of negative data for training classifiers. Results on Japanese newspaper documents showed that the method contributes to reducing the cost of manually collecting negative training documents.
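
To make the starting point concrete, the following is a minimal sketch of a PEBL-style first step, in which "strong negative" documents are picked out of the unlabeled data without manual labeling. It is not the authors' implementation: it covers neither the error correction nor the boosting described above, and the function names, the token-list document representation, and the toy data are all assumptions made for illustration. A feature is treated as a strong positive feature when its document frequency is higher in the positive set than in the unlabeled set, and an unlabeled document with no such feature is taken as a strong negative.

```python
from collections import Counter


def strong_positive_features(positive, unlabeled):
    """Return features that occur in a larger share of the positive
    documents than of the unlabeled documents (hypothetical helper)."""
    pos_df = Counter(w for doc in positive for w in set(doc))
    unl_df = Counter(w for doc in unlabeled for w in set(doc))
    return {
        w
        for w, count in pos_df.items()
        if count / len(positive) > unl_df[w] / len(unlabeled)
    }


def strong_negatives(positive, unlabeled):
    """Return unlabeled documents that contain no strong positive
    feature; these serve as initial negative training data."""
    feats = strong_positive_features(positive, unlabeled)
    return [doc for doc in unlabeled if not feats & set(doc)]


# Toy data (assumed): each document is a list of tokens.
positive = [
    ["stock", "market", "shares"],
    ["market", "bank", "shares"],
]
unlabeled = [
    ["stock", "market", "rally"],
    ["soccer", "goal", "match"],
    ["rain", "forecast", "weekend"],
    ["bank", "shares", "profit"],
]

negatives = strong_negatives(positive, unlabeled)
print(negatives)
```

On this toy data the two documents sharing no vocabulary with the positive set are selected. In a full PEBL-style pipeline, a classifier would then be trained on the positive documents against these negatives and applied repeatedly to the remaining unlabeled documents to grow the negative set.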