A new PU learning algorithm for text classification

Authors:
Hailong Yu;Wanli Zuo;Tao Peng
Affiliations:
College of Computer Science and Technology, Jilin University, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Changchun, China;College of Computer Science and Technology, Jilin University, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Changchun, China;College of Computer Science and Technology, Jilin University, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Changchun, China
Venue:
MICAI'05 Proceedings of the 4th Mexican international conference on Advances in Artificial Intelligence
Year:
2005

Citing 12

Support-Vector Networks. Machine Learning
Context-sensitive learning methods for text categorization. SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Maximizing Text-Mining Performance. IEEE Intelligent Systems
Partially Supervised Classification of Text Documents. ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Positive and Unlabeled Examples Help Learning. ALT '99 Proceedings of the 10th International Conference on Algorithmic Learning Theory
Learning from Positive and Unlabeled Examples. ALT '00 Proceedings of the 11th International Conference on Algorithmic Learning Theory
PAC Learning from Positive Statistical Queries. ALT '98 Proceedings of the 9th International Conference on Algorithmic Learning Theory
PEBL: positive example based learning for Web page classification using SVM. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
One-class SVMs for document classification. The Journal of Machine Learning Research
Building Text Classifiers Using Positive and Unlabeled Examples. ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Learning to classify texts using positive and unlabeled data. IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence

Cited 1

On positive and unlabeled learning for text classification. TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue

Abstract

This paper studies the problem of building text classifiers from positive and unlabeled examples. The primary challenge, compared with the classical text classification problem, is that the training set contains no labeled negative documents. We call this problem PU-oriented text classification. Our text classifier adopts the traditional two-step approach, making use of both positive and unlabeled examples. In the first step, we improve the 1-DNF algorithm so that it identifies many more reliable negative documents with a very low error rate. In the second step, we build a set of classifiers by iteratively applying the SVM algorithm to a training set that is augmented at each iteration. Unlike previous work on PU-oriented text classification, which selects one of these classifiers as the final classifier, we construct the final classifier from the weighted vote of all classifiers generated during the iterations. Experimental results on the Reuters data set show that our method improves the classifier's F1-measure by 1.734 percent compared with PEBL.
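
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It shows the baseline 1-DNF idea as it is usually described for PEBL (a word is a positive feature if it appears in a larger fraction of positive documents than of unlabeled documents, and an unlabeled document containing no positive feature is taken as a reliable negative), plus a generic weighted vote. It does not reproduce the paper's improved 1-DNF variant, its SVM training loop, or its weighting scheme, none of which the abstract specifies. The function names, the representation of documents as token lists, and the idea of passing classifiers as plain callables returning +1 or -1 are all assumptions made for illustration.

```python
from collections import Counter


def positive_features(positive_docs, unlabeled_docs):
    """Baseline 1-DNF feature step: keep words whose document-frequency
    fraction is higher in the positive set P than in the unlabeled set U."""
    df_p = Counter(w for doc in positive_docs for w in set(doc))
    df_u = Counter(w for doc in unlabeled_docs for w in set(doc))
    n_p, n_u = len(positive_docs), len(unlabeled_docs)
    return {w for w, c in df_p.items() if c / n_p > df_u[w] / n_u}


def reliable_negatives(positive_docs, unlabeled_docs):
    """Unlabeled documents that contain none of the positive features."""
    pf = positive_features(positive_docs, unlabeled_docs)
    return [doc for doc in unlabeled_docs if not pf & set(doc)]


def weighted_vote(classifiers, weights, doc):
    """Combine classifiers that each return +1 or -1 for a document.
    How the weights are chosen is an assumption left to the caller;
    the abstract does not say how the paper computes them."""
    score = sum(w * clf(doc) for clf, w in zip(classifiers, weights))
    return 1 if score > 0 else -1


# Toy usage with made-up documents (tokenized as lists of words).
P = [["stock", "market", "trade"], ["stock", "price", "market"]]
U = [
    ["stock", "market", "rally"],
    ["football", "match", "goal"],
    ["weather", "rain", "market"],
    ["recipe", "cake", "sugar"],
]
RN = reliable_negatives(P, U)
```

In a full two-step pipeline, the documents in `RN` would serve as the initial negative training set for the iterative SVM step, and each classifier produced along the way would become one voter in `weighted_vote`.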