Building a Text Classifier by a Keyword and Unlabeled Documents

Authors:
Qiang Qiu;Yang Zhang;Junping Zhu
Affiliations:
College of Information Engineering, Northwest A&F University, Yangling, P.R. China 712100;College of Information Engineering, Northwest A&F University, Yangling, P.R. China 712100;College of Information Engineering, Northwest A&F University, Yangling, P.R. China 712100
Venue:
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Year:
2009

Citing 15
Cited 2

Combining labeled and unlabeled data with co-training

COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Text Classification from Labeled and Unlabeled Documents using EM

Machine Learning - Special issue on information retrieval
Partially Supervised Classification of Text Documents

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Refining Initial Points for K-Means Clustering

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
PEBL: positive example based learning for Web page classification using SVM

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining Relevant Text from Unlabelled Documents

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Building Text Classifiers Using Positive and Unlabeled Examples

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
PEBL: Web Page Classification without Negative Examples

IEEE Transactions on Knowledge and Data Engineering
Text Classification without Labeled Negative Documents

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Semantic similarity methods in wordNet and their application to information retrieval on the web

Proceedings of the 7th annual ACM international workshop on Web information and data management
Text Classification without Negative Examples Revisit

IEEE Transactions on Knowledge and Data Engineering
Learning to Classify Documents with Only a Small Positive Training Set

ECML '07 Proceedings of the 18th European conference on Machine Learning
Text classification by labeling words

AAAI'04 Proceedings of the 19th national conference on Artifical intelligence
Learning to classify texts using positive and unlabeled data

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Learning from positive and unlabeled examples with different data distributions

ECML'05 Proceedings of the 16th European conference on Machine Learning

Building a Text Classifier by a Keyword and Wikipedia Knowledge

ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Editorial: Classifying text streams by keywords using classifier ensemble

Data & Knowledge Engineering

Quantified Score

Hi-index	0.00

Visualization

Abstract

Traditional approaches for building text classifiers usually require a lot of labeled documents, which are expensive to obtain. In this paper, we study the problem of building a text classifier from a keyword and unlabeled documents, so as to avoid labeling documents manually. Firstly, we expand the keyword into a set of query terms and retrieve a set of documents from the set of unlabeled documents. Then, from the documents retrieved, we mine a set of positive documents. Thirdly, with the help of these positive documents, more positive documents could be extracted from the unlabeled documents. And finally, we train a PU text classifier with these positive documents and unlabeled documents. Our experiment result on 20Newsgroup dataset shows that the proposed approach could help to build excellent text classifiers.