Building a Text Classifier by a Keyword and Wikipedia Knowledge

Authors:
Qiang Qiu;Yang Zhang;Junping Zhu;Wei Qu
Affiliations:
College of Information Engineering, Northwest A&F University, Yangling 712100;College of Information Engineering, Northwest A&F University, Yangling 712100;College of Information Engineering, Northwest A&F University, Yangling 712100;College of Information Engineering, Northwest A&F University, Yangling 712100
Venue:
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Year:
2009

Citing 18
Cited 0

Combining labeled and unlabeled data with co-training

COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Text Classification from Labeled and Unlabeled Documents using EM

Machine Learning - Special issue on information retrieval
Partially Supervised Classification of Text Documents

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Combining Labeled and Unlabeled Data for MultiClass Text Categorization

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
PEBL: positive example based learning for Web page classification using SVM

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining Relevant Text from Unlabelled Documents

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
PEBL: Web Page Classification without Negative Examples

IEEE Transactions on Knowledge and Data Engineering
Text Classification without Labeled Negative Documents

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Text Classification without Negative Examples Revisit

IEEE Transactions on Knowledge and Data Engineering
Mining Domain-Specific Thesauri from Wikipedia: A Case Study

WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
Learning to Classify Documents with Only a Small Positive Training Set

ECML '07 Proceedings of the 18th European conference on Machine Learning
Improving Text Classification by Using Encyclopedia Knowledge

ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Text classification from unlabeled documents with bootstrapping and feature projection techniques

Information Processing and Management: an International Journal
Building a Text Classifier by a Keyword and Unlabeled Documents

PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Using Wikipedia knowledge to improve text classification

Knowledge and Information Systems
Text classification by labeling words

AAAI'04 Proceedings of the 19th national conference on Artifical intelligence
Learning to classify texts using positive and unlabeled data

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Learning from positive and unlabeled examples with different data distributions

ECML'05 Proceedings of the 16th European conference on Machine Learning

Quantified Score

Hi-index	0.00

Visualization

Abstract

Traditional approach for building text classifiers usually require a lot of labeled documents, which are expensive to obtain. In this paper, we propose a new text classification approach based on a keyword and Wikipedia knowledge, so as to avoid labeling documents manually. Firstly, we retrieve a set of related documents about the keyword from Wikipedia. And then, with the help of related Wikipedia pages, more positive documents are extracted from the unlabeled documents. Finally, we train a text classifier with these positive documents and unlabeled documents. The experiment result on 20Newsgroup dataset show that the proposed approach performs very competitively compared with NB-SVM, a PU learner, and NB, a supervised learner.