Improving semi-supervised text classification by using wikipedia knowledge

Authors:
Zhilin Zhang;Huaizhong Lin;Pengfei Li;Huazhong Wang;Dongming Lu
Affiliations:
College of Computer Science and Technology, Zhejiang University, Hangzhou, P.R. China;College of Computer Science and Technology, Zhejiang University, Hangzhou, P.R. China;College of Computer Science and Technology, Zhejiang University, Hangzhou, P.R. China;College of Computer Science and Technology, Zhejiang University, Hangzhou, P.R. China;College of Computer Science and Technology, Zhejiang University, Hangzhou, P.R. China
Venue:
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
Year:
2013

Citing 16
Cited 0

Text Classification from Labeled and Unlabeled Documents using EM

Machine Learning - Special issue on information retrieval
Transductive Inference for Text Classification using Support Vector Machines

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
CBC: Clustering Based Text Classification Requiring Minimal Labeled Data

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Using clustering to enhance text classification

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Co-clustering based classification for out-of-domain documents

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Enhancing text clustering by leveraging Wikipedia semantics

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Improving text classification accuracy using topic modeling over an additional corpus

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Building semantic kernels for text classification using wikipedia

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Improving Text Classification by Using Encyclopedia Knowledge

ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Text classification from unlabeled documents with bootstrapping and feature projection techniques

Information Processing and Management: an International Journal
Exploiting Wikipedia as external knowledge for document clustering

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Overcoming the brittleness bottleneck using wikipedia: enhancing text categorization with encyclopedic knowledge

AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
WikiRelate! computing semantic relatedness using wikipedia

AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
CCM: A Text Classification Model by Clustering

ASONAM '11 Proceedings of the 2011 International Conference on Advances in Social Networks Analysis and Mining
Large-scale question classification in cQA by leveraging Wikipedia semantic knowledge

Proceedings of the 20th ACM international conference on Information and knowledge management
Leveraging Wikipedia concept and category information to enhance contextual advertising

Proceedings of the 20th ACM international conference on Information and knowledge management

Quantified Score

Hi-index	0.00

Visualization

Abstract

Semi-supervised text classification uses both labeled and unlabeled data to construct classifiers. The key issue is how to utilize the unlabeled data. Clustering based classification method outperforms other semi-supervised text classification algorithms. However, its achievements are still limited because the vector space model representation largely ignores the semantic relationships between words. In this paper, we propose a new approach to address this problem by using Wikipedia knowledge. We enrich document representation with Wikipedia semantic features (concepts and categories), propose a new similarity measure based on the semantic relevance between Wikipedia features, and apply this similarity measure to clustering based classification. Experiment results on several corpora show that our proposed method can effectively improve semi-supervised text classification performance.