Using Wikipedia knowledge to improve text classification

Authors:
Pu Wang; Jian Hu; Hua-Jun Zeng; Zheng Chen
Affiliations:
Pu Wang: George Mason University, Department of Computer Science, Fairfax, VA 22030, USA; Jian Hu, Hua-Jun Zeng, Zheng Chen: Microsoft Research Asia, Machine Learning Group, Beijing, China
Venue:
Knowledge and Information Systems
Year:
2009

Abstract

Text classification has been widely used to help users discover useful information on the Internet. However, traditional classification methods are based on the “Bag of Words” (BOW) representation, which accounts only for term frequencies in documents and ignores important semantic relationships between key terms. To overcome this problem, previous work attempted to enrich the text representation through manual intervention or automatic document expansion. Unfortunately, the resulting improvement is very limited, owing to the poor coverage of the dictionary and the ineffectiveness of term expansion. In this paper, we automatically construct a thesaurus of concepts from Wikipedia. We then introduce a unified framework that expands the BOW representation with semantic relations (synonymy, hyponymy, and associative relations), and demonstrate its efficacy in enhancing previous approaches to text classification. Experimental results on several data sets show that the proposed approach, integrated with the thesaurus built from Wikipedia, achieves significant improvements over the baseline algorithm.
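The general idea of expanding a BOW vector with thesaurus relations can be illustrated with a toy example. The following is a minimal Python sketch, not the authors' framework: the thesaurus entries (`SYNONYMS`, `BROADER`, `RELATED`), the helper names, and the weights (1.0 for a matched concept, 0.5 for a broader concept, a per-pair relatedness score for associative concepts) are hypothetical choices made only for illustration. It appends concept features to the plain term features, so that two documents using different surface terms for the same concept ("car" vs. "automobile") become more similar under cosine similarity.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus (illustrative only; NOT built from Wikipedia).
# Synonymy: surface term -> canonical concept.
SYNONYMS = {"car": "Automobile", "automobile": "Automobile", "auto": "Automobile",
            "puma": "Cougar", "cougar": "Cougar"}
# Hyponymy: concept -> broader concepts.
BROADER = {"Automobile": ["Vehicle"], "Cougar": ["Felidae"]}
# Associative relations: concept -> [(related concept, assumed relatedness weight)].
RELATED = {"Automobile": [("Road", 0.4)], "Cougar": [("Mountain", 0.3)]}


def bow_features(tokens):
    """Plain bag-of-words: one feature per term, valued by its frequency."""
    return {f"term:{t}": float(c) for t, c in Counter(tokens).items()}


def expand_bow(tokens, broader_weight=0.5):
    """Keep the BOW features and add weighted concept features.

    The weighting scheme here is an assumption for illustration only.
    """
    features = bow_features(tokens)
    concepts = Counter()
    for tok in tokens:
        concept = SYNONYMS.get(tok)
        if concept is None:
            continue  # term not covered by the toy thesaurus
        concepts[concept] += 1.0  # synonymy: map term to its concept
        for broader in BROADER.get(concept, []):
            concepts[broader] += broader_weight  # hyponymy: add broader concept
        for related, weight in RELATED.get(concept, []):
            concepts[related] += weight  # associative relation
    for concept, weight in concepts.items():
        features[f"concept:{concept}"] = weight
    return features


def cosine(u, v):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


doc_a = ["car", "engine"]
doc_b = ["automobile", "engine"]
print(round(cosine(bow_features(doc_a), bow_features(doc_b)), 3))  # 0.5
print(round(cosine(expand_bow(doc_a), expand_bow(doc_b)), 3))  # 0.707
```

Keeping the original term features alongside the concept features means the expansion only adds evidence rather than replacing the BOW signal; how much weight each relation type should receive is a tuning decision that this sketch does not address.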