Extract semantic information from Wordnet to improve text classification performance

Authors:
Rujiang Bai;Xiaoyue Wang;Junhua Liao
Affiliations:
Shandong University of Technology, Zibo, China;Shandong University of Technology, Zibo, China;Shandong University of Technology, Zibo, China
Venue:
AST/UCMA/ISA/ACN'10 Proceedings of the 2010 international conference on Advances in computer science and information technology
Year:
2010

Abstract

Over the past decade, text categorization has become an active field of research in the machine learning community. Most approaches are based on term occurrence frequency. The performance of such surface-based methods can degrade when the texts are complex, for example when they are ambiguous. One alternative is to use semantic-based approaches that process textual documents according to their meaning. In this paper, we propose a Concept-based Vector Space Model that replaces the standard Vector Space Model for text with a more abstract representation of semantic information. The model adjusts the weights of the vector space by importing the hypernymy-hyponymy relations between synonym sets (synsets) and the concept chains in WordNet. Experimental results on several data sets show that the proposed approach, with concepts built from WordNet, achieves significant improvements over the baseline algorithm.
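
The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea of enriching a term vector with hypernym concepts. The hand-written `HYPERNYMS` map stands in for WordNet, and the `decay` and `max_depth` parameters are illustrative assumptions, not the authors' scheme.

```python
from collections import Counter

# Toy hypernym map standing in for WordNet (hypothetical, hand-written).
HYPERNYMS = {
    "beef": "meat",
    "pork": "meat",
    "meat": "food",
    "apple": "fruit",
    "fruit": "food",
}


def concept_vector(tokens, max_depth=2, decay=0.5):
    """Return term counts plus decayed weights for hypernyms up the chain.

    Both `max_depth` and `decay` are assumptions made for illustration.
    """
    vector = Counter()
    for term, count in Counter(tokens).items():
        vector[term] += count
        weight, node = float(count), term
        # Walk up the concept chain, halving the contribution at each level.
        for _ in range(max_depth):
            node = HYPERNYMS.get(node)
            if node is None:
                break
            weight *= decay
            vector[node] += weight
    return dict(vector)


doc_a = concept_vector(["beef", "beef", "apple"])
doc_b = concept_vector(["pork"])
shared = set(doc_a) & set(doc_b)
```

In this toy case the two documents share no surface terms, yet their concept vectors overlap on `meat` and `food`, which is the kind of signal a purely frequency-based representation would miss.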