A knowledge-based semantic kernel for text classification

Authors:
Jamal Abdul Nasir;Asim Karim;George Tsatsaronis;Iraklis Varlamis
Affiliations:
School of Science and Engineering, LUMS, Pakistan;School of Science and Engineering, LUMS, Pakistan;Biotechnology Center, Technische Universität Dresden, Germany;Department of Informatics and Telematics, Harokopio University of Athens, Greece
Venue:
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
Year:
2011

Citing 9
Cited 1

Latent Semantic Kernels
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning

Support Vector Machines Based on a Semantic Kernel for Text Categorization
IJCNN '00 Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN'00), Volume 5

Semantic Kernels for Text Classification Based on Topological Measures of Feature Similarity
ICDM '06 Proceedings of the Sixth International Conference on Data Mining

Statistical Comparisons of Classifiers over Multiple Data Sets
The Journal of Machine Learning Research

Word sense disambiguation: A survey
ACM Computing Surveys (CSUR)

Word sense disambiguation with spreading activation networks generated from thesauri
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence

Text relatedness based on a word thesaurus
Journal of Artificial Intelligence Research

A semantic kernel to exploit linguistic knowledge
AI*IA'05 Proceedings of the 9th conference on Advances in Artificial Intelligence

Word sense disambiguation for exploiting hierarchical thesauri in text classification
PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases

Semantic smoothing for text clustering
Knowledge-Based Systems

Quantified Score

Hi-index	0.01

Abstract

In text classification, documents are typically represented in the vector space using the "Bag of Words" (BOW) approach. Despite its ease of use, the BOW representation cannot handle word synonymy and polysemy, and it does not consider semantic relatedness between words. In this paper, we overcome these shortcomings of the BOW approach by embedding Omiotis, a known WordNet-based measure of semantic relatedness between pairs of words, into a semantic kernel. The proposed measure incorporates the TF-IDF weighting scheme, yielding a semantic kernel that combines both semantic and statistical information from text. Empirical evaluation on real data sets demonstrates that our approach improves classification accuracy over the standard BOW representation when Omiotis is embedded in four different classifiers.
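
To make the idea of a semantic kernel concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes one common form of semantic kernel, K = X S X^T, where X holds TF-IDF document vectors and S is a symmetric term-to-term relatedness matrix. The abstract does not give the exact kernel formula, so this form is an assumption. The values in S are made-up placeholders rather than real Omiotis scores, and the names `tfidf` and `semantic_kernel` are hypothetical helpers introduced for illustration.

```python
import numpy as np

# Toy term-frequency matrix (rows: documents, columns: terms).
vocab = ["car", "automobile", "banana"]
tf = np.array([
    [2.0, 0.0, 0.0],   # doc 0 mentions only "car"
    [0.0, 1.0, 0.0],   # doc 1 mentions only "automobile"
    [0.0, 0.0, 3.0],   # doc 2 mentions only "banana"
])

def tfidf(tf):
    """Smoothed TF-IDF with L2-normalised rows (one common variant)."""
    n_docs = tf.shape[0]
    df = np.count_nonzero(tf, axis=0)              # document frequency per term
    idf = np.log((1.0 + n_docs) / (1.0 + df)) + 1.0
    x = tf * idf
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0.0, 1.0, norms)  # avoid division by zero

# Hypothetical term-term relatedness scores in [0, 1]; NOT real Omiotis values.
# S must be symmetric positive semi-definite for K to be a valid kernel.
S = np.array([
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

def semantic_kernel(x, s):
    """Assumed kernel form K = X S X^T (document-by-document similarities)."""
    return x @ s @ x.T

X = tfidf(tf)
K_bow = X @ X.T                 # plain BOW (linear) kernel
K_sem = semantic_kernel(X, S)   # kernel smoothed by term relatedness

print("BOW similarity of doc 0 and doc 1:     ", K_bow[0, 1])
print("Semantic similarity of doc 0 and doc 1:", K_sem[0, 1])
```

In this toy setup, documents 0 and 1 share no terms, so the plain BOW kernel treats them as unrelated, while the semantic kernel assigns them a positive similarity because "car" and "automobile" are marked as related in S. A matrix such as `K_sem` could then be passed to any kernel-based classifier that accepts a precomputed kernel.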