Improving text classification by a sense spectrum approach to term expansion

Authors:
Peter Wittek;Sándor Darányi;Chew Lim Tan
Affiliations:
National University of Singapore, Singapore;Göteborg University & University of Borås, Borås, Sweden;National University of Singapore, Singapore
Venue:
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Year:
2009

Abstract

Experimenting with different mathematical objects for text representation is an important step in building text classification models. To be efficient, the objects of a formal model, such as vectors, have to reasonably reproduce language-related phenomena such as the word meaning inherent in index terms. We introduce an algorithm for sense-based semantic ordering of index terms that approximates Cruse's description of a sense spectrum. Following semantic ordering, text classification by support vector machines can benefit from semantic smoothing kernels that take semantic relations among index terms into account when computing document similarity. Adding expansion terms to the vector representation can also improve effectiveness. This paper proposes a new kernel that discounts less important expansion terms based on lexical relatedness.
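
To make the two ideas in the abstract concrete, the following is a minimal sketch and not the kernel proposed in the paper. It assumes the commonly used form of a semantic smoothing kernel, k(d1, d2) = d1^T S S^T d2, where S is a term-by-term relatedness matrix. The three-word vocabulary, the relatedness values, the function names, and the thresholded expansion rule are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical vocabulary and relatedness values, chosen only for illustration.
VOCAB = ["car", "automobile", "banana"]
S = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])


def semantic_smoothing_kernel(d1, d2, S):
    """Assumed generic form k(d1, d2) = d1^T S S^T d2.

    Equivalent to a linear kernel after mapping each document d to S^T d,
    so related index terms contribute to document similarity.
    """
    return float((d1 @ S) @ (d2 @ S))


def expand_with_discount(d, R, threshold=0.5):
    """Hypothetical expansion rule (an assumption, not the paper's kernel).

    Each term present in d adds its related terms to the vector, weighted by
    relatedness, so weakly related expansion terms count for less. Terms
    whose relatedness falls below `threshold` are not added at all.
    """
    R_off = np.where(R >= threshold, R, 0.0)
    np.fill_diagonal(R_off, 0.0)  # a term does not expand to itself
    return d + d @ R_off


# Bag-of-words vectors over VOCAB: one document per word.
d_car = np.array([1.0, 0.0, 0.0])
d_auto = np.array([0.0, 1.0, 0.0])
d_banana = np.array([0.0, 0.0, 1.0])

linear = float(d_car @ d_auto)
smoothed = semantic_smoothing_kernel(d_car, d_auto, S)
unrelated = semantic_smoothing_kernel(d_car, d_banana, S)
expanded = expand_with_discount(d_car, S)

print("linear kernel:", linear)
print("smoothed kernel:", smoothed)
print("smoothed kernel, unrelated terms:", unrelated)
print("expanded vector:", expanded)
```

In this toy setting the plain linear kernel sees no overlap between a document containing only "car" and one containing only "automobile", while the smoothed kernel assigns them a positive similarity and still keeps "car" and "banana" at zero.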