A Method for Automatic Text Categorization Using Word Sense Disambiguation

Authors:
Azucena Montes Rendon;Rocio Vargas A.;Hugo Estrada Esquivel;Juan G. Gonzalez Serna;Jose Ruiz Ascencio
Affiliations:
Centro Nacional de Investigación y Desarrollo Tecnológico, Cuernavaca, México C.P. 62490;Centro Nacional de Investigación y Desarrollo Tecnológico, Cuernavaca, México C.P. 62490;Centro Nacional de Investigación y Desarrollo Tecnológico, Cuernavaca, México C.P. 62490;Centro Nacional de Investigación y Desarrollo Tecnológico, Cuernavaca, México C.P. 62490;Centro Nacional de Investigación y Desarrollo Tecnológico, Cuernavaca, México C.P. 62490
Venue:
ICCSA '08 Proceedings of the international conference on Computational Science and Its Applications, Part II
Year:
2008

Citing 3
Cited 0

The Anatomy of a Hierarchical Clustering Engine for Web-page, News and Book Snippets
ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining

Revising the wordnet domains hierarchy: semantics, coverage and balancing
MLR '04 Proceedings of the Workshop on Multilingual Linguistic Ressources

The role of word sense disambiguation in automated text categorization
NLDB'05 Proceedings of the 10th international conference on Natural Language Processing and Information Systems

Abstract

Information plays a central role in today's societies, and the Internet is one of the most widespread mechanisms for communicating and distributing it around the world. Because of the extremely large number of information sources, automatic mechanisms are needed to filter the information that is useful to each user. However, one problem that the usual automatic text categorization techniques have not been able to handle is polysemy (words with two or more senses). In this paper, we address this problem by proposing a semantic analyzer for the automatic categorization of texts in Spanish. Context exploration techniques are used as the key mechanism for guiding the disambiguation process, and a specific lexical database, together with its existing semantic relations, is used to assign the analyzed text to an appropriate category. To validate the analyzer, we developed a tool that classifies web pages by semantic sense. We present performance results for this classifier and compare it with four other classification tools.
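
As a rough illustration of the general idea in the abstract (using context to choose a sense for a polysemous word, then using the chosen senses to pick a category), the sketch below may help. It is not the authors' analyzer: the toy lexicon, the domain labels, the clue words, and the function names `tokenize`, `disambiguate`, and `categorize` are all hypothetical, and the scoring is a simple overlap count between a sense's clue words and the words of the text, assumed here only for demonstration.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption, not the paper's lexical database):
# word -> list of (sense id, domain label, clue words expected near that sense).
LEXICON = {
    "banco": [
        ("banco#1", "finance", {"dinero", "cuenta", "crédito", "préstamo"}),
        ("banco#2", "furniture", {"sentarse", "parque", "madera", "plaza"}),
    ],
    "planta": [
        ("planta#1", "botany", {"hoja", "raíz", "flor", "agua"}),
        ("planta#2", "industry", {"fábrica", "producción", "obreros", "energía"}),
    ],
}


def tokenize(text):
    """Lowercase the text, split on whitespace, and strip basic punctuation."""
    tokens = (t.strip(".,;:!?¿¡()\"'").lower() for t in text.split())
    return [t for t in tokens if t]


def disambiguate(word, context):
    """Return the sense of `word` whose clue words overlap most with `context`.

    Returns None if the word is not in the lexicon or no sense has any overlap.
    """
    senses = LEXICON.get(word)
    if not senses:
        return None
    best = max(senses, key=lambda sense: len(sense[2] & context))
    return best if best[2] & context else None


def categorize(text):
    """Assign the text to the domain that receives the most sense votes."""
    tokens = tokenize(text)
    context = set(tokens)  # assumption: the whole text serves as the context
    votes = Counter()
    for token in tokens:
        sense = disambiguate(token, context)
        if sense is not None:
            votes[sense[1]] += 1
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    print(categorize("Abrí una cuenta en el banco para pedir un préstamo."))
    print(categorize("Nos sentamos en un banco de madera en el parque."))
```

In this sketch the word "banco" is assigned to different domains depending on the surrounding words, which is the kind of ambiguity a purely keyword-based categorizer cannot resolve. A real system would replace the toy lexicon with a full lexical database and would likely use a narrower context than the whole text.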