A Method for Automatic Text Categorization Using Word Sense Disambiguation

  • Authors:
  • Azucena Montes Rendon;Rocio Vargas A.;Hugo Estrada Esquivel;Juan G. Gonzalez Serna;Jose Ruiz Ascencio

  • Affiliations:
  • Centro Nacional de Investigación y Desarrollo Tecnológico, Cuernavaca, México C.P. 62490 (all authors)

  • Venue:
  • ICCSA '08: Proceedings of the International Conference on Computational Science and Its Applications, Part II
  • Year:
  • 2008

Abstract

Information plays an important role in today's societies, and the Internet is one of the most widespread mechanisms for communicating and distributing it around the world. Because the number of information sources is now extremely large, automatic mechanisms are needed to filter the information that could be useful to each user. However, one problem that the usual automatic text categorization techniques have not been able to handle is polysemy (words with two or more senses). In this paper, we address this problem by proposing a semantic analyzer for the automatic categorization of texts in Spanish. Context exploration techniques were used as the key mechanism for guiding the disambiguation process. A specific lexical database and its semantic relations were used to categorize the analyzed text appropriately. To validate the analyzer, a tool was developed that classifies web pages by semantic sense. We present performance results for this classifier. Finally, a comparison with four other classification tools is reported.
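
To make the general idea concrete, the following is a minimal sketch of how context words might guide sense selection and how the chosen senses might then vote for a category. It is not the authors' analyzer: the abstract does not describe the lexical database or the context exploration rules, so the toy `LEXICON`, its cue words, the category labels, and the functions `tokenize`, `disambiguate`, and `categorize` are all hypothetical, and the overlap count is a simplified Lesk-style stand-in.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption, not the paper's database):
# word -> list of (sense id, category, cue words expected near that sense).
LEXICON = {
    "banco": [
        ("banco#finanzas", "economy", {"dinero", "cuenta", "préstamo", "crédito"}),
        ("banco#asiento", "leisure", {"parque", "sentarse", "madera", "plaza"}),
    ],
    "planta": [
        ("planta#vegetal", "nature", {"hoja", "raíz", "jardín", "flor"}),
        ("planta#fábrica", "economy", {"producción", "obreros", "industrial", "energía"}),
    ],
}

PUNCTUATION = ".,;:¿?¡!()\""


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (raw.strip(PUNCTUATION).lower() for raw in text.split())
    return [token for token in tokens if token]


def disambiguate(tokens, index, window=5):
    """Pick the sense whose cue words overlap most with the nearby context.

    Returns (sense id, category), or None when the word is not in the
    lexicon or no cue word appears within the window.
    """
    senses = LEXICON.get(tokens[index])
    if not senses:
        return None
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    context = set(left + right)
    best = max(senses, key=lambda sense: len(sense[2] & context))
    if not best[2] & context:
        return None  # no contextual evidence, so leave the word unresolved
    return best[0], best[1]


def categorize(text, window=5):
    """Assign the category that receives the most votes from resolved senses."""
    tokens = tokenize(text)
    votes = Counter()
    for index in range(len(tokens)):
        resolved = disambiguate(tokens, index, window)
        if resolved:
            votes[resolved[1]] += 1
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    print(categorize("Abrí una cuenta en el banco para pedir un préstamo."))
    print(categorize("Nos sentamos en un banco de madera del parque."))
```

In this sketch the same word, "banco", pushes a text toward different categories depending on its neighbours, which is the effect a purely word-frequency classifier would miss. A real system would replace the hand-written cue sets with the relations of an actual lexical database.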