Spanish all-words semantic class disambiguation using Cast3LB corpus

  • Authors:
  • Rubén Izquierdo-Beviá;Lorenza Moreno-Monteagudo;Borja Navarro;Armando Suárez

  • Affiliations:
  • Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Spain;Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Spain;Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Spain;Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Spain

  • Venue:
  • MICAI'06: Proceedings of the 5th Mexican International Conference on Artificial Intelligence
  • Year:
  • 2006

Abstract

This paper presents a machine learning approach to semantic disambiguation for Spanish based on semantic classes. A critical issue in corpus-based approaches to Word Sense Disambiguation (WSD) is the lack of wide-coverage resources from which linguistic information can be learned automatically. In particular, all-words sense-annotated corpora such as SemCor do not contain enough examples of many senses to train a machine learning method. Using semantic classes instead of senses makes it possible to collect a larger number of examples for each class while reducing polysemy, which improves the accuracy of semantic disambiguation. Cast3LB, a SemCor-like corpus manually annotated with Spanish WordNet 1.5 senses, is used in this paper to perform semantic disambiguation based on several sets of classes: the lexicographer files of WordNet, WordNet Domains, and the SUMO ontology.
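
The effect of replacing senses with semantic classes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sense identifiers, the toy annotations, and the sense-to-class mapping are invented and are not taken from Cast3LB or Spanish WordNet, and the helper names `examples_per_label` and `polysemy` are hypothetical. Only the class labels follow the naming style of WordNet lexicographer files. The sketch shows that relabeling the same annotated tokens by class yields more training examples per label and fewer candidate labels per lemma.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from invented sense identifiers to coarse classes
# named in the style of WordNet lexicographer files.
SENSE_TO_CLASS = {
    "banco#1": "noun.group",     # financial institution (assumed)
    "banco#2": "noun.artifact",  # bench (assumed)
    "banco#3": "noun.group",     # school of fish (assumed)
    "silla#1": "noun.artifact",
    "empresa#1": "noun.group",
}

# Toy sense-annotated tokens as (lemma, sense) pairs; not real corpus data.
ANNOTATED = [
    ("banco", "banco#1"),
    ("banco", "banco#2"),
    ("banco", "banco#3"),
    ("silla", "silla#1"),
    ("empresa", "empresa#1"),
    ("banco", "banco#1"),
]


def relabel(sense, mapping=None):
    """Return the class for a sense, or the sense itself if no mapping is given."""
    return sense if mapping is None else mapping[sense]


def examples_per_label(annotated, mapping=None):
    """Count how many annotated tokens carry each label (sense or class)."""
    return Counter(relabel(sense, mapping) for _, sense in annotated)


def polysemy(annotated, mapping=None):
    """Count the distinct labels observed for each lemma."""
    labels = defaultdict(set)
    for lemma, sense in annotated:
        labels[lemma].add(relabel(sense, mapping))
    return {lemma: len(found) for lemma, found in labels.items()}


if __name__ == "__main__":
    print("examples per sense:", dict(examples_per_label(ANNOTATED)))
    print("examples per class:", dict(examples_per_label(ANNOTATED, SENSE_TO_CLASS)))
    print("polysemy by sense: ", polysemy(ANNOTATED))
    print("polysemy by class: ", polysemy(ANNOTATED, SENSE_TO_CLASS))
```

In this toy data the lemma "banco" has three sense labels but only two class labels, and the most frequent class collects four examples where the most frequent sense has two. A classifier trained on class labels therefore sees more examples per target and has fewer targets to choose between per word, which is the trade-off the abstract describes.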