Formal Grammar for Hispanic Named Entities Analysis

  • Authors:
  • Grettel Barceló; Eduardo Cendejas; Grigori Sidorov; Igor A. Bolshakov

  • Affiliations:
  • Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico City, Mexico (all authors)

  • Venue:
  • CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2009

Abstract

Named Entity Recognition (NER) is a widely studied task in natural language processing. Many approaches have been developed to identify and classify named entity strings in both specific and open domains. Nevertheless, many NER systems must incorporate external modules to resolve the interpretation problems that proper nouns cause. This article studies ambiguity in Hispanic nominal sequences, whose composition poses three main problems: (1) the association of given names and/or surnames; (2) the composition of such elements by means of a connector; and (3) the given name/surname duality, where the same string can serve as either. To gauge the magnitude of the problem, two gazetteers were built, one with 93998 given names and the other with 13779 surnames. The gazetteer entries serve as terminal symbols of the proposed grammar, which determines the valid interpretations of a nominal sequence by automatically labeling each of its elements.
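
The labeling idea can be illustrated with a minimal Python sketch. It is not the grammar proposed in the paper: the tiny gazetteers, the single-token connectors, the label alphabet (G = given name, S = surname, C = connector), and the pattern in `VALID` are all assumptions made for illustration. Each token receives every label its gazetteer membership allows, and only the label sequences accepted by the toy grammar are kept as interpretations.

```python
import re
from itertools import product

# Toy gazetteers (assumed for illustration; the paper's lists are far larger).
GIVEN_NAMES = {"juan", "maría", "carmen", "santiago", "cruz"}
SURNAMES = {"garcía", "santiago", "cruz", "lópez"}
CONNECTORS = {"de", "del", "y"}

# Hypothetical grammar over label strings:
#   one or more given-name units (G or G C G),
#   followed by one or more surname units (S, C S, S C S, or C S C S).
VALID = re.compile(r"(G(CG)?)+(C?S(CS)?)+")


def candidate_labels(token):
    """Return every label the gazetteers allow for a single token."""
    t = token.lower()
    labels = []
    if t in GIVEN_NAMES:
        labels.append("G")
    if t in SURNAMES:
        labels.append("S")
    if t in CONNECTORS:
        labels.append("C")
    return labels


def interpretations(sequence):
    """List the label strings for a nominal sequence that the toy grammar accepts."""
    options = [candidate_labels(tok) for tok in sequence.split()]
    # A token unknown to all gazetteers has no options, so the product is empty.
    return [
        "".join(combo)
        for combo in product(*options)
        if VALID.fullmatch("".join(combo))
    ]


print(interpretations("Juan Santiago Cruz"))      # ['GGS', 'GSS']
print(interpretations("María del Carmen López"))  # ['GCGS']
```

In this sketch "Juan Santiago Cruz" keeps two readings because "Santiago" appears in both toy gazetteers, which mirrors the given name/surname duality, while the connector "del" binds "María" and "Carmen" into a single given-name unit.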