Building a dictionary of anthroponyms

  • Authors:
  • Jorge Baptista; Fernando Batista; Nuno Mamede

  • Affiliations:
  • L2F – Laboratório de Sistemas de Língua Falada – INESC ID Lisboa, Lisboa, Portugal (all three authors)

  • Venue:
  • PROPOR'06: Proceedings of the 7th International Conference on Computational Processing of the Portuguese Language
  • Year:
  • 2006

Abstract

This paper presents a methodology for building an electronic dictionary of anthroponyms of European Portuguese (DicPRO). Because names play an important role in structuring the information in texts, such a dictionary is a useful resource for computational processing. The dictionary has been enriched with morphosyntactic and semantic information. It was then used for the specific task of capitalizing anthroponyms and other proper names in a corpus that was automatically produced by a broadcast news speech recognition system and then manually corrected. The output of such a system offers no clues such as capitalized words or punctuation, so this task aims to make that output more readable. The paper shows that combining lexical, contextual (positional) and statistical information yields better results on this task than using any one of these strategies alone.
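
To make the idea of combining the three kinds of evidence concrete, the following is a minimal illustrative sketch in Python. It is not the method described in the paper: the toy lexicon, the trigger words, the sample corpus, the 0.5 threshold and the two-out-of-three voting rule are all assumptions chosen for illustration, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a dictionary of anthroponyms.
NAME_LEXICON = {"jorge", "fernando", "nuno", "baptista", "mamede", "costa"}

# Hypothetical trigger words that often precede a person's name.
TITLE_TRIGGERS = {"presidente", "ministro", "doutor", "senhor"}

# Hypothetical cased training text used to gather capitalization statistics.
CASED_CORPUS = [
    "O ministro Costa falou hoje",
    "A costa portuguesa é longa",
    "Ontem a costa estava calma",
    "Hoje Nuno Mamede chegou",
]


def capitalization_ratios(cased_corpus):
    """Estimate, per word, how often it appears capitalized in cased text."""
    capitalized, total = Counter(), Counter()
    for sentence in cased_corpus:
        # Skip the first token: sentence-initial capitals are uninformative.
        for token in sentence.split()[1:]:
            key = token.lower()
            total[key] += 1
            if token[0].isupper():
                capitalized[key] += 1
    return {word: capitalized[word] / total[word] for word in total}


def capitalize(tokens, ratios, threshold=0.5):
    """Capitalize a lowercase token when at least two of three cues agree."""
    output = []
    previous_was_name = False
    for index, token in enumerate(tokens):
        # Lexical cue: the word is listed in the name lexicon.
        lexical = token in NAME_LEXICON
        # Contextual (positional) cue: the word follows a title or a name.
        after_trigger = index > 0 and tokens[index - 1] in TITLE_TRIGGERS
        contextual = after_trigger or previous_was_name
        # Statistical cue: the word is usually capitalized in cased text.
        statistical = ratios.get(token, 0.0) >= threshold

        is_name = (lexical + contextual + statistical) >= 2
        output.append(token.capitalize() if is_name else token)
        previous_was_name = is_name
    return output


ratios = capitalization_ratios(CASED_CORPUS)
asr_output = "o ministro costa visitou a costa com nuno mamede".split()
print(" ".join(capitalize(asr_output, ratios)))
```

In this sketch the ambiguous word "costa" (a surname and also the common noun for "coast") is capitalized only where the lexicon entry is supported by a second cue, here the preceding title "ministro". The lexical cue alone cannot make that distinction.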