REPENTINO – a wide-scope gazetteer for entity recognition in Portuguese

  • Authors:
  • Luís Sarmento; Ana Sofia Pinto; Luís Cabral

  • Affiliations:
  • Faculdade de Engenharia da Universidade do Porto (NIAD&R), Porto, Portugal; Linguateca – Pólo do Porto, Porto; Linguateca – Pólo de Oslo, Blindern, Oslo, Norway

  • Venue:
  • PROPOR'06: Proceedings of the 7th International Conference on Computational Processing of the Portuguese Language
  • Year:
  • 2006

Abstract

In this paper we describe REPENTINO, a publicly available gazetteer intended to support the development of named entity recognition systems for Portuguese. REPENTINO aims to reduce the problems developers face because lexical-semantic resources of this type are scarce for Portuguese. The data stored in REPENTINO were mostly extracted from corpora and from the web using simple semi-automated methods. Currently, REPENTINO stores nearly 450k named entity instances divided into more than 100 categories and subcategories, covering a much wider set of domains than those usually included in traditional gazetteers. We present some figures on the current content of the gazetteer and describe future work on evaluating this resource and enriching it with additional information.