A golden resource for named entity recognition in Portuguese

  • Authors:
  • Diana Santos; Nuno Cardoso

  • Affiliations:
  • Linguateca: Node of Oslo at SINTEF ICT; Linguateca: Node of XLDB at University of Lisbon

  • Venue:
  • PROPOR'06: Proceedings of the 7th International Conference on Computational Processing of the Portuguese Language
  • Year:
  • 2006

Abstract

This paper presents a collection of texts manually annotated with named entities in context, which was used for HAREM, the first evaluation contest for named entity recognizers for Portuguese. We discuss the options taken and the originality of our approach compared with previous evaluation initiatives in the area. We document the choice of categories, their quantitative weight in the overall collection and how we deal with vagueness and underspecification.