SIEMÊS: a named-entity recognizer for Portuguese relying on similarity rules

  • Authors: Luís Sarmento
  • Affiliations: Faculdade de Engenharia Universidade Porto (NIAD&R) & Linguateca (Porto Node)
  • Venue: PROPOR'06 Proceedings of the 7th International Conference on Computational Processing of the Portuguese Language
  • Year: 2006

Abstract

In this paper we describe SIEMÊS, a named-entity recognition system for Portuguese whose classification procedure is based on a set of similarity rules. These rules seek soft matches between candidate entities found in text and instances in a wide-scope gazetteer; by exploiting lexical similarities, they avoid the need to code large sets of rules. Using this matching procedure, SIEMÊS generates a set of classification hypotheses based solely on internal evidence, which may be disambiguated in a later step by relatively simple rules based on contextual clues. We explain the SIEMÊS architecture and its named-entity identification and classification procedure. We also briefly discuss the results of SIEMÊS's participation in HAREM, the named-entity evaluation contest for Portuguese, and describe future work.
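
To make the idea of soft gazetteer matching concrete, here is a minimal Python sketch. It is not the SIEMÊS implementation: the abstract does not specify the similarity rules, so the character-level ratio from Python's `difflib`, the toy gazetteer, the category labels, the 0.6 threshold, and the function names are all assumptions chosen for illustration. The sketch only shows how a candidate that is absent from the gazetteer can still receive classification hypotheses from lexically similar entries.

```python
from difflib import SequenceMatcher

# Hypothetical toy gazetteer (entry -> category). The entries and the
# category labels are illustrative and are not taken from the paper.
GAZETTEER = {
    "Universidade do Porto": "ORGANIZATION",
    "Universidade de Lisboa": "ORGANIZATION",
    "Rio Douro": "LOCATION",
    "Rio Tejo": "LOCATION",
    "Luís Sarmento": "PERSON",
}


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1].

    An assumed stand-in for the paper's similarity rules.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def hypotheses(candidate: str, threshold: float = 0.6) -> list[tuple[str, float]]:
    """Return (category, best score) pairs, highest score first.

    Only internal evidence (the candidate string itself) is used; any
    contextual disambiguation would be a separate, later step.
    """
    best: dict[str, float] = {}
    for entry, category in GAZETTEER.items():
        score = similarity(candidate, entry)
        # Keep the best-scoring entry per category above the threshold.
        if score >= threshold and score > best.get(category, 0.0):
            best[category] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # "Universidade do Minho" is not in the gazetteer, but it is lexically
    # close to the university entries, so it gets a soft match.
    print(hypotheses("Universidade do Minho"))
```

When a candidate yields more than one hypothesis, the abstract's later step would choose among them using contextual clues; that step is left out of this sketch.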