SIEMÊS – a named-entity recognizer for Portuguese relying on similarity rules

Authors:
Luís Sarmento
Affiliations:
Faculdade de Engenharia da Universidade do Porto (NIAD&R) & Linguateca (Porto Node)
Venue:
PROPOR'06 Proceedings of the 7th international conference on Computational Processing of the Portuguese Language
Year:
2006

Abstract

In this paper we describe SIEMÊS, a named-entity recognition system for Portuguese whose classification procedure is based on a set of similarity rules. These rules seek soft matches between candidate entities found in text and instances in a wide-scope gazetteer; by exploiting lexical similarities, they avoid the need to code large sets of rules. Using this matching procedure, SIEMÊS generates a set of classification hypotheses based solely on internal evidence, which may be disambiguated in a later step by relatively simple rules based on contextual clues. We explain the architecture of SIEMÊS and its named-entity identification and classification procedure. We also briefly discuss the results SIEMÊS obtained in HAREM, the named-entity evaluation contest for Portuguese, and describe future work.
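
The two-step idea in the abstract (soft gazetteer matching to produce hypotheses, then context-based disambiguation) can be illustrated with a minimal sketch. This is not the SIEMÊS implementation: the toy gazetteer, the context clues, the 0.6 threshold, the function names, and the use of Python's difflib character-level ratio as the similarity measure are all assumptions made for illustration, since the abstract does not specify the actual similarity rules.

```python
from difflib import SequenceMatcher

# Hypothetical toy gazetteer of (entry, category) pairs.
# An entry may appear under several categories (ambiguous names).
GAZETTEER = [
    ("Universidade do Porto", "ORGANIZATION"),
    ("Rio Douro", "LOCATION"),
    ("Benfica", "LOCATION"),
    ("Benfica", "ORGANIZATION"),
]

# Hypothetical contextual clues: a nearby word and the category it favours.
CONTEXT_CLUES = {
    "bairro": "LOCATION",
    "jogador": "ORGANIZATION",
}


def similarity(a, b):
    """Assumed stand-in for a similarity rule: case-insensitive
    character-level ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def hypotheses(candidate, threshold=0.6):
    """Step 1 (internal evidence only): soft-match the candidate against
    the gazetteer and keep the best score per category."""
    best = {}
    for entry, category in GAZETTEER:
        score = similarity(candidate, entry)
        if score >= threshold and score > best.get(category, 0.0):
            best[category] = score
    return best


def classify(candidate, context_words=(), threshold=0.6):
    """Step 2: if a context word favours one of the hypotheses, pick it;
    otherwise fall back to the highest-scoring hypothesis."""
    scores = hypotheses(candidate, threshold)
    if not scores:
        return None
    for word in context_words:
        favoured = CONTEXT_CLUES.get(word.lower())
        if favoured in scores:
            return favoured
    return max(scores, key=scores.get)
```

In this sketch, "Universidade do Minho" is absent from the toy gazetteer but is lexically close to "Universidade do Porto", so classify("Universidade do Minho") yields an ORGANIZATION hypothesis without a dedicated rule. The ambiguous "Benfica" yields two hypotheses, and a context word such as "bairro" or "jogador" selects between them.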