A resource-based method for named entity extraction and classification

Authors:
Pablo Gamallo;Marcos Garcia
Affiliations:
Centro de Investigação em Tecnologias da Informação, Universidade de Santiago de Compostela, Galiza, Spain;Centro de Investigação em Tecnologias da Informação, Universidade de Santiago de Compostela, Galiza, Spain
Venue:
EPIA'11 Proceedings of the 15th Portugese conference on Progress in artificial intelligence
Year:
2011

Citing 9
Cited 0

Wide-Coverage Spanish Named Entity Extraction

IBERAMIA 2002 Proceedings of the 8th Ibero-American Conference on AI: Advances in Artificial Intelligence
Message Understanding Conference-6: a brief history

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Japanese Named Entity extraction with redundant morphological analysis

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Named Entity Extraction using AdaBoost

COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Introduction to the CoNLL-2002 shared task: language-independent named entity recognition

COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Analysing Wikipedia and gold-standard corpora for NER training

EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Functional aspects in portuguese NER

PROPOR'06 Proceedings of the 7th international conference on Computational Processing of the Portuguese Language
Unsupervised named-entity recognition: generating gazetteers and resolving ambiguity

AI'06 Proceedings of the 19th international conference on Advances in Artificial Intelligence: Canadian Society for Computational Studies of Intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

We propose a resource-based Named Entity Classification (NEC) system, which combines named entity extraction with simple language-independent heuristics. Large lists (gazetteers) of named entities are automatically extracted making use of semi-structured information from the Wikipedia, namely infoboxes and category trees. Language-independent heuristics are used to disambiguate and classify entities that have been already identified (or recognized) in text. We compare the performance of our resource-based system with that of a supervised NEC module implemented for the FreeLing suite, which was the winner system in CoNLL-2002 competition. Experiments were performed over Portuguese text corpora taking into account several domains and genres.