A resource-based method for named entity extraction and classification

  • Authors:
  • Pablo Gamallo;Marcos Garcia

  • Affiliations:
  • Centro de Investigação em Tecnologias da Informação, Universidade de Santiago de Compostela, Galiza, Spain;Centro de Investigação em Tecnologias da Informação, Universidade de Santiago de Compostela, Galiza, Spain

  • Venue:
  • EPIA'11 Proceedings of the 15th Portuguese conference on Progress in artificial intelligence
  • Year:
  • 2011

Abstract

We propose a resource-based Named Entity Classification (NEC) system that combines named entity extraction with simple language-independent heuristics. Large lists of named entities (gazetteers) are extracted automatically from semi-structured information in Wikipedia, namely infoboxes and category trees. The language-independent heuristics disambiguate and classify entities that have already been identified (or recognized) in text. We compare the performance of our resource-based system with that of a supervised NEC module implemented for the FreeLing suite, which was the winning system in the CoNLL-2002 competition. The experiments were performed on Portuguese text corpora covering several domains and genres.
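
To make the general idea concrete, the following is a minimal sketch of gazetteer lookup followed by a context-based tie-break. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the heuristics, so the class labels (PER, LOC, ORG), the toy gazetteer entries, the trigger words, and the function names `candidate_classes` and `classify` are all hypothetical.

```python
from typing import Dict, List, Optional, Set

# Toy gazetteers (hypothetical entries). The paper builds its lists from
# Wikipedia infoboxes and category trees; here they are hard-coded.
GAZETTEERS: Dict[str, Set[str]] = {
    "PER": {"fernando pessoa", "washington"},
    "LOC": {"lisboa", "washington", "porto"},
    "ORG": {"universidade de santiago de compostela", "porto"},
}

# Hypothetical trigger words per class, used only to break ties.
TRIGGERS: Dict[str, Set[str]] = {
    "PER": {"sr.", "presidente", "escritor"},
    "LOC": {"em", "cidade", "rio"},
    "ORG": {"clube", "empresa", "equipa"},
}


def candidate_classes(entity: str) -> List[str]:
    """Return every class whose gazetteer contains the entity (case-insensitive)."""
    key = entity.lower()
    return sorted(c for c, names in GAZETTEERS.items() if key in names)


def classify(entity: str, context: List[str]) -> Optional[str]:
    """Classify an already-recognized entity.

    Assumed heuristic: an entity found in exactly one gazetteer takes that
    class. Ambiguous or unknown entities are decided by counting trigger
    words in the surrounding tokens; if no class wins clearly, return None.
    """
    candidates = candidate_classes(entity)
    if len(candidates) == 1:
        return candidates[0]

    # Ambiguous: choose among the candidates. Unknown: consider every class.
    pool = candidates or sorted(GAZETTEERS)
    window = {tok.lower() for tok in context}
    scores = {c: len(TRIGGERS[c] & window) for c in pool}
    best = max(scores.values())
    winners = [c for c in pool if scores[c] == best]
    if best > 0 and len(winners) == 1:
        return winners[0]
    return None
```

In this sketch, `classify("Porto", ["o", "clube", "Porto"])` resolves the LOC/ORG ambiguity through the trigger word "clube", while an ambiguous entity with no informative context is left unclassified rather than guessed.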