Combining data-driven systems for improving Named Entity Recognition

  • Authors:
  • Z. Kozareva; O. Ferrández; A. Montoyo; R. Muñoz; A. Suárez; J. Gómez

  • Affiliations:
  • Natural Language Processing and Information Systems Group, Department of Software and Computing Systems, University of Alicante, Alicante, Spain (all authors)

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2007

Abstract

The increasing flow of digital information requires the extraction, filtering and classification of pertinent information from large volumes of text. All these tasks benefit greatly from a Named Entity Recognition (NER) component in the preprocessing stage. This paper proposes a completely automatic NER system. The NER task involves not only the identification of proper names (Named Entities) in natural language text, but also their classification into a set of predefined categories, such as names of persons, organizations (companies, government organizations, committees, etc.), locations (cities, countries, rivers, etc.) and miscellaneous (movie titles, sports events, etc.). Throughout the paper, we examine the differences between the language models learned by different data-driven classifiers confronted with the same NLP task, as well as ways to exploit these differences to yield higher accuracy than the best individual classifier. Three machine learning classifiers (Hidden Markov Model, Maximum Entropy and Memory Based Learning) are trained on the same corpus to solve the NER task. After comparison, their outputs are combined using voting strategies. A comprehensive experimental evaluation of our system, together with a comparison with other systems, has been carried out within the framework of two specialized scientific competitions for NER, CoNLL-2002 and HAREM-2005. Finally, this paper describes the integration of our NER system into different NLP applications, specifically Geographic Information Retrieval and Conceptual Modelling.
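
To illustrate the idea of combining classifier outputs by voting, the following is a minimal sketch of plurality voting over per-token labels. It is not the voting scheme used in the paper, whose details the abstract does not give. The function name `majority_vote`, the classifier keys (`hmm`, `maxent`, `mbl`), the IOB-style labels, the example sentence, and the tie-breaking rule (prefer the classifier listed first in `priority`) are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions, priority):
    """Combine per-token labels from several classifiers by plurality vote.

    predictions: dict mapping a classifier name to its list of labels,
                 one label per token (hypothetical structure).
    priority:    classifier names in tie-breaking order; on a tie, the label
                 proposed by the earliest listed classifier wins (assumption).
    """
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all classifiers must label the same number of tokens")
    n_tokens = lengths.pop()

    combined = []
    for i in range(n_tokens):
        # Count how many classifiers proposed each label for token i.
        votes = Counter(predictions[name][i] for name in priority)
        best_count = max(votes.values())
        # Among the top-voted labels, keep the one proposed by the
        # highest-priority classifier.
        for name in priority:
            label = predictions[name][i]
            if votes[label] == best_count:
                combined.append(label)
                break
    return combined


# Illustrative (invented) outputs for the tokens: Juan vive en Alicante
predictions = {
    "hmm":    ["B-PER", "O", "O", "B-ORG"],
    "maxent": ["B-PER", "O", "O", "B-LOC"],
    "mbl":    ["B-ORG", "O", "O", "B-LOC"],
}
priority = ["maxent", "mbl", "hmm"]

print(majority_vote(predictions, priority))
```

With three voters, a label backed by two classifiers always wins, so the priority order only matters when all three disagree. A weighted variant could replace the counts with per-classifier weights estimated on held-out data.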