Combining data-driven systems for improving named entity recognition

  • Authors:
  • Zornitsa Kozareva;Oscar Ferrández;Andres Montoyo;Rafael Muñoz;Armando Suárez

  • Affiliations:
  • Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante;Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante;Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante;Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante;Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante

  • Venue:
  • NLDB'05 Proceedings of the 10th international conference on Natural Language Processing and Information Systems
  • Year:
  • 2005

Abstract

The increasing flow of digital information requires the extraction, filtering and classification of pertinent information from large volumes of text. An important preprocessing step for these tasks is Named Entity Recognition (NER). In this paper we propose a completely automatic NER system that identifies proper names in texts and classifies them into a set of predefined categories of interest, such as Person names, Organizations (companies, government organizations, committees, etc.) and Locations (cities, countries, rivers, etc.). We examine the differences in the language models learned by different data-driven systems performing the same NLP task and how these differences can be exploited to yield a higher accuracy than the best individual system. Three NE classifiers (Hidden Markov Models, Maximum Entropy and a Memory-based learner) are trained on the same corpus, and after comparison their outputs are combined using a voting strategy. The results are encouraging: for Spanish, 98.5% accuracy was achieved for NE recognition and 84.94% for NE classification.
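
The abstract does not specify how the voting strategy is defined, so the following is only a minimal sketch of one common variant: per-token plurality voting over the tag sequences produced by three classifiers. The function name `majority_vote`, the `fallback` parameter, the BIO-style tags and the sample outputs are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions, fallback=0):
    """Combine per-token tags from several classifiers by plurality vote.

    predictions: list of tag sequences, one per classifier, all the same length.
    fallback: index of the classifier to trust when the top tags are tied
              (an assumption; the paper's tie-breaking rule is not given here).
    """
    lengths = {len(p) for p in predictions}
    if len(lengths) != 1:
        raise ValueError("all classifiers must tag the same number of tokens")

    combined = []
    for token_tags in zip(*predictions):
        counts = Counter(token_tags).most_common()
        top_tag, top_count = counts[0]
        # A tie occurs when the runner-up has as many votes as the leader.
        if len(counts) > 1 and counts[1][1] == top_count:
            combined.append(token_tags[fallback])
        else:
            combined.append(top_tag)
    return combined


# Hypothetical outputs for a four-token sentence (illustrative data only).
hmm_tags = ["B-PER", "I-PER", "O", "B-LOC"]
maxent_tags = ["B-PER", "O", "O", "B-ORG"]
memory_tags = ["B-PER", "I-PER", "O", "B-PER"]

combined_tags = majority_vote([hmm_tags, maxent_tags, memory_tags])
print(combined_tags)
```

In this sketch the second token is resolved by a two-to-one vote, while the last token is a three-way tie and falls back to the first classifier. Weighted voting, where each classifier's vote is scaled by its accuracy on held-out data, is another plausible variant.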