Improving a state-of-the-art named entity recognition system using the World Wide Web

  • Authors:
  • Richárd Farkas; György Szarvas; Róbert Ormándi

  • Affiliations:
  • University of Szeged, Department of Informatics, Szeged, Hungary; University of Szeged, Department of Informatics, Szeged, Hungary and Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, Szeged, Hungary; Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, Szeged, Hungary

  • Venue:
  • ICDM'07: Proceedings of the 7th Industrial Conference on Advances in Data Mining: Theoretical Aspects and Applications
  • Year:
  • 2007

Abstract

The development of highly accurate Named Entity Recognition (NER) systems can benefit a wide range of Human Language Technology applications. In this paper, we introduce three heuristics that exploit a variety of knowledge sources (the World Wide Web, Wikipedia and WordNet) and are capable of further improving a state-of-the-art multilingual and domain-independent NER system. Moreover, we describe our investigations into entity recognition in simulated speech-to-text output. Our web-based heuristics attained a slight improvement over the best results published on a standard NER task, and proved to be particularly effective in the speech-to-text scenario.