Automatically generated NE tagged corpora for English and Hungarian

  • Authors: Dávid Márk Nemeskey; Eszter Simon
  • Affiliations: Hungarian Academy of Sciences, Budapest (both authors)
  • Venue: NEWS '12: Proceedings of the 4th Named Entity Workshop
  • Year: 2012

Abstract

Supervised Named Entity Recognizers require large amounts of annotated text. Since manual annotation is a highly costly procedure, reducing the annotation cost is essential. We present a fully automatic method to build NE annotated corpora from Wikipedia. In contrast to recent work, we apply a new method, which maps the DBpedia classes into CoNLL NE types. Since our method is mainly language-independent, we used it to generate corpora for English and Hungarian. The corpora are freely available.