Named entity recognition in Wikipedia

  • Authors:
  • Dominic Balasuriya; Nicky Ringland; Joel Nothman; Tara Murphy; James R. Curran

  • Affiliations:
  • University of Sydney, NSW, Australia (all authors)

  • Venue:
  • People's Web '09 Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources
  • Year:
  • 2009

Abstract

Named entity recognition (NER) is used in many domains beyond the newswire text that makes up current gold-standard corpora. Recent work has used Wikipedia's link structure to automatically generate near gold-standard annotations. Until now, these resources have only been evaluated on newswire corpora or on themselves. We present the first NER evaluation on a Wikipedia gold standard (WG) corpus. Our analysis of cross-corpus performance on WG shows that Wikipedia text may be a harder NER domain than newswire. We find that an automatic annotation of Wikipedia has high agreement with WG and that models trained on it outperform newswire-trained models by up to 7.7%.