Multilingual named entity recognition using parallel data and metadata from Wikipedia

  • Authors: Sungchul Kim; Kristina Toutanova; Hwanjo Yu

  • Affiliations: POSTECH, Pohang, South Korea; Microsoft Research, Redmond, WA; POSTECH, Pohang, South Korea

  • Venue: ACL '12: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, Volume 1
  • Year: 2012

Abstract

In this paper we propose a method for automatically labeling multilingual data with named entity tags. We build on prior work that uses Wikipedia metadata, and show how to effectively combine the weak annotations derived from that metadata with information obtained from parallel English-foreign Wikipedia sentences. The combination is achieved with a novel semi-Markov CRF (semi-CRF) model that tags a foreign sentence in the context of its parallel English sentence. The model outperforms both standard annotation projection methods and methods based solely on Wikipedia metadata.
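
For readers unfamiliar with the annotation projection baseline mentioned in the abstract, the following is a minimal sketch of the general idea, not the semi-CRF model proposed in the paper. It assumes that English tokens already carry entity-type tags and that word alignments between the English and foreign sentences are given. The function name `project_tags`, the tag set, and the example alignment are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def project_tags(
    en_tags: List[str],
    alignment: List[Tuple[int, int]],
    foreign_len: int,
) -> List[str]:
    """Copy entity types from English tokens to the foreign tokens aligned with them.

    en_tags     -- one entity type per English token ("O" means no entity)
    alignment   -- (english_index, foreign_index) pairs; assumed to be given
    foreign_len -- number of tokens in the foreign sentence
    """
    projected = ["O"] * foreign_len
    for en_idx, fo_idx in alignment:
        tag = en_tags[en_idx]
        # Keep the first non-"O" tag projected onto a foreign token.
        if tag != "O" and projected[fo_idx] == "O":
            projected[fo_idx] = tag
    return projected


# Hypothetical example: a 4-token English sentence and a 6-token foreign sentence.
en_tags = ["PER", "PER", "O", "LOC"]
alignment = [(0, 0), (1, 1), (2, 5), (3, 3)]
print(project_tags(en_tags, alignment, foreign_len=6))
# prints ['PER', 'PER', 'O', 'LOC', 'O', 'O']
```

Foreign tokens with no aligned English entity token stay tagged "O", so the result depends entirely on the quality of the alignment. Per the abstract, the proposed model outperforms this kind of projection by also drawing on weak annotations from Wikipedia metadata.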