Accurate unsupervised joint named-entity extraction from unaligned parallel text

  • Authors:
  • Robert Munro;Christopher D. Manning

  • Affiliations:
  • Stanford University, Stanford, CA;Stanford University, Stanford, CA

  • Venue:
  • NEWS '12 Proceedings of the 4th Named Entity Workshop
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present a new approach to named-entity recognition that jointly learns to identify named-entities in parallel text. The system generates seed candidates through local, cross-language edit likelihood and then bootstraps to make broad predictions across both languages, optimizing combined contextual, word-shape and alignment models. It is completely unsupervised, with no manually labeled items, no external resources, only using parallel text that does not need to be easily alignable. The results are strong, with F 0.85 for purely unsupervised named-entity recognition across languages, compared to just F = 0.35 on the same data for supervised cross-domain named-entity recognition within a language. A combination of unsupervised and supervised methods increases the accuracy to F = 0.88. We conclude that we have found a viable new strategy for unsupervised named-entity recognition across low-resource languages and for domain-adaptation within high-resource languages.