Active sample selection for named entity transliteration

  • Authors:
  • Dan Goldwasser;Dan Roth

  • Affiliations:
  • University of Illinois, Urbana, IL;University of Illinois, Urbana, IL

  • Venue:
  • HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper introduces a new method for identifying named-entity (NE) transliterations within bilingual corpora. Current state-of-the-art approaches usually require annotated data and relevant linguistic knowledge which may not be available for all languages. We show how to effectively train an accurate transliteration classifier using very little data, obtained automatically. To perform this task, we introduce a new active sampling paradigm for guiding and adapting the sample selection process. We also investigate how to improve the classifier by identifying repeated patterns in the training data. We evaluated our approach using English, Russian and Hebrew corpora.