Loss-sensitive discriminative training of machine transliteration models

  • Authors:
  • Kedar Bellare;Koby Crammer;Dayne Freitag

  • Affiliations:
  • University of Massachusetts Amherst, Amherst, MA;University of Pennsylvania, Philadelphia, PA;SRI International, San Diego, CA

  • Venue:
  • SRWS '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Student Research Workshop and Doctoral Consortium
  • Year:
  • 2009


Abstract

In machine transliteration we transcribe a name across languages while preserving its phonetic information. In this paper, we present a novel sequence transduction algorithm for the problem of machine transliteration. Our model is discriminatively trained by the MIRA algorithm, which improves on traditional perceptron training in three ways: (1) it considers the k-best transliterations instead of only the single best one; (2) it is trained on the ranking of these transliterations according to a user-specified loss function (Levenshtein edit distance); (3) it lets the user tune a built-in parameter to cope with noisy, non-separable data during training. On an Arabic-English name transliteration task, our model achieves a relative error reduction of 2.2% over a perceptron-based model with similar features, and an error reduction of 7.2% over a statistical machine translation model with more complex features.
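To make the abstract's three ingredients concrete, the following is a minimal sketch (not the authors' implementation) of a loss-sensitive, passive-aggressive MIRA-style update: Levenshtein edit distance supplies the loss, each of the k-best hypotheses contributes a margin constraint scaled by that loss, and the aggressiveness parameter C clips updates on noisy, non-separable data. The feature vectors and function names here are illustrative assumptions.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (the user-specified loss)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def mira_update(w, gold_feats, kbest, C=0.1):
    """One MIRA-style step: for each k-best hypothesis (features, loss),
    enforce a margin proportional to its loss, clipping each update at C
    to tolerate noise. A sketch; the paper solves the joint k-best QP."""
    for feats, loss in kbest:
        diff = [g - f for g, f in zip(gold_feats, feats)]
        margin = sum(wi * di for wi, di in zip(w, diff))
        norm2 = sum(d * d for d in diff)
        if norm2 == 0:
            continue
        tau = min(C, max(0.0, (loss - margin) / norm2))  # clipped step size
        w = [wi + tau * di for wi, di in zip(w, diff)]
    return w
```

For example, a single update against one hypothetical hypothesis whose transliteration is two edits from the gold string:

```python
gold_feats = [1.0, 0.0, 1.0]                     # assumed feature vectors
hyp_feats = [0.0, 1.0, 1.0]
loss = levenshtein("qadafi", "gaddafi")          # edit-distance loss = 2
w = mira_update([0.0, 0.0, 0.0], gold_feats, [(hyp_feats, loss)])
# After the update, the gold candidate scores higher than the hypothesis.
```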