Prediction of learning curves in machine translation

  • Authors:
  • Prasanth Kolachina; Nicola Cancedda; Marc Dymetman; Sriram Venkatapathy

  • Affiliations:
  • LTRC, IIIT-Hyderabad, Hyderabad, India; Xerox Research Centre Europe, Meylan, France; Xerox Research Centre Europe, Meylan, France; Xerox Research Centre Europe, Meylan, France

  • Venue:
  • ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
  • Year:
  • 2012

Abstract

Parallel data in the domain of interest is the key resource when training a statistical machine translation (SMT) system for a specific purpose. Since ad-hoc manual translation can represent a significant investment in time and money, a prior assessment of the amount of training data required to achieve a satisfactory accuracy level can be very useful. In this work, we show how to predict what the learning curve would look like if we were to manually translate increasing amounts of data. We consider two scenarios: 1) monolingual samples in the source and target languages are available, and 2) a small amount of parallel data is additionally available. We propose methods for predicting learning curves in both of these scenarios.