Fast generation of translation forest for large-scale SMT discriminative training

  • Authors:
  • Xinyan Xiao;Yang Liu;Qun Liu;Shouxun Lin

  • Affiliations:
  • Institute of Computing Technology, Chinese Academy of Sciences;Institute of Computing Technology, Chinese Academy of Sciences;Institute of Computing Technology, Chinese Academy of Sciences;Institute of Computing Technology, Chinese Academy of Sciences

  • Venue:
  • EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Although discriminative training guarantees to improve statistical machine translation by incorporating a large amount of overlapping features, it is hard to scale up to large data due to decoding complexity. We propose a new algorithm to generate translation forest of training data in linear time with the help of word alignment. Our algorithm also alleviates the oracle selection problem by ensuring that a forest always contains derivations that exactly yield the reference translation. With millions of features trained on 519K sentences in 0.03 second per sentence, our system achieves significant improvement by 0.84 Bleu over the baseline system on the NIST Chinese-English test sets.