Improve SMT quality with automatically extracted paraphrase rules

  • Authors:
  • Wei He; Hua Wu; Haifeng Wang; Ting Liu

  • Affiliations:
  • Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology; Baidu; Baidu; Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology

  • Venue:
  • ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
  • Year:
  • 2012

Abstract

We propose a novel approach to improving statistical machine translation (SMT) with paraphrase rules that are automatically extracted from the bilingual training data. Without using extra paraphrase resources, we acquire the rules by comparing the source side of the parallel corpus with the target-to-source translations of the target side. Besides word and phrase paraphrases, the acquired rules mainly cover structured paraphrases at the sentence level. These rules are used to enrich the SMT inputs and thereby improve translation quality. Experimental results show that the proposed approach achieves significant improvements of 1.6 to 3.6 BLEU points in the oral domain and 0.5 to 1 BLEU points in the news domain.
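
To make the idea in the abstract concrete, the following is a minimal Python sketch, not the authors' extraction procedure. It assumes the target-to-source translations are already available as plain strings, and it approximates the comparison step with a token-level diff from the standard library's `difflib`. The function names `extract_rules` and `enrich_input`, the `min_count` threshold, and the example sentences are all hypothetical. It covers only word and phrase substitutions, not the structured sentence-level rules the abstract mentions.

```python
from collections import Counter
from difflib import SequenceMatcher


def extract_rules(source_sentences, back_translations, min_count=1):
    """Collect candidate rules (original phrase -> back-translated phrase).

    Assumption: a token-level diff stands in for whatever alignment the
    paper actually uses to compare the two versions of each sentence.
    """
    counts = Counter()
    for src, back in zip(source_sentences, back_translations):
        src_tokens, back_tokens = src.split(), back.split()
        matcher = SequenceMatcher(a=src_tokens, b=back_tokens, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Only substituted spans are treated as paraphrase candidates.
            if tag == "replace":
                lhs = " ".join(src_tokens[i1:i2])
                rhs = " ".join(back_tokens[j1:j2])
                counts[(lhs, rhs)] += 1
    # Keep candidates seen at least min_count times (hypothetical filter).
    return {pair: n for pair, n in counts.items() if n >= min_count}


def enrich_input(sentence, rules):
    """Return the original sentence plus one variant per matching rule."""
    variants = [sentence]
    padded = f" {sentence} "  # pad so that only whole tokens match
    for lhs, rhs in rules:
        if f" {lhs} " in padded:
            variants.append(padded.replace(f" {lhs} ", f" {rhs} ", 1).strip())
    return variants


if __name__ == "__main__":
    sources = [
        "could you tell me the way to the station",
        "could you tell me the time",
    ]
    back_translated = [
        "can you tell me the way to the station",
        "can you tell me the time",
    ]
    rules = extract_rules(sources, back_translated, min_count=2)
    print(rules)
    print(enrich_input("could you help me", rules))
```

In this sketch the enriched input is a plain list of sentence variants. How the variants are handed to the decoder is left open here, since the abstract says only that the rules enrich the SMT inputs.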