SMT helps bitext dependency parsing

  • Authors:
  • Wenliang Chen; Jun'ichi Kazama; Min Zhang; Yoshimasa Tsuruoka; Yujie Zhang; Yiou Wang; Kentaro Torisawa; Haizhou Li

  • Affiliations:
  • Institute for Infocomm Research, Singapore, and National Institute of Information and Communications Technology (NICT), Japan; Institute for Infocomm Research, Singapore, and National Institute of Information and Communications Technology (NICT), Japan; Institute for Infocomm Research, Singapore; School of Information Science, JAIST, Japan, and National Institute of Information and Communications Technology (NICT), Japan; Beijing Jiaotong University, China, and National Institute of Information and Communications Technology (NICT), Japan; National Institute of Information and Communications Technology (NICT), Japan; National Institute of Information and Communications Technology (NICT), Japan; Institute for Infocomm Research, Singapore

  • Venue:
  • EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2011

Abstract

We propose a method to improve the accuracy of parsing bilingual texts (bitexts) with the help of statistical machine translation (SMT) systems. Previous bitext parsing methods rely on human-annotated bilingual treebanks, which are hard to obtain. Instead, our approach uses an auto-generated bilingual treebank to produce bilingual constraints. However, because the auto-generated bilingual treebank contains errors, the bilingual constraints are noisy. To overcome this problem, we use large-scale unannotated data to verify the constraints and design a set of effective bilingual features for parsing models based on the verified results. The experimental results show that our new parsers significantly outperform state-of-the-art baselines. Moreover, our approach still provides improvement when we use a larger monolingual treebank that results in a much stronger baseline. Especially notable is that our approach can be used in a purely monolingual setting with the help of SMT.
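
The verification step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `verify_constraint`, the one-to-one word alignment, the frequency thresholds, and the label names are all assumptions made for illustration. The idea shown is only that a candidate source-side dependency is projected through the alignment to the target side, and the projected pair is checked against counts gathered from large-scale automatically parsed (unannotated) target-language text; the resulting coarse label could then serve as a bilingual feature.

```python
from collections import Counter

def verify_constraint(head, dep, alignment, target_pair_counts,
                      high=100, low=10):
    """Label a source dependency candidate by how often its projection
    appears in auto-parsed target-language data.

    All thresholds and labels are hypothetical, chosen for illustration.
    """
    # Project the source head and dependent onto the target side.
    t_head = alignment.get(head)
    t_dep = alignment.get(dep)
    if t_head is None or t_dep is None:
        return "unaligned"

    # Counter returns 0 for pairs never observed in the unannotated data.
    count = target_pair_counts[(t_head, t_dep)]
    if count >= high:
        return "high"
    if count >= low:
        return "mid"
    if count > 0:
        return "low"
    return "unseen"

# Toy data (hypothetical): (head, dependent) counts from auto-parsed text,
# and a simplified one-to-one source-to-target word alignment.
counts = Counter({("eat", "apple"): 250, ("eat", "table"): 3})
alignment = {"src_eat": "eat", "src_apple": "apple", "src_table": "table"}

print(verify_constraint("src_eat", "src_apple", alignment, counts))
print(verify_constraint("src_eat", "src_table", alignment, counts))
```

Bucketing the count into a few labels, instead of using the raw frequency, is one simple way to keep a feature robust to noise in the auto-generated treebank; whether the paper uses this exact scheme is not stated in the abstract.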