Relaxed cross-lingual projection of constituent syntax

  • Authors:
  • Wenbin Jiang;Qun Liu;Yajuan Lü

  • Affiliations:
  • Institute of Computing Technology, Chinese Academy of Sciences;Institute of Computing Technology, Chinese Academy of Sciences;Institute of Computing Technology, Chinese Academy of Sciences

  • Venue:
  • EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose a relaxed correspondence assumption for cross-lingual projection of constituent syntax, which allows a supposed constituent of the target sentence to correspond to an unrestricted treelet in the source parse. Such a relaxed assumption fundamentally tolerates the syntactic non-isomorphism between languages, and enables us to learn the target-language-specific syntactic idiosyncrasy rather than a strained grammar directly projected from the source language syntax. Based on this assumption, a novel constituency projection method is also proposed in order to induce a projected constituent tree-bank from the source-parsed bilingual corpus. Experiments show that, the parser trained on the projected treebank dramatically outperforms previous projected and unsupervised parsers.