SMT helps bitext dependency parsing

Authors:
Wenliang Chen;Jun'ichi Kazama;Min Zhang;Yoshimasa Tsuruoka;Yujie Zhang;Yiou Wang;Kentaro Torisawa;Haizhou Li
Affiliations:
Institute for Infocomm Research, Singapore, and National Institute of Information and Communications Technology (NICT), Japan;Institute for Infocomm Research, Singapore, and National Institute of Information and Communications Technology (NICT), Japan;Institute for Infocomm Research, Singapore;School of Information Science, JAIST, Japan, and National Institute of Information and Communications Technology (NICT), Japan;Beijing Jiaotong University, China, and National Institute of Information and Communications Technology (NICT), Japan;National Institute of Information and Communications Technology (NICT), Japan;National Institute of Information and Communications Technology (NICT), Japan;Institute for Infocomm Research, Singapore
Venue:
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2011

Citing 12
Cited 3

A systematic comparison of various statistical alignment models

Computational Linguistics
Ultraconservative online algorithms for multiclass problems

The Journal of Machine Learning Research
Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
Statistical phrase-based translation

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Alignment by agreement

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Two languages are better than one (for syntactic parsing)

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Cross language dependency parsing using a bilingual lexicon

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
An error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Improving dependency parsing with subtrees from auto-parsed data

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Bilingually-constrained (monolingual) shift-reduce parsing

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3
Efficient third-order dependency parsers

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Bitext dependency parsing with bilingual subtree constraints

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Cross-lingual parse disambiguation based on semantic correspondence

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
An exploration of forest-to-string translation: does translation help or hurt parsing?

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Using parallel features in parsing of machine-translated sentences for correction of grammatical errors

SSST-6 '12 Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation

Quantified Score

Hi-index	0.00

Visualization

Abstract

We propose a method to improve the accuracy of parsing bilingual texts (bitexts) with the help of statistical machine translation (SMT) systems. Previous bitext parsing methods use human-annotated bilingual treebanks that are hard to obtain. Instead, our approach uses an auto-generated bilingual treebank to produce bilingual constraints. However, because the auto-generated bilingual treebank contains errors, the bilingual constraints are noisy. To overcome this problem, we use large-scale unannotated data to verify the constraints and design a set of effective bilingual features for parsing models based on the verified results. The experimental results show that our new parsers significantly outperform state-of-the-art baselines. Moreover, our approach is still able to provide improvement when we use a larger monolingual treebank that results in a much stronger baseline. Especially notable is that our approach can be used in a purely monolingual setting with the help of SMT.