Using parallel features in parsing of machine-translated sentences for correction of grammatical errors

Authors:
Rudolf Rosa;Ondřej Dušek;David Mareček;Martin Popel
Affiliations:
Charles University in Prague;Charles University in Prague;Charles University in Prague;Charles University in Prague
Venue:
SSST-6 '12 Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation
Year:
2012

Citing 26
Cited 0

A systematic comparison of various statistical alignment models

Computational Linguistics
Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
Stochastic inversion transduction grammars and bilingual parsing of parallel corpora

Computational Linguistics
A statistical parser for Czech

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Bootstrapping parsers via syntactic projection across parallel texts

Natural Language Engineering
Non-projective dependency parsing using spanning tree algorithms

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Discriminative learning and spanning tree algorithms for dependency parsing

Discriminative learning and spanning tree algorithms for dependency parsing
Adapting a WSJ-trained parser to grammatically noisy text

HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
The best of two worlds: cooperation of statistical and rule-based taggers for Czech

ACL '07 Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies
CoNLL-X shared task on multilingual dependency parsing

CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Multilingual dependency analysis with a two-stage discriminative parser

CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
The CoNLL-2009 shared task: syntactic and semantic dependencies in multiple languages

CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task
Cross language dependency parsing using a bilingual lexicon

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Bilingually-constrained (monolingual) shift-reduce parsing

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3
Joint parsing and alignment with weakly synchronized grammars

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Bitext dependency parsing with bilingual subtree constraints

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Error detection for statistical machine translation using linguistic features

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation

WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
2010 failures in English-Czech phrase-based MT

WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
TectoMT: modular NLP framework

IceTAL'10 Proceedings of the 7th international conference on Advances in natural language processing
Findings of the 2011 Workshop on Statistical Machine Translation

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Two-step translation with grammatical post-processing

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Multi-source transfer of delexicalized dependency parsers

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
SMT helps bitext dependency parsing

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we present two dependency parser training methods appropriate for parsing outputs of statistical machine translation (SMT), which pose problems to standard parsers due to their frequent ungrammaticality. We adapt the MST parser by exploiting additional features from the source language, and by introducing artificial grammatical errors in the parser training data, so that the training sentences resemble SMT output. We evaluate the modified parser on DEPFIX, a system that improves English-Czech SMT outputs using automatic rule-based corrections of grammatical mistakes which requires parsed SMT output sentences as its input. Both parser modifications led to improvements in BLEU score; their combination was evaluated manually, showing a statistically significant improvement of the translation quality.