Empirical lower bounds on alignment error rates in syntax-based machine translation

Authors: Anders Søgaard; Jonas Kuhn
Affiliations: University of Copenhagen; University of Potsdam
Venue: SSST '09 Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation
Year: 2009

References:

On multiple context-free grammars. Theoretical Computer Science.
The theory of parsing, translation, and compiling.
Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics.
Synchronous tree-adjoining grammars. COLING '90 Proceedings of the 13th conference on Computational linguistics - Volume 3.
A comparison of alignment models for statistical machine translation. COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2.
A syntax-based statistical translation model. ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics.
A comparative study on reordering constraints in statistical machine translation. ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1.
Learning non-isomorphic tree mappings for machine translation. ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 2.
Aligning words using matrix factorisation. ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics.
Empirical lower bounds on the complexity of translational equivalence. ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics.
Optimal constituent alignment with edge covers for semantic projection. ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics.
Syntax-based alignment: supervised or unsupervised? COLING '04 Proceedings of the 20th international conference on Computational Linguistics.
Some computational complexity results for synchronous context-free grammars. HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing.
Synchronous binarization for machine translation. HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics.
Purest ever example-based machine translation: Detailed presentation and assessment. Machine Translation.
Hierarchical Phrase-Based Translation. Computational Linguistics.
Measuring Word Alignment Quality for Statistical Machine Translation. Computational Linguistics.
Probabilistic synchronous tree-adjoining grammars for machine translation: the argument from bilingual dictionaries. SSST '07 Proceedings of the NAACL-HLT 2007/AMTA Workshop on Syntax and Structure in Statistical Translation.
Computing translation units and quantifying parallelism in parallel dependency treebanks. LAW '07 Proceedings of the Linguistic Annotation Workshop.

Abstract

Synchronous context-free grammars of rank two (2-SCFGs) (Satta and Peserico, 2005) are used in syntax-based machine translation systems such as those of Wu (1997), Zhang et al. (2006) and Chiang (2007). Their empirical adequacy, in terms of the alignments they induce, has been discussed in Wu (1997) and Wellington et al. (2006), but with a one-sided focus on so-called "inside-out alignments". This paper identifies other alignment configurations that cannot be induced by 2-SCFGs and examines their frequencies across a wide collection of hand-aligned parallel corpora. Empirical lower bounds are derived for 2-SCFGs and related formalisms on two measures of alignment error rate: the one introduced in Och and Ney (2000) and one where only complete translation units are considered.
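
For orientation, the alignment error rate of Och and Ney (2000) is commonly stated as AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the hypothesised alignment, S the set of sure gold links and P the set of possible gold links (with S contained in P). The Python sketch below only illustrates that formula. The function name, the representation of links as (source index, target index) pairs, and the toy data are assumptions made for illustration and are not taken from the paper; the second measure, based on complete translation units, is not sketched here because the abstract does not define it.

```python
from typing import Set, Tuple

# A link pairs a source-word index with a target-word index (assumed representation).
Link = Tuple[int, int]


def alignment_error_rate(
    hypothesis: Set[Link], sure: Set[Link], possible: Set[Link]
) -> float:
    """Return AER = 1 - (|A & S| + |A & P|) / (|A| + |S|)."""
    # By convention every sure link is also a possible link.
    possible = possible | sure
    # Assumed convention: two empty alignments agree perfectly.
    if not hypothesis and not sure:
        return 0.0
    matched = len(hypothesis & sure) + len(hypothesis & possible)
    return 1.0 - matched / (len(hypothesis) + len(sure))


if __name__ == "__main__":
    # Toy gold standard: two sure links and one additional possible link.
    gold_sure = {(0, 0), (1, 1)}
    gold_possible = {(2, 1)}
    # Toy hypothesis: one sure link, one possible link, one spurious link.
    predicted = {(0, 0), (2, 1), (2, 2)}
    print(alignment_error_rate(predicted, gold_sure, gold_possible))
```

A lower bound on this quantity for a grammar formalism can be read as the error incurred by the best alignment the formalism is able to induce for each sentence pair; how that best alignment is found is outside the scope of this sketch.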