Empirical lower bounds on the complexity of translational equivalence

Authors:
Benjamin Wellington; Sonjia Waxmonsky; I. Dan Melamed
Affiliations:
New York University, New York, NY; University of Chicago, Chicago, IL; New York University, New York, NY
Venue:
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Year:
2006

Abstract

This paper describes a study of the patterns of translational equivalence exhibited by a variety of bitexts. The study found that the complexity of these patterns in every bitext was higher than suggested in the literature. These findings shed new light on why "syntactic" constraints have not helped to improve statistical translation models, including finite-state phrase-based models, tree-to-string models, and tree-to-tree models. The paper also presents evidence that inversion transduction grammars cannot generate some translational equivalence relations, even in relatively simple real bitexts in syntactically similar languages with rigid word order. Instructions for replicating our experiments are at http://nlp.cs.nyu.edu/GenPar/ACL06
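
The abstract does not spell out how such failures arise, so the following is only an illustrative sketch and not the authors' experimental procedure. It assumes a simplified setting in which a word alignment is a one-to-one permutation of target positions, and it checks whether a binary inversion transduction grammar could generate that permutation by repeatedly merging adjacent blocks in straight or inverted order. The function name `itg_binarizable` is hypothetical. The "inside-out" permutations 2-4-1-3 and 3-1-4-2 are the standard examples that no binary ITG can produce.

```python
def itg_binarizable(perm):
    """Return True if `perm` (a permutation of 1..n) can be derived by a
    binary ITG, i.e. built up by merging adjacent blocks whose target
    positions are contiguous, in either straight or inverted order.

    Simplifying assumption: the alignment is one-to-one with no gaps.
    """
    stack = []  # each entry is a (lo, hi) range of contiguous target positions
    for p in perm:
        stack.append((p, p))
        # Greedily merge the top two blocks while their ranges are adjacent.
        while len(stack) >= 2:
            (lo1, hi1), (lo2, hi2) = stack[-2], stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    return len(stack) == 1


if __name__ == "__main__":
    for perm in ([1, 2, 3, 4], [2, 1, 4, 3], [2, 4, 1, 3], [3, 1, 4, 2]):
        print(perm, itg_binarizable(perm))
```

Real alignments are often many-to-many and may contain discontinuous constituents, so this permutation check understates the difficulty that the study measures.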