Data-Oriented Translation

Authors:
Arjen Poutsma
Affiliations:
University of Amsterdam, the Netherlands
Venue:
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Year:
2000

Abstract

In this article, we present Data-Oriented Translation (DOT), a statistical approach to machine translation based on Data-Oriented Parsing. In DOT, we use linked subtree pairs to create a derivation of a source sentence. Each linked subtree pair has a probability and consists of two trees: one in the source language and one in the target language. Once a derivation has been formed from these subtree pairs, we can create a translation from it. Since there are typically many different derivations of the same source sentence, there can be as many different translations of it. The probability of a translation is calculated as the total probability of all the derivations that yield this translation. We describe the computational aspects of this model, show that each subtree pair can be converted into a productive rewrite rule, and show that the most probable translation can be computed by means of Monte Carlo disambiguation. Finally, we discuss some pilot experiments with the Verbmobil corpus.
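
The relation between derivation probabilities and translation probabilities can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the derivation list, its probabilities, and the function names are hypothetical and are not taken from the paper. The sketch also enumerates derivations explicitly, whereas the abstract suggests that in practice derivations are sampled because they cannot be enumerated efficiently.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy data: each entry is one derivation of the same source
# sentence, given as (resulting translation, derivation probability).
# The values are invented for illustration only.
DERIVATIONS = [
    ("translation A", 0.20),
    ("translation A", 0.20),
    ("translation B", 0.35),
    ("translation C", 0.25),
]


def translation_probabilities(derivations):
    """Exact case: sum the probabilities of all derivations per translation."""
    totals = defaultdict(float)
    for translation, prob in derivations:
        totals[translation] += prob
    return dict(totals)


def monte_carlo_best(derivations, n_samples=10000, seed=0):
    """Estimate the most probable translation by sampling derivations.

    Each sample draws one derivation in proportion to its probability;
    the translation produced most often is returned as the estimate.
    """
    rng = random.Random(seed)
    translations = [t for t, _ in derivations]
    weights = [p for _, p in derivations]
    samples = rng.choices(translations, weights=weights, k=n_samples)
    return Counter(samples).most_common(1)[0][0]


if __name__ == "__main__":
    print(translation_probabilities(DERIVATIONS))
    print(monte_carlo_best(DERIVATIONS))
```

In this toy data the single most probable derivation yields "translation B", while "translation A" has the higher total probability once its two derivations are summed. This is why the most probable translation has to be computed over all derivations rather than read off the best single derivation.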