DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-Level Alignment

Authors:
Bonnie J. Dorr; Lisa Pearl; Rebecca Hwa; Nizar Habash
Venue:
AMTA '02 Proceedings of the 5th Conference of the Association for Machine Translation in the Americas on Machine Translation: From Research to Real Users
Year:
2002



Abstract

The frequent occurrence of divergences (structural differences between languages) presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate alignment and projection of dependency trees in another language without requiring any training on dependency-tree data in that language. We present an empirical analysis comparing the complexity of performing word-level alignment with and without divergence handling. Our results suggest that our approach facilitates word-level alignment, particularly for sentence pairs containing divergences.
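To make the idea of transforming an English sentence structure to resemble another language more concrete, here is a toy sketch in Python. It is an illustration under stated assumptions, not DUSTer's actual rule format or implementation: the rule table `RULES`, the lexicon `LEXICON`, and the functions `apply_rules` and `one_to_one_links` are all hypothetical. It uses a commonly cited English-Spanish divergence ("I am hungry" vs. "Yo tengo hambre", literally "I have hunger") and shows that rewriting the English side yields more one-to-one word links under a tiny word-for-word lexicon.

```python
# Toy illustration only: the rule format, lexicon, and function names below are
# hypothetical and are NOT DUSTer's actual rules or implementation.
from typing import Dict, List, Optional, Tuple

# A hypothetical divergence rule: an English token sequence (left-hand side) is
# rewritten so that it mirrors the foreign-language phrasing (right-hand side).
Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]

# Assumed rule: English "am hungry" -> "have hunger", mirroring Spanish "tengo hambre".
RULES: List[Rule] = [
    (("am", "hungry"), ("have", "hunger")),
]

# Assumed word-for-word English-to-Spanish lexicon (deliberately tiny).
LEXICON: Dict[str, str] = {"i": "yo", "have": "tengo", "hunger": "hambre"}


def apply_rules(tokens: List[str], rules: List[Rule]) -> List[str]:
    """Scan left to right, replacing each match of a rule's left-hand side with its right-hand side."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for lhs, rhs in rules:
            if tuple(tokens[i:i + len(lhs)]) == lhs:
                out.extend(rhs)
                i += len(lhs)
                break
        else:
            # No rule matched at position i: copy the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


def one_to_one_links(english: List[str], foreign: List[str]) -> List[Tuple[int, int]]:
    """Link English position i to foreign position j when the lexicon maps english[i] to foreign[j]."""
    links: List[Tuple[int, int]] = []
    for i, word in enumerate(english):
        target: Optional[str] = LEXICON.get(word)
        if target is not None and target in foreign:
            links.append((i, foreign.index(target)))
    return links


if __name__ == "__main__":
    english = "i am hungry".split()
    spanish = "yo tengo hambre".split()
    print(one_to_one_links(english, spanish))
    transformed = apply_rules(english, RULES)
    print(transformed, one_to_one_links(transformed, spanish))
```

On the original English only "i" links to "yo"; after the rewrite, all three words link one-to-one. A real system would operate on parsed structures and statistically learned alignments rather than surface token matching, so the flat token rewrite here is a deliberate simplification.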