Constrained EM for parallel text alignment

Authors:
David Talbot
Affiliations:
School of Informatics, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, UK e-mail: d.r.talbot@sms.ed.ac.uk
Venue:
Natural Language Engineering
Year:
2005

Abstract

Standard parameter estimation schemes for statistical translation models can struggle to find reasonable settings on some parallel corpora. We show how auxiliary information can be used to constrain the procedure directly by restricting the set of alignments explored during parameter estimation. This enables the integration of bilingual and monolingual knowledge sources while retaining the flexibility of the underlying models. We demonstrate the effectiveness of this approach for incorporating linguistic and domain-specific constraints on various parallel corpora, and consider the importance of using the context of the parallel text to guide the application of such constraints.
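
The idea of restricting the set of alignments explored during parameter estimation can be illustrated with a minimal sketch. The abstract does not specify the models or constraints used, so the code below makes its own assumptions: it uses a simplified IBM Model 1 (no NULL word) as a stand-in translation model, and a small bilingual lexicon as the auxiliary knowledge source. The names `dictionary_constraint`, `permitted` and `constrained_model1` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def dictionary_constraint(lexicon):
    """Build a constraint from a set of (target_word, source_word) pairs.

    Assumption for this sketch: if a target word has a lexicon match in the
    source sentence, only those source positions may be linked to it;
    otherwise every source position stays available.
    """
    def permitted(src, tgt, j):
        hits = [i for i, e in enumerate(src) if (tgt[j], e) in lexicon]
        return hits or list(range(len(src)))
    return permitted


def constrained_model1(corpus, permitted, iterations=5):
    """EM for a simplified Model 1, with the E-step limited to permitted links.

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    Returns t, where t[(f, e)] estimates p(target word f | source word e).
    """
    # Uniform (unnormalised) start over co-occurring word pairs.
    t = defaultdict(float)
    for src, tgt in corpus:
        for f in tgt:
            for e in src:
                t[(f, e)] = 1.0

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in corpus:
            for j, f in enumerate(tgt):
                # E-step: posterior over source positions, restricted to
                # the links the constraint allows for this target word.
                links = permitted(src, tgt, j)
                z = sum(t[(f, src[i])] for i in links)
                if z == 0.0:
                    continue
                for i in links:
                    p = t[(f, src[i])] / z
                    count[(f, src[i])] += p
                    total[src[i]] += p
        # M-step: renormalise the expected counts per source word.
        t = defaultdict(
            float,
            {(f, e): c / total[e] for (f, e), c in count.items()},
        )
    return t


corpus = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
]
t = constrained_model1(corpus, dictionary_constraint({("haus", "house")}))
```

In this toy run the lexicon entry removes every alignment that links "haus" to "the", so no probability mass is assigned to that pair, while the model itself is left unchanged for all unconstrained words.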