Statistical machine translation with word- and sentence-aligned parallel corpora

Authors: Chris Callison-Burch, David Talbot, Miles Osborne
Affiliation: University of Edinburgh, Edinburgh (all authors)
Venue: ACL '04, Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics
Year: 2004

Abstract

The parameters of statistical translation models are typically estimated from sentence-aligned parallel corpora. We show that significant improvements in the alignment and translation quality of such models can be achieved by additionally including word-aligned data during training. Incorporating word-level alignments into the parameter estimation of the IBM models reduces alignment error rate and increases the Bleu score when compared to training the same models only on sentence-aligned data. On the Verbmobil data set, we attain a 38% reduction in the alignment error rate and a higher Bleu score with half as many training examples. We discuss how varying the ratio of word-aligned to sentence-aligned data affects the expected performance gain.
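To make the idea concrete, the following is a minimal sketch of how word-aligned data might be folded into IBM Model 1 training, together with the standard alignment error rate (AER) used to score the result. It is not the authors' implementation: the function names, the single interpolation parameter `weight`, and the choice to combine the two data sources by weighting their counts in the M-step are assumptions made for illustration, and the NULL word is omitted for brevity. Sentence-aligned pairs contribute expected counts from the E-step, while word-aligned pairs contribute observed link counts that stay fixed across iterations.

```python
from collections import defaultdict


def train_model1(sent_pairs, word_aligned, weight=0.5, iterations=10):
    """Estimate t(f|e) from sentence-aligned and word-aligned data.

    sent_pairs:   list of (e_tokens, f_tokens); alignments are hidden.
    word_aligned: list of (e_tokens, f_tokens, links), where links is a set
                  of (i, j) pairs meaning e_tokens[i] aligns to f_tokens[j].
    weight:       hypothetical interpolation weight on the word-aligned counts.
    """
    # Link counts from word-aligned data are observed, so compute them once.
    observed = defaultdict(float)
    for e_sent, f_sent, links in word_aligned:
        for i, j in links:
            observed[(f_sent[j], e_sent[i])] += 1.0

    t = defaultdict(lambda: 1.0)  # uniform (unnormalised) starting point
    for _ in range(iterations):
        # E-step: expected link counts under the current parameters.
        expected = defaultdict(float)
        for e_sent, f_sent in sent_pairs:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)
                if z == 0.0:
                    continue  # no probability mass left for this word
                for e in e_sent:
                    expected[(f, e)] += t[(f, e)] / z

        # M-step: weight the two count sources, then normalise per e.
        counts = defaultdict(float)
        for key, c in expected.items():
            counts[key] += (1.0 - weight) * c
        for key, c in observed.items():
            counts[key] += weight * c
        totals = defaultdict(float)
        for (f, e), c in counts.items():
            totals[e] += c
        t = defaultdict(float)
        for (f, e), c in counts.items():
            if totals[e] > 0.0:
                t[(f, e)] = c / totals[e]
    return t


def align(e_sent, f_sent, t):
    """Link each f word to its most probable e word under t(f|e)."""
    return {
        (max(range(len(e_sent)), key=lambda i: t[(f, e_sent[i])]), j)
        for j, f in enumerate(f_sent)
    }


def aer(predicted, sure, possible):
    """Alignment error rate: 1 - (|A & S| + |A & P|) / (|A| + |S|)."""
    possible = possible | sure  # sure links also count as possible
    hits = len(predicted & sure) + len(predicted & possible)
    return 1.0 - hits / (len(predicted) + len(sure))


sent_pairs = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]
word_aligned = [(["the", "house"], ["das", "haus"], {(0, 0), (1, 1)})]

t = train_model1(sent_pairs, word_aligned, weight=0.5)
links = align(["the", "book"], ["das", "buch"], t)
score = aer(links, sure={(0, 0), (1, 1)}, possible=set())
```

Setting `weight` to 0 recovers ordinary Model 1 training on sentence-aligned data alone, while larger values let the hand-aligned links dominate; varying it is one way to explore the ratio effect the abstract mentions.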