The parameters of statistical translation models are typically estimated from sentence-aligned parallel corpora. We show that significant improvements in the alignment and translation quality of such models can be achieved by additionally including word-aligned data during training. Incorporating word-level alignments into the parameter estimation of the IBM models reduces alignment error rate and increases the Bleu score when compared to training the same models only on sentence-aligned data. On the Verbmobil data set, we attain a 38% reduction in the alignment error rate and a higher Bleu score with half as many training examples. We discuss how varying the ratio of word-aligned to sentence-aligned data affects the expected performance gain.
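To make the idea of mixing word-aligned and sentence-aligned data concrete, the following is a minimal sketch for IBM Model 1 only. It is not the paper's formulation: it assumes that manual links contribute fixed relative-frequency estimates, that sentence-aligned data contributes the usual EM expected counts, and that the two are interpolated with a hypothetical `weight` parameter. The function names, the interpolation scheme, the omission of the NULL word, and the toy corpus are all assumptions made for illustration.

```python
from collections import defaultdict


def normalize(counts):
    """Turn counts {(f, e): c} into conditional probabilities p(f | e)."""
    totals = defaultdict(float)
    for (f, e), c in counts.items():
        totals[e] += c
    return {(f, e): c / totals[e] for (f, e), c in counts.items()}


def train_model1(sentence_pairs, word_aligned_pairs, iterations=10, weight=0.5):
    """Estimate t(f | e) from sentence-aligned and word-aligned data.

    sentence_pairs:     list of (f_tokens, e_tokens)
    word_aligned_pairs: list of (f_tokens, e_tokens, links), where links is
                        a set of (i, j) index pairs linking f_tokens[i] to
                        e_tokens[j]
    weight:             hypothetical interpolation weight for the estimates
                        taken from manual links (must be below 1)
    """
    assert 0.0 <= weight < 1.0

    # Relative frequencies read directly off the manual links.
    # These stay fixed across EM iterations.
    observed = defaultdict(float)
    for f_sent, e_sent, links in word_aligned_pairs:
        for i, j in links:
            observed[(f_sent[i], e_sent[j])] += 1.0
    t_observed = normalize(observed)
    seen_e = {e for (_, e) in t_observed}

    # Assumption: word-aligned pairs also take part in EM as plain sentence pairs.
    all_pairs = list(sentence_pairs) + [(f, e) for f, e, _ in word_aligned_pairs]

    # Uniform initialization over co-occurring word pairs.
    init = {}
    for f_sent, e_sent in all_pairs:
        for f in f_sent:
            for e in e_sent:
                init[(f, e)] = 1.0
    t = normalize(init)

    for _ in range(iterations):
        # E-step: expected link counts under the current table (no NULL word).
        expected = defaultdict(float)
        for f_sent, e_sent in all_pairs:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    expected[(f, e)] += t[(f, e)] / z
        t_em = normalize(expected)

        # M-step: interpolate EM estimates with the manual-link estimates
        # wherever manual links exist for the conditioning word e.
        t = {}
        for (f, e), p in t_em.items():
            if e in seen_e:
                t[(f, e)] = (1.0 - weight) * p + weight * t_observed.get((f, e), 0.0)
            else:
                t[(f, e)] = p
    return t


# Toy data, invented for illustration only.
toy_sentence_pairs = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
toy_word_aligned = [
    (["das", "haus"], ["the", "house"], {(0, 0), (1, 1)}),
]
table = train_model1(toy_sentence_pairs, toy_word_aligned)
```

In this sketch, raising `weight` shifts the estimates toward the manual links, which is one simple way to explore how the balance between word-aligned and sentence-aligned data affects the learned translation table.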