Enhancing statistical machine translation with character alignment

Authors:
Ning Xi;Guangchao Tang;Xinyu Dai;Shujian Huang;Jiajun Chen
Affiliations:
Nanjing University, Nanjing, China;Nanjing University, Nanjing, China;Nanjing University, Nanjing, China;Nanjing University, Nanjing, China;Nanjing University, Nanjing, China
Venue:
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Year:
2012

Citing 18
Cited 0

A systematic comparison of various statistical alignment models

Computational Linguistics
Head-driven statistical models for natural language parsing

Head-driven statistical models for natural language parsing
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
Minimum error rate training in statistical machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Measuring Word Alignment Quality for Statistical Machine Translation

Computational Linguistics
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Bilingually motivated domain-adapted word segmentation for statistical machine translation

EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Decomposability of translation metrics for improved evaluation and efficient algorithms

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Combination of statistical word alignments based on multiple preprocessing schemes

NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
Can we translate letters?

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Improved statistical machine translation by multiple Chinese word segmentation

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Optimizing Chinese word segmentation for machine translation performance

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Better word alignments with supervised ITG models

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
Unsupervised tokenization for machine translation

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Pseudo-word for phrase-based machine translation

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Integration of multiple bilingually-learned segmentation schemes into statistical machine translation

WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Joint tokenization and translation

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Word alignment combination over multiple word segmentation

HLT-SS '11 Proceedings of the ACL 2011 Student Session

Quantified Score

Hi-index	0.00

Visualization

Abstract

The dominant practice of statistical machine translation (SMT) uses the same Chinese word segmentation specification in both alignment and translation rule induction steps in building Chinese-English SMT system, which may suffer from a suboptimal problem that word segmentation better for alignment is not necessarily better for translation. To tackle this, we propose a framework that uses two different segmentation specifications for alignment and translation respectively: we use Chinese character as the basic unit for alignment, and then convert this alignment to conventional word alignment for translation rule induction. Experimentally, our approach outperformed two baselines: fully word-based system (using word for both alignment and translation) and fully character-based system, in terms of alignment quality and translation performance.