The dominant practice in statistical machine translation (SMT) is to use the same Chinese word segmentation specification for both the alignment and the translation rule induction steps when building a Chinese-English SMT system. This may be suboptimal, because the word segmentation that is better for alignment is not necessarily better for translation. To tackle this, we propose a framework that uses two different segmentation specifications for alignment and translation respectively: we use the Chinese character as the basic unit for alignment, and then convert the resulting character alignment to a conventional word alignment for translation rule induction. Experimentally, our approach outperformed two baselines, a fully word-based system (using words for both alignment and translation) and a fully character-based system, in terms of both alignment quality and translation performance.
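The conversion from a character alignment to a word alignment can be illustrated with a minimal sketch. The abstract does not specify the conversion rule, so the code below assumes the simplest one: a Chinese word is linked to an English word whenever any of its characters is linked to that word. The function name `char_to_word_alignment`, the link representation as `(source_index, target_index)` pairs, and the example sentence are all hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (Chinese-side index, English-side index)


def char_to_word_alignment(words: List[str], char_links: Set[Link]) -> Set[Link]:
    """Project character-level links onto a given word segmentation.

    Assumed rule (not confirmed by the source): a word inherits every
    link held by any of the characters it contains.
    """
    # Map each character position to the index of the word that contains it.
    char_to_word: List[int] = []
    for word_index, word in enumerate(words):
        char_to_word.extend([word_index] * len(word))

    word_links: Set[Link] = set()
    for char_index, target_index in char_links:
        if not 0 <= char_index < len(char_to_word):
            raise ValueError(f"character index {char_index} is out of range")
        word_links.add((char_to_word[char_index], target_index))
    return word_links


# Hypothetical example: "机器 翻译" aligned to "machine translation".
words = ["机器", "翻译"]
char_links = {(0, 0), (1, 0), (2, 1), (3, 1)}
word_links = char_to_word_alignment(words, char_links)
print(sorted(word_links))
```

Because links are stored in a set, several characters of one word pointing to the same English word collapse into a single word-level link. A stricter rule, such as keeping a link only when most of a word's characters agree, would fit the same interface.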