Statistical phrase-based translation
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Minimum error rate training in statistical machine translation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
HHMM-based Chinese lexical analyzer ICTCLAS
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Adaptive Chinese word segmentation
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Chinese segmentation and new word detection using conditional random fields
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Measuring Word Alignment Quality for Statistical Machine Translation
Computational Linguistics
Chinese word segmentation and statistical machine translation
ACM Transactions on Speech and Language Processing (TSLP)
Subword-based tagging by conditional random fields for Chinese word segmentation
NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
Mixture-model adaptation for SMT
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Bilingually Motivated Word Segmentation for Statistical Machine Translation
ACM Transactions on Asian Language Information Processing (TALIP)
Bilingually motivated domain-adapted word segmentation for statistical machine translation
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Unsupervised tokenization for machine translation
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Pseudo-word for phrase-based machine translation
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Joint tokenization and translation
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Word segmentation for dialect translation
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II
Word alignment combination over multiple word segmentation
HLT-SS '11 Proceedings of the ACL 2011 Student Session
Machine translation without words through substring alignment
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Enhancing statistical machine translation with character alignment
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Bagging and Boosting statistical machine translation systems
Artificial Intelligence
An empirical study on word segmentation for chinese machine translation
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2
Substring-based machine translation
Machine Translation
Hi-index | 0.00 |
Chinese word segmentation (CWS) is a necessary step in Chinese-English statistical machine translation (SMT) and its performance has an impact on the results of SMT. However, there are many settings involved in creating a CWS system such as various specifications and CWS methods. This paper investigates the effect of these settings to SMT. We tested dictionary-based and CRF-based approaches and found there was no significant difference between the two in the qualty of the resulting translations. We also found the correlation between the CWS F-score and SMT BLEU score was very weak. This paper also proposes two methods of combining advantages of different specifications: a simple concatenation of training data and a feature interpolation approach in which the same types of features of translation models from various CWS schemes are linearly interpolated. We found these approaches were very effective in improving quality of translations.